Replies: 2 comments
- Yup, @jart did some magic there 😄, thanks 🙌
- Performance of v0.8.7 seems to top all other versions; wondering why? Correct me if I'm wrong.
Hello,
I have been using llamafile for a few months now, and with the release of Llama 3.1 I ran some performance tests.
First, here is the evolution over time, using Llama 3 8B Instruct on my old Xeon E5-1630 v3. I used the quantized models published by Sanctum AI: https://huggingface.co/SanctumAI/Meta-Llama-3-8B-Instruct-GGUF
Each test was run once on a mostly idle server, with the same prompt: summarizing an article of about 2,000 words (2,350 tokens).
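If you want to reproduce something similar, here is a rough sketch of such a harness. The binary and model file names are placeholders for your own setup; it uses the llama.cpp-style flags (`-m`, `-f`, `-n`) that llamafile inherits:

```python
import subprocess
import time

# Placeholder paths: adjust to wherever your llamafile builds and GGUF files live.
LLAMAFILE_VERSIONS = ["./llamafile-0.8.1", "./llamafile-0.8.2",
                      "./llamafile-0.8.4", "./llamafile-0.8.5"]
MODELS = [
    "meta-llama-3-8b-instruct.Q4_K_M.gguf",
    "meta-llama-3-8b-instruct.Q8_0.gguf",
]
PROMPT_FILE = "summarize-article.txt"  # "Summarize the following article: ..." (~2,350 tokens)

for binary in LLAMAFILE_VERSIONS:
    for model in MODELS:
        start = time.monotonic()
        # -m: model file, -f: read prompt from file, -n: max tokens to generate.
        # Run once per version/quant pair, as in the tests above.
        subprocess.run(
            [binary, "-m", model, "-f", PROMPT_FILE, "-n", "400", "--temp", "0"],
            check=True,
            stdout=subprocess.DEVNULL,
        )
        elapsed = time.monotonic() - start
        print(f"{binary} {model}: {elapsed:.1f}s wall time")
```

Note that llamafile also prints llama.cpp's timing summary (prompt and eval speed in tokens per second) when generation finishes; those numbers are usually more informative than wall-clock time alone.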
[benchmark results by llamafile version and quantization]
You can clearly see the performance jump from llamafile 0.8.1 to 0.8.2 (except on Q8_0), then another, smaller one from 0.8.4 to 0.8.5.
And here is a comparison of Llama 3 and Llama 3.1 8B Instruct GGUF with llamafile 0.8.1.
[Llama 3 vs. Llama 3.1 comparison results]
Thanks for all your work, @jart and others.