
@arthw
Contributor

@arthw commented May 12, 2024

This PR reverts the workaround introduced in #5895.
That workaround addressed a known oneMKL issue on the Intel MTL Arc GPU.
The newer oneMKL shipped with oneAPI Base Toolkit 2024.1 appears to fix that issue.
So the old workaround is reverted.

With this change, next-token throughput improves by about 32% on the Intel MTL Arc GPU and about 21% on the Arc 770, tested with llama2-7b-Q4.

Next-token throughput:

MTL
7.06 tokens per second -> 9.37 tokens per second

Arc 770
25.14 tokens per second -> 30.50 tokens per second
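The percentage gains quoted above follow from the next-token throughput figures as (after / before − 1) × 100. The short sketch below only illustrates that arithmetic. It also converts throughput into per-token latency. The helper names `speedup_pct` and `ms_per_token` are hypothetical and are not part of llama.cpp.

```python
# Next-token throughput in tokens per second, as reported in this PR (before, after).
results = {
    "MTL": (7.06, 9.37),
    "Arc 770": (25.14, 30.50),
}


def speedup_pct(before: float, after: float) -> float:
    """Relative throughput gain in percent: (after / before - 1) * 100."""
    return (after / before - 1.0) * 100.0


def ms_per_token(tokens_per_second: float) -> float:
    """Convert throughput (tokens/s) into per-token latency in milliseconds."""
    return 1000.0 / tokens_per_second


for gpu, (before, after) in results.items():
    print(
        f"{gpu}: {speedup_pct(before, after):+.1f}% "
        f"({ms_per_token(before):.1f} ms -> {ms_per_token(after):.1f} ms per token)"
    )
```

The calculation gives roughly +32.7% for MTL and +21.3% for the Arc 770. These values match the rounded +32% and +21% figures above. Per-token latency drops from about 141.6 ms to 106.7 ms on MTL and from about 39.8 ms to 32.8 ms on the Arc 770.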

@arthw requested a review from airMeng May 12, 2024 03:22
@mofosyne added the Review Complexity : Medium, ggml, and Review Complexity : High labels and removed the Review Complexity : Medium label May 12, 2024
@airMeng
Contributor

airMeng commented May 13, 2024

Can you paste the absolute performance numbers here?

@airMeng merged commit 948f4ec into ggml-org:master May 13, 2024

Labels

ggml (changes relating to the ggml tensor library for machine learning)
Review Complexity : High (generally requires in-depth knowledge of LLMs or GPUs)


3 participants