metal lowbit kernels: optimized 2-bit, 3-bit and 4-bit shaders #1422
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1422
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit 8fda452 with merge base 603d908. This comment was automatically generated by Dr. CI and updates every 15 minutes.
When adding an optimized version of something, we should have some sort of benchmarking numbers. Ideally I would like those to come from a standalone benchmark, but for now you can report what you got from torchchat.
I trust that the comments resolved as "will be fixed" will get fixed. The rest looks OK.
- Adapts the optimized 4-bit shader from PyTorch (MLX-inspired) and adds a similarly optimized 2-bit shader.
- Adds an optimized 3-bit shader.
- Restricts N to be a multiple of 4 and adjusts tests accordingly.
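For context on what these bit widths mean for the weight layout, here is a minimal Python sketch of sub-byte weight packing. This is illustrative only, not the actual Metal shader code or torchao's packing format: 4-bit values pack two per byte, while 3-bit values need groups of 8 packed into a 24-bit word, which is why 3-bit kernels are typically the trickiest to optimize.

```python
def pack_4bit(vals):
    """Pack pairs of 4-bit values (0..15) into single bytes, low nibble first."""
    assert len(vals) % 2 == 0
    return bytes((vals[i] & 0xF) | ((vals[i + 1] & 0xF) << 4)
                 for i in range(0, len(vals), 2))

def unpack_4bit(data):
    """Recover the 4-bit values from packed bytes."""
    out = []
    for b in data:
        out.append(b & 0xF)
        out.append(b >> 4)
    return out

def pack_3bit(vals):
    """Pack groups of eight 3-bit values (0..7) into three bytes via a 24-bit word."""
    assert len(vals) % 8 == 0
    out = bytearray()
    for i in range(0, len(vals), 8):
        word = 0
        for j, v in enumerate(vals[i:i + 8]):
            word |= (v & 0x7) << (3 * j)
        out += word.to_bytes(3, "little")
    return bytes(out)

def unpack_3bit(data):
    """Recover the 3-bit values from packed 24-bit words."""
    out = []
    for i in range(0, len(data), 3):
        word = int.from_bytes(data[i:i + 3], "little")
        out.extend((word >> (3 * j)) & 0x7 for j in range(8))
    return out
```

A fast kernel reads these packed words and unpacks several weights per memory transaction, which is where most of the speedup over a naive per-element load comes from.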
Performance (tokens/sec via torchchat):

| Bit width | Llama 3.2 1B (llama3.2-1b-base) | Llama 3.1 8B (llama3.1-base) |
|-----------|---------------------------------|------------------------------|
| 1-bit     | 28.0688                         | 7.4459                       |
| 2-bit     | 31.2422                         | 15.6508                      |
| 3-bit     | 30.1294                         | 15.3086                      |
| 4-bit     | 30.7905                         | 16.1268                      |
| 5-bit     | 28.1504                         | 6.7308                       |
| 6-bit     | 28.4321                         | 6.4887                       |
| 7-bit     | 27.3991                         | 6.4537                       |
Notice that performance is similar across all n-bit kernels for the 1B-parameter model; the optimization is felt when running the 8B-parameter model, where throughput jumps from 6-7 tok/sec with the non-optimized kernels to 15-16 tok/sec with the optimized 2-bit, 3-bit and 4-bit kernels.
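A plausible reading of these numbers (my back-of-envelope reasoning, not stated in the PR): token generation on an 8B-parameter model is usually memory-bandwidth bound, so throughput tracks how many bytes of weights must be streamed per token, and a kernel that fails to exploit the packed layout leaves that advantage on the table. A rough sketch of the weight footprint per bit width:

```python
PARAMS = 8e9  # Llama 3.1 8B, approximate parameter count

def weight_bytes(bits, params=PARAMS):
    """Total bytes of packed weights at a given bit width (ignores scales/zeros)."""
    return params * bits / 8

# At 2-4 bits the whole weight set is a few GB, a fraction of the fp16 footprint.
for bits in (2, 3, 4, 16):
    print(f"{bits:2d}-bit: {weight_bytes(bits) / 1e9:.1f} GB")
```

The 1B model is small enough at every bit width that the kernels are not bandwidth-limited in the same way, which would explain why all variants land around 28-31 tok/sec there.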