Expose hqq through uintx_weight_only API #786
````diff
@@ -227,7 +227,12 @@ This technique works best when the torch._inductor.config.use_mixed_mm option is
 ```python
 # for torch 2.4+
 from torchao.quantization import quantize_, int4_weight_only
-quantize_(model, int4_weight_only())
+group_size = 32
+
+# you can enable [hqq](https://github.com/mobiusml/hqq/tree/master) quantization, which is expected to improve accuracy,
+# through the use_hqq flag for `int4_weight_only` quantization
+use_hqq = False
+quantize_(model, int4_weight_only(group_size=group_size, use_hqq=use_hqq))

 # for torch 2.2.2 and 2.3
 from torchao.quantization.quant_api import change_linear_weights_to_int4_woqtensors
````

Inline review thread on this change:

- since this is user facing, should the current bool …
- yeah, this is not ideal; I'm planning to just have a separate hqq config and remove the flag
- @vkuzo I'm integrating hqq into …
- if you're ok with potentially changing it later, sgtm
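For context, a minimal end-to-end use of the flag added in this diff might look like the sketch below; `quantize_` and `int4_weight_only(group_size=..., use_hqq=...)` come straight from the diff, while the toy model, dtype, and device are illustrative assumptions.

```python
import torch
from torchao.quantization import quantize_, int4_weight_only

# toy model; any nn.Module containing Linear layers works
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024),
).to(torch.bfloat16).cuda()

# use_hqq=True asks the int4 weight-only path to derive the quantization
# parameters with HQQ, which the diff comment says is expected to improve accuracy
quantize_(model, int4_weight_only(group_size=32, use_hqq=True))

# the quantized model is used exactly like the original
x = torch.randn(1, 1024, dtype=torch.bfloat16, device="cuda")
with torch.no_grad():
    y = model(x)
```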
why is this different from the way we enable auto-round, which is its own function like `apply_auto_round`?

I think this depends on whether we want to expose just int4 weight-only quant or all bit widths. This PR only enables hqq for int4, so it's more convenient to add it to the existing `int4_weight_only` quant; but if we want to support all bit widths, then we should follow what auto_round is doing.

cc @mobicham please let me know which one makes more sense
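To make the two options in this thread concrete, the call sites would differ roughly as follows; option A is what this PR implements, while option B is only a sketch of the auto_round-style alternative, and `hqq_weight_only` is not an existing torchao function.

```python
import torch
from torchao.quantization import quantize_, int4_weight_only

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to(torch.bfloat16).cuda()

# Option A (this PR): hqq is a flag on the existing int4 config
quantize_(model, int4_weight_only(group_size=64, use_hqq=True))

# Option B (auto_round-style, hypothetical): hqq gets its own factory,
# which could then cover other bit widths as well
# quantize_(model, hqq_weight_only(dtype=torch.uint4, group_size=64))
```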
Maybe we can keep that flag in `int4_weight_only` and have some call like this for the more general intx case (sketched below)?
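The snippet that originally accompanied this comment is not preserved here; as a rough stand-in, the "more general intx case" could look something like the following. It assumes torchao's `uintx_weight_only(dtype, group_size)` factory (named in this PR's title) and a PyTorch recent enough to have sub-byte uint dtypes; whether that factory also grows a `use_hqq` flag is exactly the open question, so it is left out.

```python
import torch
from torchao.quantization import quantize_, int4_weight_only, uintx_weight_only

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to(torch.bfloat16).cuda()
other_model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to(torch.bfloat16).cuda()

# int4 keeps the flag added in this PR ...
quantize_(model, int4_weight_only(group_size=64, use_hqq=True))

# ... while other bit widths would go through the generic uintx config
# (shown without hqq, since exposing hqq there is what is being discussed)
quantize_(other_model, uintx_weight_only(torch.uint2, group_size=64))
```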
My sense is we should separate out the implementation details from the algorithm name. Internally, HQQ can be implemented by calling `int4_weight_only`, but there's no reason to leak this detail to end users.
@mobicham sure, that would align with what auto_round is doing now, I think.

@msaroufim you are also suggesting to have a separate `hqq_weight_only(dtype, group_size, layout_type)` method, right?
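A hypothetical sketch of that separate method: the name and signature come from the comment above, while the body (routing to the existing configs) is purely an illustrative assumption, not how torchao actually implements it.

```python
import torch
from torchao.quantization import int4_weight_only, uintx_weight_only

def hqq_weight_only(dtype=torch.uint4, group_size=64, layout_type=None):
    """Illustrative only: a user-facing 'hqq' entry point that hides whether
    hqq runs through the int4 path or the generic uintx path under the hood."""
    if dtype == torch.uint4:
        # assumption: reuse the int4 path, with hqq enabled via the flag from this PR
        return int4_weight_only(group_size=group_size, use_hqq=True)
    # assumption: other bit widths would go through the generic uintx config once hqq
    # is exposed there; layout_type is accepted but not wired through in this sketch
    return uintx_weight_only(dtype, group_size=group_size)

# usage would mirror the other factories:
# quantize_(model, hqq_weight_only(dtype=torch.uint4, group_size=64))
```

This keeps the algorithm name ("hqq") in the public API while leaving the choice of int4 vs. uintx backend as an internal detail, which is the separation argued for above.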