add autoround._generate_recipe() #758
Conversation
Signed-off-by: xinhe3 <[email protected]>
Have you tested it on an MoE model? It might require some special handling.
combination_list = []
numel_list = []
loss_list = []
for hp_layers in combinations(quantizable_layers, quantizable_num):
How about creating a new file in the experimental folder and moving these there?
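To make the combination search in the diff above concrete, here is a standalone sketch of the idea: enumerate every subset of quantizable layers that could be kept at high precision, and record each candidate's parameter count and loss. The layer names, sizes, and the `evaluate_loss` stub are illustrative assumptions; only the `combination_list`/`numel_list`/`loss_list` loop shape comes from the diff.

```python
from itertools import combinations

# Hypothetical inputs: layer names and their weight element counts.
quantizable_layers = ["mlp.gate_proj", "mlp.up_proj", "mlp.down_proj", "self_attn.qkv"]
layer_numel = {
    "mlp.gate_proj": 4096 * 11008,
    "mlp.up_proj": 4096 * 11008,
    "mlp.down_proj": 11008 * 4096,
    "self_attn.qkv": 4096 * 3 * 4096,
}
quantizable_num = 2  # how many layers to keep at high precision


def evaluate_loss(hp_layers):
    # Placeholder: a real implementation would quantize the block with
    # hp_layers kept at high precision and measure e.g. output MSE.
    return float(len(hp_layers))


combination_list = []
numel_list = []
loss_list = []
for hp_layers in combinations(quantizable_layers, quantizable_num):
    combination_list.append(hp_layers)
    numel_list.append(sum(layer_numel[name] for name in hp_layers))
    loss_list.append(evaluate_loss(hp_layers))

# Pick the cheapest candidate among those with the lowest loss.
best = min(range(len(combination_list)), key=lambda i: (loss_list[i], numel_list[i]))
print(combination_list[best])
```

With four layers and subsets of size two, the loop evaluates C(4, 2) = 6 candidates; ranking by `(loss, numel)` breaks loss ties in favor of the smaller high-precision footprint.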
@@ -2616,6 +2641,29 @@ def quantize_blocks(
    clear_memory()
    input_ids = to_device(input_ids, self.cache_device)
    input_others = to_device(input_others, self.cache_device)
    if self.recipe_mode:
It would be better to wrap this new code into a function and call it as early as possible.
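A minimal sketch of the refactor this comment suggests, under assumed names (`_maybe_generate_recipe` and the class shape are illustrative, not the actual auto_round API): the recipe-mode branch moves out of `quantize_blocks` into a helper that runs as early as possible, so the normal quantization path stays free of recipe-specific logic.

```python
class AutoRoundSketch:
    """Illustrative stand-in for the AutoRound class; not the real API."""

    def __init__(self, recipe_mode=False):
        self.recipe_mode = recipe_mode
        self.recipe = None

    def _maybe_generate_recipe(self, block):
        # Early exit: when recipe_mode is off, this is a cheap no-op check.
        if not self.recipe_mode:
            return False
        # Placeholder for the real combination search over the block's layers.
        self.recipe = {"block": block, "hp_layers": []}
        return True

    def quantize_blocks(self, blocks):
        for block in blocks:
            if self._maybe_generate_recipe(block):
                continue  # recipe generation replaces normal quantization
            # ... normal block quantization path would run here ...
```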
auto_round/autoround.py (Outdated)
@@ -2954,6 +3002,21 @@ def sampling_inputs(cls, input_ids, input_others, indices, seqlen, batch_dim=0,

        return current_input_ids, current_input_others

    def _dump_average_bits(self, layer_config=None):
This function cannot be used by AutoRound, since layers are converted to QuantizedLinear after quantization. If the function can correctly dump average bits in typical scenarios such as INT4, I’d prefer to keep it in the class. Otherwise, it would be better to move it elsewhere for now.
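For reference, an average-bits dump in the typical INT4 scenario the comment mentions could look like the sketch below: a parameter-count-weighted mean of per-layer bit widths. The `layer_config` shape (`{name: {"bits": int}}`) and the standalone `layer_numel` mapping are assumptions for illustration; the real method reads sizes from the model's modules.

```python
def dump_average_bits(layer_config, layer_numel):
    # Weighted average of per-layer bit widths by weight element count.
    total_bits = sum(layer_config[name]["bits"] * n for name, n in layer_numel.items())
    total_numel = sum(layer_numel.values())
    return total_bits / total_numel


# Example: one layer kept at 8 bits, the rest quantized to 4 bits.
layer_numel = {"layer1": 1_000_000, "layer2": 2_000_000, "layer3": 1_000_000}
layer_config = {
    "layer1": {"bits": 8},
    "layer2": {"bits": 4},
    "layer3": {"bits": 4},
}
print(dump_average_bits(layer_config, layer_numel))  # (8*1M + 4*2M + 4*1M) / 4M = 5.0
```

Note the reviewer's caveat still applies: after quantization the layers become `QuantizedLinear`, so the real implementation must read sizes before conversion (or from the quantized modules' metadata).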
This code is used for INC accuracy tuning. Currently, only `mx_fp8` is supported for mixing with `mx_fp4` and `nv_fp4`.
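To illustrate the mixing described above, a generated recipe can be thought of as a mapping from layer names to data types, keeping sensitive layers at `mx_fp8` while lowering the rest to `mx_fp4` (or `nv_fp4`). The dictionary shape and layer names below are assumptions for illustration, not the actual `_generate_recipe()` output format.

```python
# Hypothetical recipe: per-layer data-type assignments mixing mx_fp8 with mx_fp4.
recipe = {
    "model.layers.0.mlp.down_proj": "mx_fp8",   # kept at higher precision
    "model.layers.0.mlp.gate_proj": "mx_fp4",
    "model.layers.0.mlp.up_proj": "mx_fp4",
}

# Element bit widths for the supported formats.
bits = {"mx_fp8": 8, "mx_fp4": 4, "nv_fp4": 4}

# Unweighted average bits, assuming equal layer sizes for simplicity.
avg_bits = sum(bits[dtype] for dtype in recipe.values()) / len(recipe)
print(avg_bits)  # (8 + 4 + 4) / 3
```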