
Conversation

@xin3he (Contributor) commented Aug 24, 2025

```python
layer_config, avg_bits = autoround._generate_recipe(
    # same data type config as before
    mp_dtype={
        "data_type": "mx_fp8",
        "act_data_type": "mx_fp8",
    },
    # special mixed-precision configuration
    mp_config={
        "mp_ratio": 1 / 3,
        "loss_weight": 2.0,
        "numel_weight": 1.0,
    },
)
autoround.layer_config = layer_config
autoround.quantize()
```

This code is used for INC accuracy tuning; currently only mx_fp8 is supported as the high-precision dtype for mixing with mx_fp4 and nv_fp4.
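To make the `mp_config` knobs concrete, here is a minimal sketch of how `mp_ratio`, `loss_weight`, and `numel_weight` could be combined to pick which layers stay at high precision. All function names and the scoring formula are illustrative assumptions, not AutoRound's actual implementation: the idea is that a layer with high quantization loss but few parameters is the cheapest candidate to keep at mx_fp8.

```python
# Hypothetical sketch, NOT the real _generate_recipe: score each layer by a
# weighted trade-off of quantization loss (keep sensitive layers) versus
# parameter count (penalize keeping huge layers at high precision).
def score_layers(losses, numels, loss_weight=2.0, numel_weight=1.0):
    """Return a per-layer score; larger means a better high-precision candidate."""
    max_loss = max(losses.values())
    max_numel = max(numels.values())
    return {
        name: loss_weight * losses[name] / max_loss
              - numel_weight * numels[name] / max_numel
        for name in losses
    }

def pick_high_precision(losses, numels, mp_ratio=1 / 3):
    """Keep roughly `mp_ratio` of the layers at the high-precision dtype."""
    scores = score_layers(losses, numels)
    k = max(1, round(mp_ratio * len(scores)))
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

With `mp_ratio=1/3` over three layers of equal size, the single most loss-sensitive layer would be kept at mx_fp8 and the rest dropped to the low-bit dtype.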

@xin3he force-pushed the xinhe/mix-precision branch from 027b2c2 to d8b831e on August 24, 2025 07:32
@xin3he requested a review from WeiweiZhang1 on August 24, 2025 07:32
@xin3he requested a review from wenhuach21 on August 24, 2025 07:37
@xin3he force-pushed the xinhe/mix-precision branch from 419b63c to 79323d6 on August 25, 2025 03:39
@wenhuach21 (Contributor) commented:

Have you tested it on an MoE model? It might require some special handling.

@xin3he requested review from wenhuach21 and n1ck-guo on August 25, 2025 07:28
```python
combination_list = []
numel_list = []
loss_list = []
for hp_layers in combinations(quantizable_layers, quantizable_num):
```
Contributor:
How about creating a new file in the experimental folder and moving these there?
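The excerpt under review enumerates every choice of `quantizable_num` layers to keep at high precision. As a hedged sketch of how such a brute-force search might rank candidates (names mirror the diff, but the scoring here is a simplification, not AutoRound's actual logic):

```python
from itertools import combinations

# Illustrative brute-force recipe search; assumes a precomputed per-layer
# quantization loss. Not the actual AutoRound implementation.
def search_recipe(layer_loss, quantizable_num):
    """Try every way to keep `quantizable_num` layers at high precision and
    return the combination whose remaining low-precision loss is smallest."""
    quantizable_layers = list(layer_loss)
    best_combo, best_loss = None, float("inf")
    for hp_layers in combinations(quantizable_layers, quantizable_num):
        # loss incurred by the layers left at the low-bit dtype
        total = sum(loss for name, loss in layer_loss.items()
                    if name not in hp_layers)
        if total < best_loss:
            best_combo, best_loss = hp_layers, total
    return best_combo, best_loss
```

Since `combinations` grows combinatorially with the layer count, this style of search is only practical for small models or coarse layer groups, which may be part of why the reviewer suggests keeping it in an experimental folder.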

```diff
@@ -2616,6 +2641,29 @@ def quantize_blocks(
clear_memory()
input_ids = to_device(input_ids, self.cache_device)
input_others = to_device(input_others, self.cache_device)
if self.recipe_mode:
```
Contributor:

It would be better to wrap this new code into a function and call it as early as possible.

```diff
@@ -2954,6 +3002,21 @@ def sampling_inputs(cls, input_ids, input_others, indices, seqlen, batch_dim=0,

    return current_input_ids, current_input_others

def _dump_average_bits(self, layer_config=None):
```
@wenhuach21 (Contributor) commented Aug 25, 2025:

This function cannot be used by AutoRound, since layers are converted to QuantizedLinear after quantization. If the function can correctly dump average bits in typical scenarios such as INT4, I’d prefer to keep it in the class. Otherwise, it would be better to move it elsewhere for now.
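For illustration, a numel-weighted average-bit computation over a plain `layer_config`-style dict could look like the sketch below. The dict shape (`bits` and `numel` per layer) is an assumption for this example; the real `_dump_average_bits` operates on AutoRound's internal state, which, as noted above, changes once layers become `QuantizedLinear`.

```python
# Hypothetical sketch of average-bit accounting, assuming each layer entry
# records its bit-width and element count. Not AutoRound's implementation.
def average_bits(layer_config):
    """Return the numel-weighted average bit-width across all layers."""
    total_bits = sum(cfg["bits"] * cfg["numel"] for cfg in layer_config.values())
    total_numel = sum(cfg["numel"] for cfg in layer_config.values())
    return total_bits / total_numel
```

Weighting by element count matters because a single large projection layer at 8 bits can raise the average far more than many small layers would.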

Signed-off-by: xinhe3 <[email protected]>
@xin3he force-pushed the xinhe/mix-precision branch from dd79696 to 819fa22 on September 3, 2025 01:47
@xin3he force-pushed the xinhe/mix-precision branch from 819fa22 to 92025ad on September 3, 2025 05:11
3 participants