
Conversation

@xin3he (Contributor) commented Aug 24, 2025

```python
layer_config, avg_bits = autoround._generate_recipe(
    # same data type config as before
    mp_dtype={
        "data_type": "mx_fp8",
        "act_data_type": "mx_fp8",
    },
    # special mixed-precision configuration
    mp_config={
        "mp_ratio": 1 / 3,
        "loss_weight": 2.0,
        "numel_weight": 1.0,
    },
)
autoround.layer_config = layer_config
autoround.quantize()
```

This code is used for INC accuracy tuning; currently only mx_fp8 is supported as the high-precision dtype for mixing with mx_fp4 and nv_fp4.
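To make the `mp_config` knobs concrete, here is a minimal sketch of how `mp_ratio`, `loss_weight`, and `numel_weight` could be combined to pick which layers stay at high precision. All function names and the scoring formula are illustrative assumptions, not AutoRound's actual implementation: the idea is that a layer with high quantization loss but few parameters is the cheapest candidate to keep at mx_fp8.

```python
# Hypothetical sketch, NOT the real _generate_recipe: score each layer by a
# weighted trade-off of quantization loss (keep sensitive layers) versus
# parameter count (penalize keeping huge layers at high precision).
def score_layers(losses, numels, loss_weight=2.0, numel_weight=1.0):
    """Return a per-layer score; larger means a better high-precision candidate."""
    max_loss = max(losses.values())
    max_numel = max(numels.values())
    return {
        name: loss_weight * losses[name] / max_loss
              - numel_weight * numels[name] / max_numel
        for name in losses
    }

def pick_high_precision(losses, numels, mp_ratio=1 / 3):
    """Keep roughly `mp_ratio` of the layers at the high-precision dtype."""
    scores = score_layers(losses, numels)
    k = max(1, round(mp_ratio * len(scores)))
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

With `mp_ratio=1/3` over three layers of equal size, the single most loss-sensitive layer would be kept at mx_fp8 and the rest dropped to the low-bit dtype.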

@xin3he force-pushed the xinhe/mix-precision branch from 027b2c2 to d8b831e on August 24, 2025 07:32
@xin3he requested a review from WeiweiZhang1 on August 24, 2025 07:32
@xin3he requested a review from wenhuach21 on August 24, 2025 07:37
@xin3he force-pushed the xinhe/mix-precision branch from 419b63c to 79323d6 on August 25, 2025 03:39
@wenhuach21 (Contributor) commented:

Have you tested it on an MoE model? It might require some special handling.

@xin3he requested review from wenhuach21 and n1ck-guo on August 25, 2025 07:28
```python
combination_list = []
numel_list = []
loss_list = []
for hp_layers in combinations(quantizable_layers, quantizable_num):
```
Contributor:
How about creating a new file in the experimental folder and moving these there?
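The excerpt under review enumerates every choice of `quantizable_num` layers to keep at high precision. As a hedged sketch of how such a brute-force search might rank candidates (names mirror the diff, but the scoring here is a simplification, not AutoRound's actual logic):

```python
from itertools import combinations

# Illustrative brute-force recipe search; assumes a precomputed per-layer
# quantization loss. Not the actual AutoRound implementation.
def search_recipe(layer_loss, quantizable_num):
    """Try every way to keep `quantizable_num` layers at high precision and
    return the combination whose remaining low-precision loss is smallest."""
    quantizable_layers = list(layer_loss)
    best_combo, best_loss = None, float("inf")
    for hp_layers in combinations(quantizable_layers, quantizable_num):
        # loss incurred by the layers left at the low-bit dtype
        total = sum(loss for name, loss in layer_loss.items()
                    if name not in hp_layers)
        if total < best_loss:
            best_combo, best_loss = hp_layers, total
    return best_combo, best_loss
```

Since `combinations` grows combinatorially with the layer count, this style of search is only practical for small models or coarse layer groups, which may be part of why the reviewer suggests keeping it in an experimental folder.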

```diff
@@ -2616,6 +2641,29 @@ def quantize_blocks(
clear_memory()
input_ids = to_device(input_ids, self.cache_device)
input_others = to_device(input_others, self.cache_device)
if self.recipe_mode:
```
Contributor:

It would be better to wrap this new code into a function and call it as early as possible.

```diff
@@ -2954,6 +3002,21 @@ def sampling_inputs(cls, input_ids, input_others, indices, seqlen, batch_dim=0,

    return current_input_ids, current_input_others

def _dump_average_bits(self, layer_config=None):
```
@wenhuach21 (Contributor) commented Aug 25, 2025:

This function cannot be used by AutoRound, since layers are converted to QuantizedLinear after quantization. If the function can correctly dump average bits in typical scenarios such as INT4, I’d prefer to keep it in the class. Otherwise, it would be better to move it elsewhere for now.
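For illustration, a numel-weighted average-bit computation over a plain `layer_config`-style dict could look like the sketch below. The dict shape (`bits` and `numel` per layer) is an assumption for this example; the real `_dump_average_bits` operates on AutoRound's internal state, which, as noted above, changes once layers become `QuantizedLinear`.

```python
# Hypothetical sketch of average-bit accounting, assuming each layer entry
# records its bit-width and element count. Not AutoRound's implementation.
def average_bits(layer_config):
    """Return the numel-weighted average bit-width across all layers."""
    total_bits = sum(cfg["bits"] * cfg["numel"] for cfg in layer_config.values())
    total_numel = sum(cfg["numel"] for cfg in layer_config.values())
    return total_bits / total_numel
```

Weighting by element count matters because a single large projection layer at 8 bits can raise the average far more than many small layers would.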

Signed-off-by: xinhe3 <[email protected]>
@xin3he force-pushed the xinhe/mix-precision branch from dd79696 to 819fa22 on September 3, 2025 01:47
@xin3he force-pushed the xinhe/mix-precision branch from 819fa22 to 92025ad on September 3, 2025 05:11
3 participants