Support loading for static quant weight fp8 act fp8 #730
Conversation
Pull Request Overview
This PR adds support for loading static quantized models with FP8 weights and FP8 activations by implementing a new quantized linear layer class and updating the model conversion infrastructure.
Key changes:
- Implemented `WeightFP8ActFP8StaticQuantLinear` class for handling FP8 weight and activation quantization (see the sketch after this list)
- Updated model conversion logic to detect and handle FP8 static quantization configurations
- Enhanced test coverage to verify both export and loading functionality for static FP8 quantization
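For illustration, here is a minimal sketch of what a static W-FP8/A-FP8 linear layer with pre-calibrated scales can look like in PyTorch. The class, helper, and buffer names (`StaticFP8Linear`, `quant_fp8`, `weight_scale`, `input_scale`) are assumptions for this sketch and do not mirror the actual `WeightFP8ActFP8StaticQuantLinear` API.

```python
# A minimal, self-contained sketch of a static W-FP8 / A-FP8 linear layer.
# All names here are illustrative assumptions, not the auto-round implementation.
import torch
import torch.nn as nn

FP8_DTYPE = torch.float8_e4m3fn
FP8_MAX = torch.finfo(FP8_DTYPE).max


def quant_fp8(x: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Quantize a float tensor to FP8 using a pre-computed (static) scale."""
    return (x / scale).clamp(-FP8_MAX, FP8_MAX).to(FP8_DTYPE)


class StaticFP8Linear(nn.Module):
    """Linear layer holding FP8 weights plus static weight/activation scales."""

    def __init__(self, in_features: int, out_features: int, bias: bool = True):
        super().__init__()
        self.register_buffer("weight", torch.empty(out_features, in_features, dtype=FP8_DTYPE))
        self.register_buffer("weight_scale", torch.ones(()))
        self.register_buffer("input_scale", torch.ones(()))
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None

    @classmethod
    def from_linear(cls, linear: nn.Linear, input_scale: torch.Tensor) -> "StaticFP8Linear":
        """Convert a float nn.Linear using a calibrated (static) activation scale."""
        mod = cls(linear.in_features, linear.out_features, bias=linear.bias is not None)
        w_scale = linear.weight.detach().abs().max().clamp(min=1e-12) / FP8_MAX
        mod.weight_scale.copy_(w_scale)
        mod.input_scale.copy_(input_scale)
        mod.weight.copy_(quant_fp8(linear.weight.detach(), w_scale))
        if linear.bias is not None:
            mod.bias.data.copy_(linear.bias.detach())
        return mod

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Quantize the activation with the static input scale, then dequantize
        # both tensors and fall back to a plain high-precision matmul.
        x_dq = quant_fp8(x, self.input_scale).to(x.dtype) * self.input_scale
        w_dq = self.weight.to(x.dtype) * self.weight_scale
        return nn.functional.linear(x_dq, w_dq, self.bias)
```

Once a model's float Linear modules are swapped for such a class, `load_state_dict` can restore the FP8 weights and the pre-calibrated scales from a saved checkpoint, which is the general idea behind loading a statically quantized model from disk.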
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| test/test_cpu/test_export.py | Extended test to verify loading of static FP8 quantized models and renamed test method |
| auto_round/inference/convert_model.py | Added support for `act_dynamic` parameter and FP8 static quantization detection in model conversion |
| auto_round/inference/backend.py | Added FP8 static quantization detection function and updated dynamic import logic (see the sketch after this table) |
| auto_round/export/export_to_autoround/export_to_fp8_woq.py | Implemented new `WeightFP8ActFP8StaticQuantLinear` class with quantization/dequantization methods |
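As a rough illustration of the detection step, a predicate along the following lines can distinguish static W-FP8/A-FP8 configurations. The attribute names read from `config` are assumptions for this sketch, not the exact auto-round fields.

```python
# Hedged sketch of a static W-FP8/A-FP8 detection check; the attribute names
# on `config` are assumptions rather than the real auto-round signature.
def is_static_wfp8afp8(config) -> bool:
    """True when weights and activations are FP8 and activation scales are static."""
    return (
        getattr(config, "data_type", None) == "fp8"
        and getattr(config, "act_data_type", None) == "fp8"
        and not getattr(config, "act_dynamic", True)
    )
```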
This PR is unnecessary for now; you need to work with Heng to fix the FP8.

@wenhuach21 The purpose of this PR is to support loading an existing qmodel from disk and then evaluating its accuracy. cc @n1ck-guo

Yes, but the primary purpose is evaluation, which the fake model should cover well (#731). This is not a product feature, and it involves changes to critical product code. As discussed earlier, please hold this PR for now, or move the code elsewhere without modifying the important HF model inference code.
```diff
@@ -857,8 +859,8 @@ def remove_duplicates(lst):
             format = "auto_round:auto_awq"
         elif is_nv_fp(self.data_type) or is_mx_fp(self.data_type):
             format = f"auto_round:{self.data_type}"
-        elif is_wfp8afp8(self):  # staic wfp8afp8
+        elif is_static_wfp8afp8(self):  # staic wfp8afp8
             format = "auto_round:fp8"
```
@WeiweiZhang1 you have an AR to refine the formats-related code; please be aware of this change. Additionally, please make sure all UTs in https://github.com/intel/auto-round/blob/main/test/test_cuda/test_transformers.py pass before merging.
The local tests passed.

```
=============== short test summary info ===============
PASSED test_transformers.py::AutoRoundTest::test_convert_from_gptq
PASSED test_transformers.py::AutoRoundTest::test_mixed_bits
PASSED test_transformers.py::AutoRoundTest::test_quantized_model
PASSED test_transformers.py::AutoRoundTest::test_quantized_model_bf16
PASSED test_transformers.py::AutoRoundTest::test_quantized_model_multi_gpu
PASSED test_transformers.py::AutoRoundTest::test_raise_if_non_quantized
PASSED test_transformers.py::AutoRoundTest::test_save_pretrained
SKIPPED [1] test_transformers.py:166: test requires Intel Extension for PyTorch to be installed and match current PyTorch version, see https://github.com/intel/intel-extension-for-pytorch
SKIPPED [1] test_transformers.py:101: test requires Intel Extension for PyTorch to be installed and match current PyTorch version, see https://github.com/intel/intel-extension-for-pytorch
= 7 passed, 2 skipped, 38 warnings in 102.73s (0:01:42) =
```
Summary of the changes:
- Add the `auto_round:torch_fp8_static` format for loading and inference of w8afp8 models
- Update `QuantizationScheme` to support dict-style access (see the sketch after this list)
- Add `act_dynamic` to `QuantizationScheme` and propagate it to the backend check
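As a rough illustration of dict-style access on a scheme object, here is a minimal dataclass sketch; the field names and helper methods are assumptions and may differ from the real `QuantizationScheme`.

```python
# A minimal sketch (assumed field names) of a dataclass-based scheme that also
# supports dict-style access; the real QuantizationScheme may look different.
from dataclasses import dataclass, asdict


@dataclass
class QuantizationScheme:
    bits: int = 8
    data_type: str = "fp8"
    act_data_type: str = "fp8"
    act_dynamic: bool = False

    def __getitem__(self, key: str):
        # Allow scheme["act_dynamic"] alongside scheme.act_dynamic.
        return getattr(self, key)

    def get(self, key: str, default=None):
        return getattr(self, key, default)

    def keys(self):
        return asdict(self).keys()


scheme = QuantizationScheme()
assert scheme["act_dynamic"] is False and scheme.get("bits") == 8
```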