Remove preserve_zero and zero_point_domain from choose_qparams_affine #2149

Merged: 30 commits merged into main from qparam_args on May 21, 2025

Conversation

@jainapurva jainapurva (Contributor) commented Apr 29, 2025

This pull request refactors and simplifies quantization-related code by removing unused or redundant functionality and introducing specialized methods for specific cases. The most important changes are removing the preserve_zero and zero_point_domain parameters from many functions, introducing new specialized quantization and dequantization methods, and updating call sites accordingly.

Refactoring and Simplification:

  • Removed the preserve_zero and zero_point_domain parameters from choose_qparams_affine, quantize_affine, and dequantize_affine calls across multiple files, while introducing specialized methods to handle specific quantization scenarios.

The following table maps each original method and parameter combination to its new specialized method (a hedged migration sketch follows the table):

| Original Method | ZeroPointDomain value | preserve_zero value | New Method |
| --- | --- | --- | --- |
| choose_qparams_affine | INT / NONE | True | choose_qparams_affine |
| choose_qparams_affine | FLOAT | False | choose_qparams_affine_tinygemm |
| choose_qparams_affine | INT | False | choose_qparams_affine_dont_preserve_zero |
| quantize_affine | INT | N/A | quantize_affine |
| quantize_affine | FLOAT | N/A | quantize_affine_float_zero_point |
| quantize_affine | NONE | N/A | quantize_affine_no_zero_point |
| dequantize_affine | INT | N/A | dequantize_affine |
| dequantize_affine | FLOAT | N/A | dequantize_affine_float_zero_point |
| dequantize_affine | NONE | N/A | dequantize_affine_no_zero_point |
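
As a rough migration sketch (argument lists below are assumptions based on the table and on the existing affine quantization primitives, not verbatim torchao signatures), a call site that previously passed preserve_zero=False with ZeroPointDomain.FLOAT would now call the tinygemm / float-zero-point variants directly:

```python
import torch

# Names come from the table above; exact parameter lists are assumptions.
from torchao.quantization.quant_primitives import (
    MappingType,
    choose_qparams_affine_tinygemm,      # was: preserve_zero=False, FLOAT domain
    quantize_affine_float_zero_point,    # was: zero_point_domain=ZeroPointDomain.FLOAT
    dequantize_affine_float_zero_point,
)

w = torch.randn(128, 128, dtype=torch.bfloat16)
block_size = (1, 32)  # per-group quantization along the last dimension

# Before this PR (illustrative):
#   scale, zp = choose_qparams_affine(w, MappingType.ASYMMETRIC, block_size,
#                                     torch.int32, quant_min=0, quant_max=15,
#                                     preserve_zero=False,
#                                     zero_point_domain=ZeroPointDomain.FLOAT)
# After this PR (illustrative):
scale, zp = choose_qparams_affine_tinygemm(
    w, MappingType.ASYMMETRIC, block_size, torch.int32, quant_min=0, quant_max=15
)
w_q = quantize_affine_float_zero_point(
    w, block_size, scale, zp, torch.int32, quant_min=0, quant_max=15
)
w_dq = dequantize_affine_float_zero_point(
    w_q, block_size, scale, zp, torch.int32,
    quant_min=0, quant_max=15, output_dtype=torch.bfloat16
)
```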

Notable updates related to the changes:

  • from_hp_to_intx and from_hp_to_intx_static still take zero_point_domain and preserve_zero as input, and call the respective choose_qparams/quantize/dequantize_affine functions.
  • from_hp_to_floatx and from_hp_to_floatx_static use the float8 methods choose_qparams_affine_float8, quantize_affine_float8, and dequantize_affine_float8 (see the sketch below).
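
A minimal sketch of that float8 path (parameter names here are assumptions; the exact signatures should be checked against torchao.quantization.quant_primitives):

```python
import torch
from torchao.quantization.quant_primitives import (
    choose_qparams_affine_float8,
    quantize_affine_float8,
    dequantize_affine_float8,
)

w = torch.randn(256, 256)
# Compute a scale for an e4m3 target, quantize, then dequantize back.
scale = choose_qparams_affine_float8(w, float8_dtype=torch.float8_e4m3fn)
w_f8 = quantize_affine_float8(w, scale, float8_dtype=torch.float8_e4m3fn)
w_dq = dequantize_affine_float8(w_f8, scale, output_dtype=torch.float32)
```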

The following table lists each AOBaseConfig along with the choose_qparams function calls the backend makes for that configuration (a usage sketch follows the table):

| AOBaseConfig | choose_qparams function(s) |
| --- | --- |
| Int8DynamicActivationInt4WeightConfig | choose_qparams_affine |
| Int8DynamicActivationIntxWeightConfig | choose_qparams_affine / choose_qparams_affine_dont_preserve_zero |
| GemliteUIntXWeightOnlyConfig | choose_qparams_and_quantize_affine_hqq / choose_qparams_affine |
| Int4WeightOnlyConfig | choose_qparams_affine / choose_qparams_affine_tinygemm / choose_qparams_affine_dont_preserve_zero |
| Int8WeightOnlyConfig | choose_qparams_affine |
| Int8DynamicActivationInt8WeightConfig | choose_qparams_affine |
| Float8WeightOnlyConfig | choose_qparams_affine_float8 |
| Float8DynamicActivationFloat8WeightConfig | choose_qparams_affine_float8 |
| Float8StaticActivationFloat8WeightConfig | choose_qparams_affine_float8 |
| UIntXWeightOnlyConfig | choose_qparams_and_quantize_affine_hqq / choose_qparams_affine |
| IntxWeightOnlyConfig | choose_qparams_affine / choose_qparams_affine_dont_preserve_zero |
| FPXWeightOnlyConfig | choose_qparams_affine_fpx |
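
For example (a sketch, assuming the config API exported from torchao.quantization; the group size is just an illustrative value), applying one of these configs via quantize_ lets the backend dispatch to the matching choose_qparams_* variant:

```python
import torch
from torchao.quantization import Int4WeightOnlyConfig, quantize_

# Int4WeightOnlyConfig routes to choose_qparams_affine /
# choose_qparams_affine_tinygemm / choose_qparams_affine_dont_preserve_zero
# depending on its settings (see the table above). The int4 weight-only path
# typically targets bf16 weights on a CUDA device.
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to(torch.bfloat16).to("cuda")
quantize_(model, Int4WeightOnlyConfig(group_size=128))
```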

pytorch-bot bot commented Apr 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2149

Note: Links to docs will display an error until the docs builds have been completed.

❌ 7 New Failures

As of commit 214e704 with merge base 212d912:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Apr 29, 2025
@jainapurva jainapurva added the topic: not user facing and topic: for developers labels Apr 29, 2025
@jainapurva jainapurva marked this pull request as ready for review April 30, 2025 17:36
@jainapurva jainapurva marked this pull request as draft April 30, 2025 18:10
@jainapurva jainapurva force-pushed the qparam_args branch 2 times, most recently from 85936a5 to 9780257, May 13, 2025 21:52
@jainapurva jainapurva marked this pull request as ready for review May 14, 2025 05:20
@@ -255,7 +254,7 @@ def get_plain(self) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
target_dtype = torch.int32
quant_min = 0
quant_max = 15
zero_point_domain = ZeroPointDomain.FLOAT
# zero_point_domain is ZeroPointDomain.FLOAT
Contributor

nit: remove

Contributor Author

I thought to keep it for now, as it can be an indicator of the previous implementation.

@@ -1025,7 +1025,6 @@ def get_per_token_block_size(x):
block_size=block_size,
target_dtype=target_dtype,
_layout=_layout,
scale_dtype=torch.float32,
Contributor

should this be reverted?

@jerryzh168 jerryzh168 (Contributor) left a comment

looks good, thanks @jainapurva for carefully working through this!

Comment on lines 437 to 441
zero_point_domain is optional specifies how we quantize the floating point to quantized data:
INT: quantized_val = (float_val / scale) (integer) + zero_point (integer)
FLOAT: quantized_val = (float_val - (zero_point (float) - scale * mid_point)) / scale
None: quantized_val = (float_val / scale) | this is primarily used for floatx quantization
Where we do not want to round values to nearest integer and instead scale and cast.
Contributor

nit: we can just leave the one that is relevant
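
For reference, a toy numeric illustration of the three formulas quoted in this docstring (all values below are made up, and the mid_point expression is an assumption rather than copied from torchao):

```python
float_val = 1.2
scale = 0.1
quant_min, quant_max = 0, 15                 # int4 range [0, 15]
mid_point = (quant_max + quant_min + 1) / 2  # assumed definition of mid_point

# INT domain: integer zero_point added after scaling and rounding
zero_point_int = 8
q_int = round(float_val / scale) + zero_point_int  # 12 + 8 = 20, then clamped to [0, 15]

# FLOAT domain (tinygemm): float zero_point offset around the mid-point
zero_point_float = 0.05
q_float = (float_val - (zero_point_float - scale * mid_point)) / scale

# NONE domain: pure scale-and-cast, no zero_point (floatx use case)
q_none = float_val / scale
```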

raise ValueError("Please use ZeroPointDomain.NONE instead of None")
elif zero_point_domain is ZeroPointDomain.NONE and zero_point is not None:
raise ValueError("zero_point should be None when zero_point_domain is NONE")
# if zero_point_domain is None:
Contributor

nit: please remove the commented code before landing

quant_max: Union[int, float],
output_dtype: torch.dtype = torch.float32,
) -> torch.Tensor:
"""This function converts AQT tensors to their high precision floating point representation
Contributor

should we only have doc for non-private helper functions?

@jerryzh168 jerryzh168 (Contributor) left a comment

I think the docs have to be updated a bit, commented inline

@jerryzh168 jerryzh168 added the topic: bc-breaking label May 16, 2025
@jainapurva jainapurva merged commit 04fb450 into main May 21, 2025
12 of 19 checks passed