Adds Q/DQ layout support for embedding quantization with IntxWeightOnlyConfig #1972
Conversation
🔗 Helpful Links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1972.
Note: links to docs will display an error until the docs builds have completed.
❌ 1 New Failure as of commit 186f903 with merge base 5ded23c.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 2c3b9ac to 05eec5d (Compare).
@@ -1569,6 +1572,92 @@ def _uintx_weight_only_transform(
    return module


@dataclass
class IntxWeightOnlyConfig(AOBaseConfig):
    """
@andrewor14 can you take a look at this and check whether there are any issues with it working well in the QAT workflow with FakeQuantizeConfig?
Strange that we have IntxWeightOnly and Int4WeightOnly
yeah I feel we should probably merge these two
        has_weight_zeros=has_weight_zeros,
    ).quantize(quantized_model_reference)
    quantize_(
        quantized_model_reference,
        Int8DynamicActivationIntxWeightConfig(
            weight_dtype=weight_dtype,
-           granularity=granularity,
+           granularity=PerRow(),
Should this be PerAxis as well?
It can't be, because that's controlled by Int8DynamicActivationIntxWeightConfig, which uses PerRow until #1968 lands.
@@ -155,7 +154,7 @@ def test_shared_embedding(self):
        quantized_model = copy.deepcopy(model)
        SharedEmbeddingQuantizer(
            weight_dtype=weight_dtype,
-           granularity=granularity,
+           granularity=PerRow(),
And this?
Looks good overall; just need to change PerRow to PerAxis(axis=0), as we discussed in the meeting.
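For reference, a minimal sketch of the two granularity types being discussed; both live in torchao.quantization.granularity, and the comments are my gloss on the thread, not from the PR.

from torchao.quantization.granularity import PerAxis, PerRow

# PerRow: one scale/zero-point per row; used by the linear configs above
# (Int8DynamicActivationIntxWeightConfig) until #1968 lands.
linear_gran = PerRow()

# PerAxis(axis=0): one scale/zero-point per slice along dim 0, i.e. per
# embedding-table row for a 2-D embedding weight -- the replacement
# requested in this thread.
embedding_gran = PerAxis(axis=0)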
Adds Q/DQ layout support for embedding quantization with IntxWeightOnlyConfig (#1972) * up * up * up * up * up * up * up * up
@@ -263,6 +269,9 @@ def _(func, types, args, kwargs):


@implements(torch.nn.functional.embedding)
def _(func, types, args, kwargs):
    if _embedding_q_dq_check(args, kwargs):
        return _embedding_q_dq_impl(args, kwargs)
Why does line 299 only dequantize the weight but not actually run the embedding op?
This will be used to quantize embeddings in ExecuTorch (ET).
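To make the Q/DQ pattern under discussion concrete, here is a hedged sketch of what an implementation like _embedding_q_dq_impl typically looks like; the body below is an assumption based on the thread, not the PR's actual code. The point of the Q/DQ layout is that the weight is explicitly dequantized, so the exported graph carries a dequantize node that ExecuTorch backends can pattern-match and lower, rather than a fused quantized kernel.

import torch.nn.functional as F

def _embedding_q_dq_impl(args, kwargs):
    # input: indices tensor; weight: quantized tensor subclass assumed to
    # expose a dequantize() method (as torchao's affine-quantized tensors do).
    input, weight = args[0], args[1]
    # Materialize a float weight; in an exported graph this shows up as an
    # explicit dequantize op that ET can match and lower.
    dq_weight = weight.dequantize()
    # Run the ordinary embedding lookup on the dequantized weight.
    return F.embedding(input, dq_weight, **kwargs)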