Support mixed MX element dtype in mx_mm function and MXLinear #1667
Conversation
Following the MXFP and quantization literature, it is useful to support different element dtypes for activations, weights and gradients. This PR simply adds a more general interface to mx_mm; a similar choice could be made for MXLinear.
General issue: #1666
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1667
Note: links to docs will display an error until the docs builds have completed. ❌ 1 new failure as of commit cbf6f0a with merge base 8afd10e (the following job has failed).
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@@ -23,25 +23,31 @@ class mx_mm(torch.autograd.Function):
    # 1. input @ weight_t = output (forward pass)
    # 2. grad_output @ weight = grad_input (backward pass)
    # 3. input_t @ grad_output = grad_weight (backward pass)
    #
    # input, weight and grad_output have each their own MX element dtype.
nit: "can have"?
Done
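For context, here is a minimal, self-contained sketch of what a mixed-element-dtype mx_mm-style autograd function can look like. This is not the torchao implementation: the to_mx helper below is a stand-in for an MX quantize/dequantize round trip, and the class name, signature and shapes are illustrative assumptions only.

import torch


def to_mx(x, elem_dtype, block_size):
    # Stand-in for an MX quantize/dequantize round trip: a real implementation
    # would quantize x to elem_dtype in blocks of block_size that share a
    # scale, then dequantize back for the simulated matmul. Returning x as-is
    # keeps this sketch runnable end to end.
    return x


class MixedDtypeMM(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input_hp, weight_hp, in_dtype, w_dtype, go_dtype, block_size):
        ctx.save_for_backward(input_hp, weight_hp)
        ctx.dtypes = (in_dtype, w_dtype, go_dtype)
        ctx.block_size = block_size
        # forward: output = input @ weight.t(), each operand cast to its own
        # MX element dtype before the matmul
        input_mx = to_mx(input_hp, in_dtype, block_size)
        weight_mx = to_mx(weight_hp, w_dtype, block_size)
        return torch.mm(input_mx, weight_mx.t())

    @staticmethod
    def backward(ctx, grad_output_hp):
        input_hp, weight_hp = ctx.saved_tensors
        in_dtype, w_dtype, go_dtype = ctx.dtypes
        # grad_output gets its own MX element dtype, independent of input/weight
        go_mx = to_mx(grad_output_hp, go_dtype, ctx.block_size)
        weight_mx = to_mx(weight_hp, w_dtype, ctx.block_size)
        input_mx = to_mx(input_hp, in_dtype, ctx.block_size)
        # grad_input = grad_output @ weight, grad_weight = grad_output.t() @ input
        grad_input = torch.mm(go_mx, weight_mx)
        grad_weight = torch.mm(go_mx.t(), input_mx)
        return grad_input, grad_weight, None, None, None, None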
this makes sense, it would be great to cover with a test. The easiest place to test it would be here (MXLinear). Would you be interested in doing that in this PR?
by the way, pytorch/pytorch#146414 outlines bringing MX dtypes to PyTorch core, and we plan to evolve …
…er factory method. Passing a tuple of 3 element dtypes avoids introducing a breaking change in the current interface of `MXLinear` and `swap_linear_with_mx_linear`. Some additional unit test coverage has been added on MXLinear.
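To illustrate the tuple-based approach this commit describes (which is later revised to explicit override arguments in the review discussion below), here is a rough sketch of the normalization step; the helper name is made up:

def _normalize_elem_dtypes(elem_dtype):
    # Accept either a single MX element dtype (applied to input, weight and
    # grad_output alike, preserving the existing interface) or a tuple/list of
    # three dtypes in (input, weight, grad_output) order.
    if isinstance(elem_dtype, (tuple, list)):
        assert len(elem_dtype) == 3, "expected (input, weight, grad_output) dtypes"
        return tuple(elem_dtype)
    return (elem_dtype,) * 3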
I added support for this feature in MXLinear, and I expanded the coverage in the test you mentioned (plus a small test on the factory side to check that the 2 cases above are working properly). Thanks for the link on the PyTorch MX plan 👍 I would assume that the MX "simulated" mode is going to stay in TorchAO for some time, as it is very useful for testing + getting ready for MX hardware until it is widely available.
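As a rough illustration of what such coverage can look like, here is a hypothetical, compressed test sketch; it is not the PR's test, and the import path, dtype subset and tensor sizes are assumptions:

import itertools

import pytest
import torch
import torch.nn as nn

from torchao.prototype.mx_formats.mx_linear import MXLinear  # assumed import path

ELEM_DTYPES = [torch.float8_e4m3fn, torch.float8_e5m2]  # illustrative subset


@pytest.mark.parametrize(
    "in_dtype, w_dtype, go_dtype",
    list(itertools.product(ELEM_DTYPES, repeat=3)),
)
def test_mx_linear_mixed_elem_dtypes(in_dtype, w_dtype, go_dtype):
    m = nn.Linear(64, 64, bias=False)
    # from_float converts the module in place (mod.__class__ = MXLinear)
    MXLinear.from_float(
        m,
        in_dtype,
        elem_dtype_weight_override=w_dtype,
        elem_dtype_grad_output_override=go_dtype,
        block_size=32,
    )
    x = torch.randn(8, 64, requires_grad=True)
    y = m(x)
    y.sum().backward()
    assert y.shape == (8, 64)
    assert x.grad is not None and x.grad.shape == x.shape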
""" | ||
|
||
@classmethod | ||
@torch.no_grad() | ||
def from_float(cls, mod, elem_dtype, block_size): | ||
mod.__class__ = MXLinear | ||
mod.elem_dtype = elem_dtype | ||
# Single element dtype passed for input, weight and gradient. |
nit: can we do
def from_float(
    ...,
    elem_dtype,
    ...,
    elem_dtype_weight_override=None,
    elem_dtype_grad_output_override=None,
    ...
): ...
we plan to create a proper config object for this in the future, but for now it would be good to keep things simple and avoid mixing types in the API (such as dtype vs tuple)
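Purely as an illustration of the config-object direction mentioned here (nothing like this exists in the PR; the class and field names are made up):

from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class MXLinearConfig:
    # One element dtype per role; the overrides default to the base elem_dtype.
    elem_dtype: Any
    elem_dtype_weight_override: Optional[Any] = None
    elem_dtype_grad_output_override: Optional[Any] = None
    block_size: int = 32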
Should I then enforce keyword-only arguments in MXLinear.from_float and swap_linear_with_mx_linear for block_size and filter_fn? And have a default block_size=32?
sounds reasonable!
Just pushed a fix commit with:
def from_float(
    cls,
    mod,
    elem_dtype,
    elem_dtype_weight_override=None,
    elem_dtype_grad_output_override=None,
    *,
    block_size=32,
):
and similarly for swap_linear_with_mx_linear
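For reference, a usage sketch against the signature above; the swap_linear_with_mx_linear parameter names are assumed to mirror from_float, and the import path is an assumption:

import torch
import torch.nn as nn

from torchao.prototype.mx_formats.mx_linear import (  # assumed import path
    MXLinear,
    swap_linear_with_mx_linear,
)

# Single element dtype for input, weight and grad_output (existing behavior):
lin = nn.Linear(128, 128, bias=False)
MXLinear.from_float(lin, torch.float8_e4m3fn, block_size=32)

# Mixed element dtypes: e5m2 for grad_output, e4m3 for input and weight:
model = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 10))
swap_linear_with_mx_linear(
    model,
    torch.float8_e4m3fn,
    elem_dtype_grad_output_override=torch.float8_e5m2,
    block_size=32,
)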
yep! great to hear this is useful.
looks good, thank you! Please feel free to merge if CI is green.
note: we will likely change the UX of this workflow in the near future (add a top-level config, etc.) as we add Blackwell support; we'll make sure to keep these options in the new UX!
@vkuzo Unfortunately, the H100 Float8 test runner seems to have had an issue starting.
failure is transient, I merged it. Thank you!