
Conversation

@vkuzo (Contributor) commented Feb 19, 2025

Summary:

Continuing the work from #146427

Adds the `torch.float8_e8m0fnu` dtype to PyTorch, as detailed in
#146414. Please see the issue for a detailed definition of the format. Example of basic functionality:

```python
import torch

# round trip
x0 = torch.randn(4, 4, dtype=torch.float32)
x1 = x0.to(torch.float8_e8m0fnu)  # RNE rounding
x2 = x1.to(torch.float32)  # 2 ** exponent

# creation with empty
x0 = torch.empty(4, 4, dtype=torch.float8_e8m0fnu)

# printing
print(x0)
```
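
For intuition about what the stored bits mean (per the format definition in #146414: the 8 bits are a biased exponent with bias 127, the encoded value is 2 ** (bits - 127), 0xFF is NaN, and there is no sign, mantissa, or zero), here is a minimal decoding sketch. It assumes the uint8 bitwise view works for this dtype as it does for the other float8 dtypes:

```python
import torch

x = torch.tensor([0.25, 1.0, 4.0], dtype=torch.float32).to(torch.float8_e8m0fnu)

# reinterpret the 1-byte storage as uint8 to see the raw biased exponents
bits = x.view(torch.uint8)             # expected: tensor([125, 127, 129], dtype=torch.uint8)

# decode by hand: value = 2 ** (bits - 127)
decoded = 2.0 ** (bits.float() - 127)  # expected: tensor([0.25, 1.00, 4.00])
```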

Done in this PR:

  • numerical correctness
  • op coverage (except for `torch._scaled_mm`): create tensor, cast to/from float32
  • printing a tensor works

For future PRs:

  • performance optimizations for casting
  • `torch._scaled_mm`
  • PT2
  • various cleanups (detailed in comments with issue numbers)

Test Plan:

```
pytest test/quantization/core/experimental/test_float8.py -s
```


cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

pytorch-bot bot commented Feb 19, 2025

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/147466

✅ No Failures

As of commit d635775 with merge base 303ad19:
💚 Looks good so far! There are no failures yet. 💚


@pytorch-bot pytorch-bot bot added the `module: cpu` (CPU specific problem, e.g., perf, algorithm) and `release notes: quantization` (release notes category) labels Feb 19, 2025
@vkuzo vkuzo changed the title from "add the torch.float8_e8m0fnu` dtype to PyTorch" to "add the torch.float8_e8m0fnu dtype to PyTorch" Feb 19, 2025
@vkuzo (Contributor, Author) commented Feb 19, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the `ciflow/trunk` (Trigger trunk jobs on your pull request) label Feb 19, 2025
@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@pytorchmergebot (Collaborator)

Merge failed

Reason: 1 job failed: linux-binary-manywheel / manywheel-py3_9-cuda11_8-build / build


@vkuzo (Contributor, Author) commented Feb 20, 2025

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@henrylhtsang (Contributor) commented Feb 20, 2025

False alarm, my bad.

mengfei25 added a commit to mengfei25/pytorch that referenced this pull request Mar 6, 2025
jianyizh added a commit to jianyizh/pytorch that referenced this pull request Mar 6, 2025
@github-actions github-actions bot deleted the 20250219_e8m0_intermediate branch March 27, 2025 02:11
pruthvistony pushed a commit to ROCm/pytorch that referenced this pull request Apr 3, 2025
Pull Request resolved: pytorch#147466
Approved by: https://github.com/drisspg
@yiakwy-xpu-ml-framework-team commented Apr 23, 2025

Hi @vkuzo, are you still working on this? I think the way MXFP8_E8M0_FNU is used here is worth discussing.

The number is used inside a group_quantize function: each warp extracts 32 fp32 exponents from 32 fp32 values, so ocp_fp8e8m0fnu_from_fp32 simply amounts to extracting the fp32 exponent.

No mantissa or special numbers need careful handling, because the exponent itself is unsigned, so we don't need to worry about them.

Fp32 -> fp8 + mxfp8_e8m0_fnu

The second point is that this mxfp8_e8m0_fnu scale is shared by a group of 32 consecutive elements, each storing no exponent and only a mantissa (with or without the implicit 1), because multiplying the mantissa by the exponent reconstructs the fp32 value.

Note that multiplying an fp8 value by an fp8 scale is as easy as fp32 = (fp8_scale << fp32_t::M) | (fp8 & fp32_t::INT32_M_MASK);
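
To make that bit trick concrete, here is a small illustration in Python (a hypothetical helper, not PyTorch API; it assumes each group element stores only the 23 fp32 mantissa bits and the shared e8m0 scale supplies the exponent):

```python
import struct

def apply_e8m0_scale(mantissa_bits: int, scale_bits: int) -> float:
    # drop the 8-bit e8m0 scale into the fp32 exponent field and OR in the
    # stored mantissa bits; NaN (0xFF) and the sign bit are ignored here
    fp32_bits = (scale_bits << 23) | (mantissa_bits & 0x007FFFFF)
    (val,) = struct.unpack("<f", struct.pack("<I", fp32_bits))
    return val

# scale = 2**1 (biased exponent 128), mantissa bits for significand 1.5 -> 3.0
assert apply_e8m0_scale(0x400000, 128) == 3.0
```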

Here is my implementation:

```cpp
template<>
HOST_DEVICE_INLINE Float8_E8M0_FNU ocp_fp8e8m0fnu_from_fp32(float fval) {
    using fp32_t = Float;
    using fp8_t = Float8_E8M0_FNU;
    using fp8_storage_t = Float8_E8M0_FNU::Datum;

    // reinterpret the fp32 bits as an integer
    union {
        float fval;
        int32_t i32val;
        uint32_t ui32val;
    } val;

    val.fval = fval;

    // mask off the 8 exponent bits and shift them down past the 23 mantissa bits
    fp8_storage_t ui8val = (val.i32val & fp32_t::INT32_E_MASK) >> fp32_t::M;
    return fp8_t::from_bits(ui8val);
}
```

@vkuzo (Contributor, Author) commented May 23, 2025

hi @yiakwy-xpu-ml-framework-team, sorry for the late reply, I was on a long leave and am now catching up on what I missed.

> The number is used inside a group_quantize function: each warp extracts 32 fp32 exponents from 32 fp32 values, so ocp_fp8e8m0fnu_from_fp32 simply amounts to extracting the fp32 exponent.

Correct! Please check out #146414, "E8M0 detailed proposal" for details. The default cast to e8m0 in PyTorch uses RNE to match the IEEE-754 spec and the other floating point dtypes, which does not match what is described in the OCP spec. It's up to the user to specify a different casting/rounding behavior (such as floor) if they would like to do so - this is 100% valid.
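
To illustrate the difference between the two rounding behaviors, a minimal sketch (the expected RNE result assumes the default cast semantics described above; the floor path is plain fp32 bit manipulation):

```python
import torch

x = torch.tensor([7.0], dtype=torch.float32)

# default PyTorch cast: RNE, so 7.0 rounds to the nearest power of two
rne = x.to(torch.float8_e8m0fnu).to(torch.float32)  # expected: tensor([8.])

# floor/truncation as described in the OCP spec: keep the fp32 exponent bits
bits = x.view(torch.int32)
biased_exp = (bits >> 23) & 0xFF                    # 129 for 7.0
floored = 2.0 ** (biased_exp - 127).float()         # expected: tensor([4.])
```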
