[MX] Support mixed MXFP4/FP6/FP8 linear layer

Blackwell hardware natively supports any combination of MXFP4/FP6/FP8 operands in matmuls. See the PTX and CUTLASS documentation:
* https://docs.nvidia.com/cuda/parallel-thread-execution/#tcgen05-kind-shapes 
* https://github.com/NVIDIA/cutlass/blob/main/tools/library/src/reference/block_scaled_gemm_mixed8bitsa.cu

According to the [MX paper](https://arxiv.org/abs/2310.10537), and more generally the large body of quantization literature, there are advantages to using different bit widths for weights, activations, and gradients. It would be very useful for `mx_mm` and `MXLinear` to support this more general setting.
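
To make the request concrete, here is a rough sketch of what a per-operand format choice could look like. Everything in it is hypothetical: `MixedMXConfig`, its field names, the format strings, and `bits_per_element` are illustrative only and are not an existing API of `mx_mm` or `MXLinear`. It assumes the MX convention of one 8-bit shared scale per block of 32 elements.

```python
from dataclasses import dataclass

# MX element formats -> bits per element (format names here are illustrative).
MX_ELEMENT_BITS = {
    "fp8_e4m3": 8,
    "fp8_e5m2": 8,
    "fp6_e2m3": 6,
    "fp6_e3m2": 6,
    "fp4_e2m1": 4,
}
MX_BLOCK_SIZE = 32  # assumption: elements sharing one scale
MX_SCALE_BITS = 8   # assumption: E8M0 shared scale


@dataclass(frozen=True)
class MixedMXConfig:
    """Hypothetical config: one element format per matmul operand."""

    input_fmt: str = "fp8_e4m3"
    weight_fmt: str = "fp4_e2m1"
    grad_output_fmt: str = "fp8_e5m2"

    def __post_init__(self):
        # Reject anything that is not a known MX element format.
        for fmt in (self.input_fmt, self.weight_fmt, self.grad_output_fmt):
            if fmt not in MX_ELEMENT_BITS:
                raise ValueError(f"unsupported MX element format: {fmt!r}")


def bits_per_element(fmt: str) -> float:
    """Average storage cost per element, including the amortized block scale."""
    return MX_ELEMENT_BITS[fmt] + MX_SCALE_BITS / MX_BLOCK_SIZE


if __name__ == "__main__":
    cfg = MixedMXConfig()
    for role in ("input_fmt", "weight_fmt", "grad_output_fmt"):
        fmt = getattr(cfg, role)
        print(f"{role}: {fmt} ({bits_per_element(fmt)} bits/element)")
```

With a config like this, the forward and backward matmuls could each pick their operand formats independently, which is the combination the hardware is described as supporting above.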

[MX] Support mixed MXFP4/FP6/FP8 linear layer #1666
