According to the MX paper, and more generally the larger quantization literature, there are advantages to using different bitwidths for weights, activations, and gradients. It would be very useful for mx_mm and MXLinear to support this more general setting.
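To make the request concrete, here is a rough sketch of what per-tensor bitwidths could look like. It is plain Python and does not use the library's API: `PrecisionConfig` and `fake_quantize` are hypothetical names, and the quantizer simulates signed-integer elements with one shared power-of-two scale per block, whereas the actual MX element formats may be floating point. The default block size of 32 is likewise an assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PrecisionConfig:
    """Hypothetical config: one bitwidth per tensor role."""
    weight_bits: int = 4
    activation_bits: int = 6
    grad_bits: int = 8


def fake_quantize(values, bits, block_size=32):
    """Simulate block-wise quantization with a shared power-of-two scale.

    Each block of `block_size` values shares one scale; elements are
    rounded to signed integers in [-qmax, qmax] and then rescaled.
    """
    qmax = 2 ** (bits - 1) - 1
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        if amax == 0.0:
            out.extend(0.0 for _ in block)
            continue
        # Smallest power-of-two scale such that amax / scale <= qmax.
        scale = 2.0 ** math.ceil(math.log2(amax / qmax))
        out.extend(
            max(-qmax, min(qmax, round(v / scale))) * scale for v in block
        )
    return out


# Each tensor role gets its own bitwidth instead of a single shared one.
cfg = PrecisionConfig(weight_bits=4, activation_bits=6, grad_bits=8)
w_q = fake_quantize([0.9, -0.4, 0.1], cfg.weight_bits)
x_q = fake_quantize([0.9, -0.4, 0.1], cfg.activation_bits)
g_q = fake_quantize([0.9, -0.4, 0.1], cfg.grad_bits)
```

In a linear layer this would presumably mean that the forward matmul, the input-gradient matmul, and the weight-gradient matmul each receive operands quantized with the bitwidth of their role, rather than one element dtype for everything.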