Plans for 8da4w quantization

Hi,

From #430, it seems that 8da4w (8-bit dynamic activation, 4-bit weight) quantization is primarily intended for ExecuTorch and is slated for deprecation. Are there any plans to enable it for CUDA and CPU as well, so that the int4 weights are converted to int8 just before computation?
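To make the request concrete, here is a rough plain-Python sketch of the idea. Weights are stored packed as int4 and widened to int8 right before an int8 x int8 dot product with dynamically quantized activations. This is not torchao's implementation, and all function names are hypothetical. The nibble packing order, symmetric per-row activation scaling, and per-output-channel weight scales are simplifying assumptions; real kernels typically use group-wise weight scales.

```python
# Hypothetical sketch of an 8da4w-style linear layer (not torchao code).
# Assumptions: low nibble first, symmetric quantization, one scale per output row.

def pack_int4(values):
    """Pack signed int4 values (-8..7) two per byte, low nibble first."""
    assert len(values) % 2 == 0
    return [(lo & 0xF) | ((hi & 0xF) << 4) for lo, hi in zip(values[0::2], values[1::2])]

def unpack_int4_to_int8(packed):
    """Widen packed int4 nibbles to signed int8 values via sign extension."""
    out = []
    for byte in packed:
        for nibble in (byte & 0xF, (byte >> 4) & 0xF):
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out

def quantize_activation_int8(x):
    """Dynamic symmetric quantization of one activation row to int8."""
    max_abs = max(abs(v) for v in x)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-128, min(127, round(v / scale))) for v in x]
    return q, scale

def linear_8da4w(x, packed_weights, weight_scales):
    """Compute y[j] ~= sum_i x[i] * W[j][i] using int8 operands."""
    qx, x_scale = quantize_activation_int8(x)
    y = []
    for row_packed, w_scale in zip(packed_weights, weight_scales):
        w_int8 = unpack_int4_to_int8(row_packed)  # int4 -> int8 just before compute
        acc = sum(a * b for a, b in zip(qx, w_int8))  # integer accumulation (int32 in a real kernel)
        y.append(acc * x_scale * w_scale)  # rescale back to float
    return y
```

For example, `linear_8da4w([1.0, -2.0, 0.5, 4.0], [pack_int4([1, -2, 3, 7])], [0.1])` approximates the float result 3.45. The point of the request is that only the packed int4 bytes live in memory, while the arithmetic runs on int8 operands that existing CUDA/CPU int8 paths can handle.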

Thanks!

cc @jerryzh168