Tensor Parallelism Support for AffineQuantizedTensor #988

We recently landed https://github.com/pytorch/ao/pull/939, which adds tensor parallelism support for int8 weight-only quantization (see https://github.com/pytorch/ao/pull/785 for another example).

We can now add tensor parallelism support for the other types of quantization as well:

* [x] float8 weight only @jainapurva - #1003 
* [x] float8 dynamic activation @jainapurva - #1078 
* [ ] uintx weight only @melvinebenezer 
* [x] int4 weight only quant - @jerryzh168 #1120
* [x] int8 dynamic act + int8 weight - @jainapurva https://github.com/pytorch/ao/pull/1657
* [ ] fpx - 

# Steps
## 1. Create test
Since we don't have many tests today, we can optimize for readability for now: copy the existing test cases into https://github.com/pytorch/ao/blob/main/test/dtypes/test_affine_quantized_tensor_parallel.py instead of inheriting from them.

For new tests, you can follow https://github.com/pytorch/ao/blob/c87cc9b7286a46e9dfc076fa2417eb9b64ccc807/test/dtypes/test_affine_quantized_tensor_parallel.py#L133-L153 to create your own test case.

## 2. Run the test 

	python test/dtypes/test_affine_quantized_tensor_parallel.py

## 3. Add support for missing ops until test passes
We expect that you will need to add some ops (slicing ops, for example) to the corresponding TensorImpl tensor subclass until the test passes.
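
One reason slicing ops come up: tensor parallelism shards each linear weight along a dimension, and a quantized weight carries scales next to its integer data, so a slice has to treat both consistently. The following is a framework-free sketch in plain Python lists, not torchao's implementation. `ToyQuantizedWeight` and `slice_weight` are hypothetical names, and one scale per output row is an assumption made for illustration. In a real tensor subclass, the equivalent logic would presumably live in the handler for the slice op.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyQuantizedWeight:
    """Hypothetical stand-in for a quantized weight: integer data plus one scale per row."""

    int_data: List[List[int]]  # shape [out_features, in_features]
    scale: List[float]         # shape [out_features] (assumed per-row scales)

    def dequantize(self) -> List[List[float]]:
        return [[q * s for q in row] for row, s in zip(self.int_data, self.scale)]


def slice_weight(w: ToyQuantizedWeight, dim: int, start: int, end: int) -> ToyQuantizedWeight:
    """Slice a quantized weight along `dim`, keeping data and scales consistent."""
    if dim == 0:
        # Rows are split: each shard keeps only the scales of its own rows.
        return ToyQuantizedWeight(w.int_data[start:end], w.scale[start:end])
    if dim == 1:
        # Columns are split: every row survives, so the per-row scales are unchanged.
        return ToyQuantizedWeight([row[start:end] for row in w.int_data], list(w.scale))
    raise ValueError(f"unsupported dim: {dim}")


w = ToyQuantizedWeight(int_data=[[1, 2, 3, 4], [5, 6, 7, 8]], scale=[0.5, 0.25])
row_shard = slice_weight(w, dim=0, start=1, end=2)
col_shard = slice_weight(w, dim=1, start=0, end=2)

# Slicing then dequantizing must match dequantizing then slicing.
full = w.dequantize()
assert row_shard.dequantize() == full[1:2]
assert col_shard.dequantize() == [row[0:2] for row in full]
```

The point of the check at the end is that a shard must dequantize to the same values as the corresponding slice of the full weight. Other layouts (packed int4 data, block-wise scales) would need their own slicing rules.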

Example test cases for float8 dynamic activation quantization, one per granularity (they assume the imports and the `TestFloat8dqAffineQuantizedTensorParallel` base class from the test file):

	# Per-tensor granularity: one scale for the whole tensor.
	class TestFloat8dqTensorAffineQuantizedTensorParallel(TestFloat8dqAffineQuantizedTensorParallel):
	    QUANT_METHOD_FN = staticmethod(float8_dynamic_activation_float8_weight)
	    QUANT_METHOD_KWARGS = {"granularity": PerTensor()}
	    COMMON_DTYPES = [torch.bfloat16, torch.float16, torch.float32]

	    @common_utils.parametrize("dtype", COMMON_DTYPES)
	    @with_comms
	    @unittest.skipIf(not torch.cuda.is_available(), "Need CUDA available")
	    def test_tp(self, dtype):
	        return self._test_tp(dtype)

	# Per-row granularity: one scale per row; tested with bfloat16 only.
	class TestFloat8dqRowAffineQuantizedTensorParallel(TestFloat8dqAffineQuantizedTensorParallel):
	    QUANT_METHOD_FN = staticmethod(float8_dynamic_activation_float8_weight)
	    QUANT_METHOD_KWARGS = {"granularity": PerRow()}
	    COMMON_DTYPES = [torch.bfloat16]

	    @common_utils.parametrize("dtype", COMMON_DTYPES)
	    @with_comms
	    @unittest.skipIf(not torch.cuda.is_available(), "Need CUDA available")
	    def test_tp(self, dtype):
	        return self._test_tp(dtype)
