Closed as not planned
Labels: feature request (New feature or request), stale (Over 90 days of inactivity)
Description
🚀 The feature, motivation and pitch
QLoRA adapters trained on large checkpoints (e.g., 70B) are currently unusable, because we cannot use TP > 1 to shard the quantized base model across multiple GPUs. Resolving this would make it possible to serve models that were trained with quantization directly, rather than relying on GPTQ or AWQ, which are applied post hoc after training. A sketch of the desired usage is shown below.
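
For concreteness, here is a minimal sketch of the usage this request would enable, assuming vLLM's offline `LLM` entry point and the `LoRARequest` API. The model name, adapter path, and the exact `quantization`/`load_format` flags are illustrative assumptions; the combination of a bitsandbytes-quantized base model with `tensor_parallel_size > 1` is precisely the part that does not work today:

```python
# Sketch only, not working code today: this issue is that a bitsandbytes-
# quantized (QLoRA) base model cannot be sharded with tensor parallelism.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="meta-llama/Llama-2-70b-hf",  # large base checkpoint (illustrative)
    quantization="bitsandbytes",        # QLoRA-style 4-bit base weights
    load_format="bitsandbytes",
    enable_lora=True,
    tensor_parallel_size=4,             # TP > 1 is the unsupported part
)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(max_tokens=32),
    # Hypothetical adapter path; a QLoRA adapter trained on the base model.
    lora_request=LoRARequest("qlora-adapter", 1, "/path/to/qlora/adapter"),
)
print(outputs[0].outputs[0].text)
```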
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.