Closed as not planned
Labels: feature request (New feature or request), stale (Over 90 days of inactivity)
Description
🚀 The feature, motivation and pitch
QLoRA adapters trained on large checkpoints (e.g., 70B) are currently unusable, because we cannot use TP > 1 to shard the quantized base model across multiple GPUs. Resolving this would make it possible to serve models that were trained with quantization directly, rather than relying on GPTQ or AWQ, which are applied post hoc after training. A sketch of the desired usage is shown below.
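
For concreteness, here is a minimal sketch of the usage this request would enable, assuming vLLM's offline `LLM` entry point and the `LoRARequest` API. The model name, adapter path, and the exact `quantization`/`load_format` flags are illustrative assumptions; the combination of a bitsandbytes-quantized base model with `tensor_parallel_size > 1` is precisely the part that does not work today:

```python
# Sketch only, not working code today: this issue is that a bitsandbytes-
# quantized (QLoRA) base model cannot be sharded with tensor parallelism.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="meta-llama/Llama-2-70b-hf",  # large base checkpoint (illustrative)
    quantization="bitsandbytes",        # QLoRA-style 4-bit base weights
    load_format="bitsandbytes",
    enable_lora=True,
    tensor_parallel_size=4,             # TP > 1 is the unsupported part
)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(max_tokens=32),
    # Hypothetical adapter path; a QLoRA adapter trained on the base model.
    lora_request=LoRARequest("qlora-adapter", 1, "/path/to/qlora/adapter"),
)
print(outputs[0].outputs[0].text)
```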
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.