Feature request
GGUF has recently become a popular model format that makes it possible to run extremely large models on modest hardware, e.g. 2x RTX 3090 (48 GB total). Loading the original weights is still preferable when possible, and HF does support 4-bit and 8-bit quantization, but those quants are not very practical for very large models, unlike GGUF.
llama.cpp exists and is quite robust, but it is not as flexible as native HF and does not support batch inference. https://huggingface.co/docs/hub/en/gguf
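For context on what native support would have to deal with, here is a minimal sketch of parsing the fixed GGUF header (magic, version, tensor count, metadata key-value count), assuming the layout described in the GGUF spec linked above. The synthetic header bytes are constructed in the example itself so it is self-contained; a real loader would read them from the start of a `.gguf` file.

```python
import struct

GGUF_MAGIC = b"GGUF"  # 4-byte magic at offset 0

def parse_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF header.

    Layout (little-endian, per the GGUF spec):
      bytes 0-3  : magic "GGUF"
      bytes 4-7  : uint32 format version
      bytes 8-15 : uint64 tensor count
      bytes 16-23: uint64 metadata key-value count
    """
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }

# Build a synthetic header (version 3, 2 tensors, 5 metadata pairs) to demo parsing.
header = GGUF_MAGIC + struct.pack("<IQQ", 3, 2, 5)
print(parse_gguf_header(header))
# → {'version': 3, 'tensor_count': 2, 'metadata_kv_count': 5}
```

The metadata key-value section that follows this header is what carries the architecture, tokenizer, and quantization info a native HF loader would need to reconstruct a model config.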
Motivation
Essentially the same as described in the feature request above. I think GGUF provides tremendous value and should be supported natively in HF. The Hub already recognizes the format, but not being able to load it through HF itself is an awkward gap.
Your contribution
I can help with part of the implementation work in my free time.