
Adding native support to load GGUF models using transformers #38063

@sleepingcat4


Feature request

GGUF has recently become a popular model format that makes it possible to load ridiculously large models on 2x3090 (48 GB) cards. Loading the original weights is still better when you can, and HF does support 4-bit and 8-bit quantization, but those quants aren't very practical for very large models, unlike GGUF.
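
For reference, this is roughly what the existing 4-bit path looks like today via bitsandbytes (a sketch; the model id is just illustrative):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Current 4-bit path via bitsandbytes; it works, but for very large
# models the quality/size trade-off is less attractive than GGUF quants.
bnb_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",  # illustrative model id
    quantization_config=bnb_config,
    device_map="auto",
)
```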

llama.cpp exists and is quite robust, but it's not as flexible as native HF and it doesn't support batch inference. https://huggingface.co/docs/hub/en/gguf
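
As a rough sketch of what native support could look like (the `gguf_file` argument is hypothetical here, just to illustrate the desired interface; the repo and filename are illustrative too):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical interface: load a GGUF checkpoint directly from the Hub,
# dequantizing into a regular transformers model so the usual generate()
# API and batch inference work as normal.
model_id = "TheBloke/Llama-2-7B-GGUF"  # illustrative repo
gguf_file = "llama-2-7b.Q4_K_M.gguf"   # illustrative filename

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=gguf_file)
```

Even a dequantize-on-load path like this would already unblock the batched generation that llama.cpp can't do.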

Motivation

Basically the same as what I described in the feature request. I think GGUF is of tremendous value and should be supported in HF natively. HF documenting the format on the Hub but not allowing it to be loaded with transformers is kind of bad.

Your contribution

I can help with part of the implementation work in my free time.
