
Adding native support to load GGUF models using transformers #38063

@sleepingcat4


Feature request

GGUF has recently become a popular model format that makes it possible to load ridiculously large models on 2x3090 (48 GB) cards. Loading the original weights is still better when you can, and HF does support 4-bit and 8-bit quantization, but those quants aren't very practical for very large models, unlike GGUF.
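
For reference, this is roughly what the existing 4-bit path looks like today via bitsandbytes (a sketch; the model id is just illustrative):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Current 4-bit path via bitsandbytes; it works, but for very large
# models the quality/size trade-off is less attractive than GGUF quants.
bnb_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",  # illustrative model id
    quantization_config=bnb_config,
    device_map="auto",
)
```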

llama.cpp exists and is quite robust, but it's not as flexible as native HF and it doesn't support batch inference. https://huggingface.co/docs/hub/en/gguf
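
As a rough sketch of what native support could look like (the `gguf_file` argument is hypothetical here, just to illustrate the desired interface; the repo and filename are illustrative too):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical interface: load a GGUF checkpoint directly from the Hub,
# dequantizing into a regular transformers model so the usual generate()
# API and batch inference work as normal.
model_id = "TheBloke/Llama-2-7B-GGUF"  # illustrative repo
gguf_file = "llama-2-7b.Q4_K_M.gguf"   # illustrative filename

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=gguf_file)
```

Even a dequantize-on-load path like this would already unblock the batched generation that llama.cpp can't do.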

Motivation

Basically the same as what I described in the feature request. I think GGUF is of tremendous value and should be supported in HF natively. HF documenting the format on the Hub but not allowing it to be loaded with transformers is kind of bad.

Your contribution

I can help with part of the implementation work in my free time.
