convert : parse safetensors directly #15667
base: master
Conversation
Force-pushed from 85edafe to 786b32d
I can confirm that this helped me convert GLM 4.5 Air, whereas current …
Force-pushed from 786b32d to e582f1a
Is there anything preventing this PR from being merged or taken out of draft? It's impossible for me to convert GLM Air reliably without this PR, so I think it's quite useful to have.
This comment was marked as off-topic.
@whatever1983 this has nothing to do with fp8 conversion. This is simply a more memory-efficient way of performing the GGUF conversion that prevents OOMs/crashes during the conversion process, which I need in order to convert GLM Air. As for politics, I can't advise on that. I just want to successfully convert my models, hence my bumping the issue.
Applies to both local and remote safetensors custom parsing. This matches the behavior of the official safetensors implementation.

* convert : rename `from_safetensors_meta` to `from_local_tensor`

  For consistency with `from_remote_tensor`
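Since the local header is now parsed by hand, the on-disk layout being read is worth sketching: an 8-byte little-endian unsigned integer giving the JSON header length, the JSON header itself, then raw tensor data. This is a minimal illustration of that layout, not the PR's actual code:

```python
import json
import struct

def read_safetensors_header(path: str) -> dict:
    # The safetensors format starts with an 8-byte little-endian
    # unsigned integer giving the length of a JSON header; the header
    # is followed by the raw tensor data.
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # Each entry maps a tensor name to its dtype, shape, and
    # "data_offsets" (byte range relative to the end of the header).
    return header
```

Parsing only this header is cheap, and the `data_offsets` entries are what make lazy, per-tensor reads possible without memory-mapping the whole file.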
Force-pushed from e582f1a to e996f3a
Should fix #15623
(originally targeted #14810, but was rebased)
This replaces the approach from #8482 to avoid using `get_slice`, because it turns out it eagerly memmaps tensors, which uses a lot of memory on Windows and inflates the resident set size on Linux. Safetensors files are now parsed directly, since the format is simple enough. This will also eventually allow tracking the file ranges of tensors, to maybe use `os.copy_file_range` when possible to make conversion on COW filesystems very fast (in #15727). On Linux, when using
`memray` (a memory profiler), this change reduces the peak heap memory usage by quite a lot, and with GNU `time`, it also reduces the peak resident set size. The previous behavior observed with `memray` seems to be that `safe_open` puts all of the model into the heap (likely memmapped, since the resident set size is smaller and grows). The new behavior observed with `memray` is closer to what I thought happened in the first place: bumps of memory usage at each processed tensor, which go back down between each. Here's a table of the "Maximum resident set size (kbytes)" from
`time -v` (when using GNU `time`) on a few models:

```console
$ $(which time) -v python3 convert_hf_to_gguf.py /path/to/model_dir --outfile /path/to/model.gguf --outtype f16
```

(Results table comparing `master` and this PR, in kbytes, not preserved here.)

Safetensors files are already directly parsed since #12820 for remote models. This is similar, but for local models.
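The `os.copy_file_range` idea mentioned above can be sketched as follows. This is a hypothetical helper, not code from the PR; it assumes Linux, where the syscall is available:

```python
import os

def copy_range(src_fd: int, dst_fd: int, offset: int, count: int) -> int:
    # Hypothetical sketch: copy `count` bytes starting at `offset` in the
    # source file to the destination's current position. os.copy_file_range
    # (Linux-only) lets the kernel move the bytes without a userspace
    # round-trip, and on reflink-capable COW filesystems (e.g. Btrfs, XFS)
    # it may share extents instead of physically copying.
    copied = 0
    while copied < count:
        n = os.copy_file_range(src_fd, dst_fd, count - copied,
                               offset_src=offset + copied)
        if n == 0:  # unexpected EOF in the source
            break
        copied += n
    return copied
```

Knowing each tensor's byte range in the source file is exactly what would let the converter hand such ranges to the kernel instead of reading them into Python.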
TODO:

- The `safetensors` library automatically byteswaps when running on a big-endian platform (since the format is always little-endian), but `GGUFWriter` byteswaps unconditionally when the target endianness is big, so this never really worked anyway? (Double-byteswapping in this case would produce little-endian tensors...)
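The double-byteswap concern above can be seen with NumPy (a standalone illustration, not the converter's code): a loader swap on a big-endian host followed by an unconditional writer swap cancels out, leaving little-endian bytes in a file labeled big-endian.

```python
import numpy as np

t = np.array([1, 2, 3], dtype="<i4")   # little-endian, as stored on disk
native_be = t.astype(">i4")            # loader byteswap on a big-endian host
double_swapped = native_be.byteswap()  # writer byteswaps again for a BE target
# The two swaps cancel: the output bytes are little-endian again,
# even though the tensor is declared big-endian.
assert double_swapped.tobytes() == t.tobytes()
```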