[Proposal] Support loading from safetensors if file is present. #1357
Conversation
The documentation is not available anymore as the PR was closed or merged.
patil-suraj left a comment:
Thanks a lot for the PR @Narsil! The code looks good, and I'm in favor of supporting safetensors. Is there any example repo with safetensors weights which we could try?
Also, don't we need any changes to save_pretrained to save weights in this format?
Currently, I didn't make any change. Overall I would be more conservative and keep saving weights in this format as an opt-in rather than a default, to keep the changes gradual (since saving in safetensors means users will need it installed to load the weights). Actually, we could load the weights even without safetensors, because the loading logic is super simple: https://gist.github.com/Narsil/3edeec2669a5e94e4707aa0f901d2282
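For illustration, a minimal sketch of that dependency-free loading path, following the published safetensors layout (an 8-byte little-endian header length, a JSON header, then raw tensor bytes). The function name here is mine, not from the gist:

```python
import json
import struct

def read_safetensors_header(path):
    """Read the JSON header of a .safetensors file without the library.

    The file starts with an unsigned 64-bit little-endian integer giving
    the byte length of a JSON header, which maps each tensor name to its
    dtype, shape, and [start, end] data offsets into the rest of the file.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))
```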
Not yet, I'm in the process of updating https://github.com/huggingface/safetensors/blob/main/bindings/python/convert.py to support ANY framework by default (just convert ALL pytorch weights of the repo and change the extension).
Design also looks good to me! Thanks a lot for working on it @Narsil!
The weights at https://huggingface.co/Narsil/stable-diffusion-v1-4/ are now converted. Or simply use this Space: https://huggingface.co/spaces/safetensors/convert
Should I create a test for this? (Could have a tiny random pipeline with only safetensors weights to test, but I would need to modify the CI somewhere to install it.)
Yes, a test would be great! Could you maybe just add `safetensors` to the dependency list here: line 189 in 44e56de?
It should then be used automatically by the GitHub Runner (cc @anton-l).
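A hypothetical sketch of what such a test could check (the repo name and the cache-walk approach are illustrative, not the actual test added in this PR):

```python
import os

from diffusers import StableDiffusionPipeline

def test_prefers_safetensors_and_skips_pytorch_bin(tmp_path):
    # Loading a repo that has safetensors weights should not pull the .bin files.
    StableDiffusionPipeline.from_pretrained(
        "Narsil/stable-diffusion-v1-4", cache_dir=tmp_path
    )
    downloaded = [f for _, _, files in os.walk(tmp_path) for f in files]
    assert any(f.endswith(".safetensors") for f in downloaded)
    assert not any(f.endswith(".bin") for f in downloaded)
```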
The loading time is crazy fast, I'm getting ~1.3 sec on CPU 🔥
It should be lower than that. I think I know why: I found a bug while creating the test, I'm fixing everything right now.
Okay, I think you should check the logic again, as I'm not confident about this change.
Since both the pytorch and safetensors files are present in the repo, snapshot_download was fetching both. This modification does the fix, but it adds an extra network call to fetch the model_info to filter out the files before snapshot_download. This is the smallest change I could think of, but maybe we can do even better (and remove that extra call). However, that feels like a larger change which would require a separate PR; the idea would be to remove all network calls when the files are already cached.

Edit: here is the timing snippet I used:

```python
import datetime

import torch
from diffusers import StableDiffusionPipeline

start = datetime.datetime.now()
pipe = StableDiffusionPipeline.from_pretrained(
    "Narsil/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")
print(f"Loaded in {datetime.datetime.now() - start}")
image = pipe("example prompt", num_inference_steps=2).images[0]
```
Also, if that's interesting, I could open another PR to load the pipeline directly on CUDA; this will make a difference with safetensors (and the appropriate flag).
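For context, a minimal sketch of what loading a checkpoint directly onto CUDA looks like with the safetensors API (the file name is illustrative; this is not the follow-up PR itself):

```python
from safetensors.torch import load_file

# load_file can materialize tensors straight on a CUDA device,
# skipping the intermediate CPU copy that torch.load would make.
state_dict = load_file("diffusion_pytorch_model.safetensors", device="cuda:0")
```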
src/diffusers/modeling_utils.py (Outdated)

```python
            pretrained_model_name_or_path, weights_name=SAFETENSORS_WEIGHTS_NAME, **kwargs
        )
        return model_file, False
    except:
```
Could we catch with a more explicit "File doesn't exist" error message here?
Suggested change:

```diff
-    except:
+    except EnvironmentError:
```
It can only be an EnvironmentError, no?
Probably a lot can go wrong here (OOM, no disk space, network error), but we can definitely except on that. I'm not sure which kind of exception is thrown if the file is missing though.
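To make the fallback concrete, here is a hedged sketch of the behavior being discussed, reusing the helper names from this PR's diff (exact signatures in the merged code may differ):

```python
def get_model_file(pretrained_model_name_or_path, **kwargs):
    """Return (model_file, is_pytorch): prefer safetensors, fall back to pytorch."""
    if is_safetensors_available():
        try:
            model_file = _get_model_file(
                pretrained_model_name_or_path, weights_name=SAFETENSORS_WEIGHTS_NAME, **kwargs
            )
            return model_file, False
        except EnvironmentError:
            # No safetensors file in the repo (or it could not be fetched):
            # fall back to the regular pytorch weights below.
            pass
    model_file = _get_model_file(pretrained_model_name_or_path, weights_name=WEIGHTS_NAME, **kwargs)
    return model_file, True
```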
src/diffusers/modeling_utils.py (Outdated)

```python
    return model


def get_model_file(pretrained_model_name_or_path, **kwargs) -> Tuple[str, bool]:
```
The functions become a bit too nested for me here and hard to follow - could we maybe just remove this function and only keep `_get_model_file`? I don't think those 8 lines deserve a new function here; I'd prefer to just call `_get_model_file` directly above.
It mostly keeps the caller tidier IMO; the caller is already a complex function, and this nested code would have to live there, making it even more complex.
It's 8 lines with 2 levels of indentation, which is complex in my book, so keeping this logic on its own is more readable to me.
It's "try getting safetensors; if anything goes wrong, fetch the pytorch one".
Happy to remove it, but I think the calling function will get messier.
```python
    return model_file, True


def _get_model_file(
```
nice!
src/diffusers/modeling_utils.py (Outdated)

```diff
 if device_map is None:
     param_device = "cpu"
-    state_dict = load_state_dict(model_file)
+    state_dict = load_state_dict(model_file, is_pytorch)
```
Actually, could we not use a flag here at all, and instead decide in `load_state_dict`, depending on the model file name (it has to end with either SAFETENSORS_WEIGHTS_NAME or WEIGHTS_NAME), whether it's pytorch or not? It's a bit confusing to me to pass around an `is_pytorch` flag here.
Again, I think the behavior should not depend on the filename, but on what you did previously.
It's bikeshedding at this point; I'll try to find something even more elegant.
src/diffusers/modeling_utils.py (Outdated)

```diff
-def load_state_dict(checkpoint_file: Union[str, os.PathLike]):
+def load_state_dict(checkpoint_file: Union[str, os.PathLike], is_pytorch: bool):
```
Suggested change:

```diff
-def load_state_dict(checkpoint_file: Union[str, os.PathLike], is_pytorch: bool):
+def load_state_dict(checkpoint_file: Union[str, os.PathLike]):
```
src/diffusers/modeling_utils.py (Outdated)

```diff
     """
     try:
-        return torch.load(checkpoint_file, map_location="cpu")
+        if is_pytorch:
```
Suggested change:

```diff
-        if is_pytorch:
+        if Path(checkpoint_file).name == WEIGHTS_NAME:
```
I feel this is wrong: we already know at this point what kind of file it is, so we don't have to guess.
I agree the `is_pytorch` flag is slightly odd, but I don't think doing things based on the filename is great either.
I'll try to figure out something better.
Couldn't find anything satisfactory, so I went your way.
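For reference, the filename-based dispatch the thread converged on would look roughly like this (a sketch assuming the `WEIGHTS_NAME` constant from this file; the merged code may differ):

```python
from pathlib import Path

import torch
from safetensors.torch import load_file

def load_state_dict(checkpoint_file):
    # Dispatch on the file name: WEIGHTS_NAME (the pytorch .bin file)
    # goes through torch.load, anything else through safetensors.
    if Path(checkpoint_file).name == WEIGHTS_NAME:
        return torch.load(checkpoint_file, map_location="cpu")
    return load_file(checkpoint_file, device="cpu")
```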
src/diffusers/modeling_utils.py (Outdated)

```python
    if is_pytorch:
        return torch.load(checkpoint_file, map_location="cpu")
    else:
        return load_file(checkpoint_file, device="cpu")
```
Just for readability, could we maybe do the following:

```diff
-        return load_file(checkpoint_file, device="cpu")
+        return safetensors.torch.load_file(checkpoint_file, device="cpu")
```

Then the reader directly knows the loading comes from safetensors?
Sure.
src/diffusers/pipeline_utils.py (Outdated)

```python
user_agent["custom_pipeline"] = custom_pipeline
user_agent = http_user_agent(user_agent)

info = model_info(
```
Can we wrap all of this into a function, called `is_compatible_with_safetensors()`? And then just do something like:

```python
if is_safetensors_available() and is_compatible_with_safetensors():
    ignore_patterns.append("*.bin")
```
We can. I'd rather leave `model_info` on its own though: it's a network call, and function names like `is_compatible` sound innocuous and fast (which doing network operations is not :))
PR looks very nice! I left some feedback in modeling_utils regarding readability - always a bit worried to introduce too many nested functions and flags. Overall I think this is a really nice implementation! In pipeline_utils it would just be nice to also only do the extra call to the Hub if safetensors is installed, and maybe to move the whole logic into one simple-sounding "is pipeline compatible with safetensors" function.
I don't necessarily agree, but I went your way; I feel it's more a matter of style preferences, so happy to oblige.
Hmm, what I'm suggesting is that we should probably just own that call, since it's exactly what snapshot_download does under the hood. This calls for a larger PR though.
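A hedged sketch of what "owning the call" could mean: list the repo files once via `model_info`, filter them, and fetch each remaining file with `hf_hub_download` instead of going through `snapshot_download` (repo id reused from above; this is an illustration, not the larger PR):

```python
from huggingface_hub import hf_hub_download, model_info

# snapshot_download is essentially model_info plus one download per file,
# so doing it by hand lets us filter before anything is fetched.
repo_id = "Narsil/stable-diffusion-v1-4"
wanted = [
    sibling.rfilename
    for sibling in model_info(repo_id).siblings
    if not sibling.rfilename.endswith(".bin")
]
local_paths = [hf_hub_download(repo_id, filename=f) for f in wanted]
```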
```python
user_agent["custom_pipeline"] = custom_pipeline
user_agent = http_user_agent(user_agent)

if is_safetensors_available():
```
Cool, that works for me - agree with your point that "calling the Hub" is too hidden when fully put inside `is_safetensors_compatible(...)`.
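For completeness, a sketch of what the compatibility check itself could look like, with the Hub call kept visible at the call site as agreed (the predicate body is my assumption, not the exact merged code):

```python
from huggingface_hub import model_info

def is_safetensors_compatible(info) -> bool:
    """True if every pytorch .bin file has a .safetensors sibling in the repo."""
    filenames = {sibling.rfilename for sibling in info.siblings}
    pt_files = [f for f in filenames if f.endswith(".bin")]
    return all(f[: -len(".bin")] + ".safetensors" in filenames for f in pt_files)

# The network call stays visible at the call site:
info = model_info("Narsil/stable-diffusion-v1-4")
if is_safetensors_compatible(info):
    ignore_patterns = ["*.bin"]
```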
```python
ENV_VARS_TRUE_VALUES = {"1", "ON", "YES", "TRUE"}
ENV_VARS_TRUE_AND_AUTO_VALUES = ENV_VARS_TRUE_VALUES.union({"AUTO"})

USE_TF = os.environ.get("USE_TF", "AUTO").upper()
```
Actually we should remove this -> we don't have TF in diffusers yet
Happy to do it in a follow-up PR if it's too much work here. Essentially we copy-pasted this whole file from transformers and forgot to remove the USE_TF stuff -> we should remove all TF-related code.
```diff
     _torch_available = False
 else:
-    logger.info("Disabling PyTorch because USE_TF is set")
+    logger.info("Disabling PyTorch because USE_TORCH is set")
```
| logger.info("Disabling PyTorch because USE_TORCH is set") | |
| logger.info("Disabling PyTorch because USE_TORCH is not set") |
```python
except importlib_metadata.PackageNotFoundError:
    _safetensors_available = False
else:
    logger.info("Disabling Safetensors because USE_TF is set")
```
| logger.info("Disabling Safetensors because USE_TF is set") | |
| logger.info("Disabling Safetensors because USE_SAFETENSORS is not set") |
Thanks for making the changes - fully agree that it's more a style preference! Left some comments about the log messages. Happy to merge from my side though!
Nice PR - thanks a lot!
Co-authored-by: Patrick von Platen <[email protected]>
[Proposal] Support loading from safetensors if file is present (huggingface#1357)

* [Proposal] Support loading from safetensors if file is present.
* Style.
* Fix.
* Adding some test to check loading logic. + modify download logic to not download pytorch file if not necessary.
* Fixing the logic.
* Addressing comments.
* Factor out into a function.
* Remove dead function.
* Typo.
* Extra fetch only if safetensors is there.
* Apply suggestions from code review

Co-authored-by: Patrick von Platen <[email protected]>
Current proposed behavior mimics transformers: if `safetensors` is installed AND the file is present (on the remote model or locally), then load from it preferably.