[Download] Smart downloading #512
The documentation is not available anymore as the PR was closed or merged.
patil-suraj left a comment
Very cool, lgtm!
model_id = "hf-internal-testing/unet-pipeline-dummy"
with tempfile.TemporaryDirectory() as tmpdirname:
    _ = DiffusionPipeline.from_pretrained(model_id, cache_dir=tmpdirname, force_download=True)
    local_repo_name = "--".join(["models"] + model_id.split("/"))
    snapshot_dir = os.path.join(tmpdirname, local_repo_name, "snapshots")
    snapshot_dir = os.path.join(snapshot_dir, os.listdir(snapshot_dir)[0])

    # inspect all downloaded files to make sure that everything is included
    assert os.path.isfile(os.path.join(snapshot_dir, DiffusionPipeline.config_name))
    assert os.path.isfile(os.path.join(snapshot_dir, CONFIG_NAME))
    assert os.path.isfile(os.path.join(snapshot_dir, SCHEDULER_CONFIG_NAME))
    assert os.path.isfile(os.path.join(snapshot_dir, WEIGHTS_NAME))
    assert os.path.isfile(os.path.join(snapshot_dir, "scheduler", SCHEDULER_CONFIG_NAME))
    assert os.path.isfile(os.path.join(snapshot_dir, "unet", WEIGHTS_NAME))
    # let's make sure the super large numpy file:
    # https://huggingface.co/hf-internal-testing/unet-pipeline-dummy/blob/main/big_array.npy
    # is not downloaded, while all the expected files are
    assert not os.path.isfile(os.path.join(snapshot_dir, "big_array.npy"))
great test
pcuenca left a comment
Looks great!
anton-l left a comment
Great workaround!
* [Download] Smart downloading
* add test
* finish test
* update
* make style
Improved snapshot downloading for `diffusers`

It might happen more and more often that a repo folder on the Hub contains more than the necessary diffusion model files.
E.g., our pipelines should not load CompVis weights from a repo that has both CompVis weights and `diffusers` weights, but that is currently the case, which means that bandwidth and local storage are wasted.

This PR makes sure, via `allow_patterns`, that only the relevant files are actually downloaded.

🚨🚨🚨 Please make sure to update `huggingface_hub` to `0.9.1` 🚨🚨🚨

See: #538
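
To illustrate the idea of restricting a download with allow patterns, here is a minimal, offline sketch. It is not the code from this PR: the helper names `build_allow_patterns` and `filter_files`, the example config, and the literal file names are assumptions made for illustration, and `fnmatch` stands in for whatever glob matching the Hub client applies. The file list mirrors the layout checked in the test above.

```python
from fnmatch import fnmatch
from typing import Dict, List

# Assumed file names, for illustration only; the real values live in the library's constants.
PIPELINE_CONFIG = "model_index.json"
CONFIG_NAME = "config.json"
SCHEDULER_CONFIG_NAME = "scheduler_config.json"
WEIGHTS_NAME = "diffusion_pytorch_model.bin"


def build_allow_patterns(pipeline_config: Dict[str, object]) -> List[str]:
    """Hypothetical helper: derive glob patterns from a pipeline config."""
    # Keys starting with "_" are treated as metadata, not as component folders.
    folders = [key for key in pipeline_config if not key.startswith("_")]
    patterns = [f"{folder}/*" for folder in folders]
    # Also allow the known top-level config and weight files.
    patterns += [PIPELINE_CONFIG, CONFIG_NAME, SCHEDULER_CONFIG_NAME, WEIGHTS_NAME]
    return patterns


def filter_files(repo_files: List[str], patterns: List[str]) -> List[str]:
    """Hypothetical helper: keep only files matching at least one pattern."""
    return [f for f in repo_files if any(fnmatch(f, p) for p in patterns)]


# Made-up pipeline config and repo listing.
config = {"_class_name": "ExamplePipeline", "unet": None, "scheduler": None}
repo_files = [
    "model_index.json",
    "big_array.npy",
    "unet/config.json",
    "unet/diffusion_pytorch_model.bin",
    "scheduler/scheduler_config.json",
]

selected = filter_files(repo_files, build_allow_patterns(config))
print(selected)
```

With these inputs, `big_array.npy` matches no pattern and is left out, while the component folders and `model_index.json` are kept.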