Torchvision dataset mirrors

### 🚀 The feature

Would it be possible for pytorch/torchvision to mirror all the datasets on its own domain/hosts instead of downloading them from the original researchers' web pages/URLs?
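
To make the request concrete, here is a rough sketch of the behaviour I have in mind: try a list of URLs in order and fall back to the next one when a host fails. This is not torchvision's API; `download_with_fallback`, `fetch_with_urllib`, and the `fetch` parameter are names I made up for illustration, and it uses only the standard library.

```python
import urllib.request
from pathlib import Path
from typing import Callable, Iterable


def fetch_with_urllib(url: str, dest: Path) -> None:
    """Download `url` to `dest` using only the standard library."""
    with urllib.request.urlopen(url, timeout=30) as response:
        dest.write_bytes(response.read())


def download_with_fallback(
    urls: Iterable[str],
    dest: Path,
    fetch: Callable[[str, Path], None] = fetch_with_urllib,
) -> str:
    """Try each URL in order and return the one that succeeded.

    Hypothetical helper: `urls` would list a mirror first and the
    original researcher's URL last (or the other way round).
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    errors = []
    for url in urls:
        try:
            fetch(url, dest)
            return url
        except OSError as exc:  # urllib's URLError/HTTPError and timeouts are OSErrors
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all mirrors failed:\n" + "\n".join(errors))
```

With something like this, a quota or outage on one host only costs a retry against the next URL in the list instead of failing the whole download.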


### Motivation, pitch

I often run into problems when downloading datasets from their original hosts. For example:
1. Too many downloads
2. Bandwidth limit exceeded for the day
3. Some other outage such as in https://github.com/pytorch/vision/issues/7545

Also, a Kaggle notebook re-downloads the dataset on every run, since there's no way to cache the downloaded dataset.

Mirroring the datasets would make the problems above (and more) go away.

People often work around these issues by reading from a dataset that someone has already uploaded to Kaggle and defining their own Dataset class for it. Alternatively, they use "hacks" to make torchvision read an existing Kaggle dataset whose directory name doesn't match what torchvision expects. See https://www.kaggle.com/code/dhruv4930/starter-for-oxford-iiit-pet-using-torchvision for an example.

The code from that notebook is copied below.

```python
import torchvision

# Oxford-IIIT Pet segmentation dataset loaded via torchvision.
# Symlink the Kaggle copy to the directory name torchvision expects ("oxford-iiit-pet").
!rm -f '/kaggle/working/oxford-iiit-pet'
!ln -s '/kaggle/input/oxfordiiitpetfromxijiatao/Oxford-IIT-Pet' '/kaggle/working/oxford-iiit-pet'

# download=False makes torchvision read the symlinked files instead of fetching them.
oxford_pets_path = '/kaggle/working'
pets_train_orig = torchvision.datasets.OxfordIIITPet(root=oxford_pets_path, split="trainval", target_types="segmentation", download=False)
pets_test_orig = torchvision.datasets.OxfordIIITPet(root=oxford_pets_path, split="test", target_types="segmentation", download=False)
```
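
For comparison, the other workaround (defining your own Dataset class over a Kaggle upload) might look roughly like the sketch below. The class name `KagglePetsSegmentation` and the `loader` parameter are hypothetical, and I'm assuming the upload keeps the original archive layout (`images/`, `annotations/trimaps/`, `annotations/trainval.txt`), which a given Kaggle dataset may not. It is kept dependency-free here; in a real notebook you would probably subclass `torch.utils.data.Dataset` and pass `PIL.Image.open` as the loader.

```python
from pathlib import Path
from typing import Callable


class KagglePetsSegmentation:
    """Map-style dataset (has __len__ and __getitem__) over an on-disk copy."""

    def __init__(self, root, split: str = "trainval", loader: Callable = Path.read_bytes):
        self.root = Path(root)
        self.loader = loader  # assumption: swap in PIL.Image.open for real images
        split_file = self.root / "annotations" / f"{split}.txt"
        # The first whitespace-separated token on each line is the image id.
        self.ids = [
            line.split()[0]
            for line in split_file.read_text().splitlines()
            if line.strip()
        ]

    def __len__(self) -> int:
        return len(self.ids)

    def __getitem__(self, index: int):
        image_id = self.ids[index]
        image = self.loader(self.root / "images" / f"{image_id}.jpg")
        mask = self.loader(self.root / "annotations" / "trimaps" / f"{image_id}.png")
        return image, mask
```

Either way, everyone ends up maintaining this kind of glue code per dataset and per upload, which is what a mirror would avoid.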


### Alternatives

Since I'm personally interested in solving my local problem for Kaggle notebooks, a viable alternative would be to create a Kaggle dataset for every torchvision dataset so that I can simply include it when working on Kaggle. Using a Kaggle dataset is also more reliable in Kaggle notebooks.

However, this is a myopic view of the problem and provides a localized solution to a localized problem. I'm pretty sure that others outside the narrow scope of Kaggle notebooks have experienced this issue, and mirroring the datasets as suggested above would be a more holistic, broader solution.

I'm open to other solutions that work across environments.


### Additional context

Thanks for working on torchvision - it's saved me a lot of time on mundane and vision-specific tasks!


Torchvision dataset mirrors #7637

🚀 The feature

Motivation, pitch

Alternatives

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Torchvision dataset mirrors #7637

Description

🚀 The feature

Motivation, pitch

Alternatives

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions