[Paper] [Colab] [BibTex] [Website]
Repository for the strong foundational MAWS + MAE models at all sizes, ranging from <100M parameters to >6.5B parameters, from the paper The effectiveness of MAE pre-pretraining for billion-scale pretraining. Models are available for both MAE pre-pretraining and the follow-up WSP pretraining, MAE→WSP, a.k.a. MAWS (Masked Autoencoding → Weakly Supervised pretraining).
To get started immediately, we have a notebook you can run on Colab, or locally, to try our models in zero-shot mode.
To build any of our models, select the model type you would like. We have models available for:

- `model_type="maws"`: MAWS (MAE→WSP) pretraining, i.e. MAE pre-pretraining followed by WSP pretraining. We also have ImageNet-1k finetuned weights for MAWS models using the same model type.
- `model_type="maws_clip"`: MAWS pretrained models along with LiT-aligned text encoders for CLIP-style zero-shot classification.
- `model_type="mae"`: MAE pretrained models.
- `model_type="mae_in1k"`: MAE models pretrained on ImageNet-1k.
To access a model, specify the model architecture and the model type:
```python
from maws.model_builder import build_model

# build a MAWS model with CLIP capabilities (via an aligned text encoder)
clip_model = build_model("vit_b16_xlmr_b", "maws_clip")
# build a MAWS model
maws_model = build_model("vit_b16", "maws")
# build a MAWS model finetuned on IN1k
maws_in1k_model = build_model("vit_b16_ft_in1k", "maws")
# build an MAE model
mae_model = build_model("vit_b16", "mae")
```
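The same `build_model` interface should also cover the `mae_in1k` model type; a small sketch (the `vit_2b14` architecture name is taken from the MAE-on-ImageNet-1k table below, so treat the pairing as an assumption):

```python
# build an MAE model pretrained on ImageNet-1k (sketch; name per the tables below)
mae_in1k_model = build_model("vit_2b14", "mae_in1k")
```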
The models are also available via torch.hub:
```python
import torch

# build a MAWS model with CLIP capabilities (via an aligned text encoder)
clip_model = torch.hub.load("facebookresearch/maws", model="vit_b16_xlmr_b_maws_clip")
# build a MAWS model
maws_model = torch.hub.load("facebookresearch/maws", model="vit_b16_maws")
# build a MAWS model finetuned on IN1k
maws_in1k_model = torch.hub.load("facebookresearch/maws", model="vit_b16_ft_in1k_maws")
# build an MAE model
mae_model = torch.hub.load("facebookresearch/maws", model="vit_b16_mae")
```
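As a quick usage sketch, an IN1k-finetuned MAWS model can be run on a single image roughly as follows. The 512px input size matches the evaluation commands further below, but the preprocessing constants and the assumption that a plain forward pass returns IN1k logits are ours, not documented repo behavior (the Colab notebook is the reference usage):

```python
import torch
from PIL import Image
from torchvision import transforms

from maws.model_builder import build_model

# Sketch only: the ImageNet mean/std, the 512px center crop, and the
# logits-from-forward assumption are not taken from the repo's documented API.
model = build_model("vit_b16_ft_in1k", "maws").eval()

preprocess = transforms.Compose([
    transforms.Resize(512, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(512),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    logits = model(image)               # assumed shape: [1, 1000] IN1k class logits
print(logits.argmax(dim=-1).item())     # predicted ImageNet-1k class index
```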
All the available models and direct download links are listed in the sections below.
To set up an environment:

```bash
conda create --name maws python=3.10
conda activate maws
pip install torch torchvision torchtext
pip install timm==0.9.7
# for the demo notebook
pip install jupyter ipywidgets matplotlib
```
Torchtext has been deprecated, which breaks CLIP model support. If you run without torchtext, all models which aren't CLIP based will still work fine!
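If you want your code to degrade gracefully when torchtext is unavailable, a simple guard like the one below can help; the `HAS_TORCHTEXT` flag and the fallback to a non-CLIP model type are our own convention, not part of the repo:

```python
# Sketch of a guard for environments without torchtext (our convention, not the repo's).
# Only the "maws_clip" model type needs the text encoder stack; the rest work without it.
try:
    import torchtext  # noqa: F401
    HAS_TORCHTEXT = True
except ImportError:
    HAS_TORCHTEXT = False

from maws.model_builder import build_model

model_type = "maws_clip" if HAS_TORCHTEXT else "maws"
model = build_model("vit_b16_xlmr_b" if HAS_TORCHTEXT else "vit_b16", model_type)
```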
MAWS pretrained models:

Model | Pretrained name + weights | IN1k 224px linear top-1 | IN1k 512/518px finetuned name + weights | IN1k 512/518px finetuned top-1 | Text encoder | 0-Shot name + weights | IN1k 224px 0-shot top-1 |
---|---|---|---|---|---|---|---|
ViT-B | vit_b16 | 83.3 | vit_b16_ft_in1k | 86.8 | XLMR-B | vit_b16_xlmr_b | 74.9 |
ViT-L | vit_l16 | 86.1 | vit_l16_ft_in1k | 88.8 | XLMR-L | vit_l16_xlmr_l | 79.7 |
ViT-H | vit_h14 | 87.5 | vit_h14_ft_in1k | 89.5 | XLMR-L | vit_h14_xlmr_l | 81.1 |
ViT-2B | vit_2b14 | 88.1 | vit_2b14_ft_in1k | 89.8 | XLMR-L | vit_2b14_xlmr_l | 82.1 |
ViT-6.5B | vit_6.5b14 | 88.6 | vit_6.5b14_ft_in1k | 90.1 | - | - | - |
MAE pretrained models:

Model | Pretrained name + weights | IN1k 224px finetuned top-1 |
---|---|---|
ViT-B | vit_b16 | 83.5 |
ViT-L | vit_l16 | 86.1 |
ViT-H | vit_h14 | 87.4 |
ViT-2B | vit_2b14 | 87.8 |
ViT-6.5B | vit_6.5b14 | 88.3 |
MAE models pretrained on ImageNet-1k:

Model | Pretrained name + weights | IN1k 224px finetuned top-1 |
---|---|---|
ViT-2B | vit_2b14 | 87.4 |
Model | Model name + weights | IN1k 512px finetuned top-1 |
---|---|---|
ViT-L | vit_l16 | 86.9 |
We share weights for the MAWS models finetuned on ImageNet-1k at high resolution (512px for ViT-B and ViT-L, 518px for ViT-H, ViT-2B, and ViT-6.5B). `$IN1K_VAL_PATH` should be the path to the ImageNet-1k val root folder.
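For example (the path below is just a placeholder for your own ImageNet-1k location):

```bash
export IN1K_VAL_PATH=/path/to/imagenet-1k/val   # placeholder path
```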
```bash
python eval_finetuned.py -m vit_b16_ft_in1k -i 512 -b 25 -p $IN1K_VAL_PATH
# ImageNet-1k top-1 accuracy: 86.832
python eval_finetuned.py -m vit_l16_ft_in1k -i 512 -b 10 -p $IN1K_VAL_PATH
# ImageNet-1k top-1 accuracy: 88.796
python eval_finetuned.py -m vit_h14_ft_in1k -i 518 -b 5 -p $IN1K_VAL_PATH
# ImageNet-1k top-1 accuracy: 89.502
python eval_finetuned.py -m vit_2b14_ft_in1k -i 518 -b 5 -p $IN1K_VAL_PATH
# ImageNet-1k top-1 accuracy: 89.752
python eval_finetuned.py -m vit_6.5b14_ft_in1k -i 518 -b 5 -p $IN1K_VAL_PATH
# ImageNet-1k top-1 accuracy: 90.064
```
Please refer to the MAWS pretrained models section above for all the available model names. `$IN1K_VAL_PATH` should be the path to the ImageNet-1k val root folder.
```bash
python eval_zeroshot.py -m vit_b16_xlmr_b -b 25 -p $IN1K_VAL_PATH
# Zero shot ImageNet-1k top-1 accuracy: 74.888

# Trying the French language instead, with a larger model on a 32GB V100
python eval_zeroshot.py -m vit_2b14_xlmr_l --language french -b 5 -p $IN1K_VAL_PATH
# Zero shot ImageNet-1k top-1 accuracy: 62.622
```
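Conceptually, zero-shot classification compares the image embedding against one text embedding per class name and picks the most similar one. The sketch below shows only this scoring step with placeholder tensors; it does not use the repo's image/text encoder API, whose exact method names we have not assumed:

```python
import torch
import torch.nn.functional as F

# Placeholder embeddings standing in for the encoders' outputs
# (shapes are illustrative: 1 image, 1000 class prompts, 512-d embeddings).
image_features = torch.randn(1, 512)
text_features = torch.randn(1000, 512)

# CLIP/LiT-style zero-shot scoring: cosine similarity between the L2-normalized
# image embedding and each class prompt embedding; the highest score wins.
image_features = F.normalize(image_features, dim=-1)
text_features = F.normalize(text_features, dim=-1)
scores = image_features @ text_features.T        # [1, 1000]
predicted_class = scores.argmax(dim=-1).item()
```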
If you use our models or if the work is useful in your research, please give us a star and cite:
```bibtex
@inproceedings{singh2023effectiveness,
  title={The effectiveness of MAE pre-pretraining for billion-scale pretraining},
  author={Singh, Mannat and Duval, Quentin and Alwala, Kalyan Vasudev and Fan, Haoqi and Aggarwal, Vaibhav and Adcock, Aaron and Joulin, Armand and Doll{\'a}r, Piotr and Feichtenhofer, Christoph and Girshick, Ross and Girdhar, Rohit and Misra, Ishan},
  booktitle={ICCV},
  year={2023}
}
```
Our models are released under the CC-BY-NC 4.0 license. See LICENSE for additional details.