- Masked Autoencoders (MAE) scale vision understanding by hiding most of the input image during pretraining.
- MAE randomly masks 75% of the image patches and trains the model to reconstruct the missing pixels.
- MAE uses an asymmetric design: a large encoder that processes only the visible patches and a small decoder that reconstructs the full image, which makes pretraining efficient (see the sketch after this list).
- MAE pretraining lets very large models such as ViT-Huge generalize well using only ImageNet-1K data.
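
The asymmetric setup is easiest to see in code. Below is a minimal PyTorch sketch of the idea under simplified assumptions: `random_masking`, `TinyMAE`, and all layer sizes are hypothetical names and dimensions chosen for brevity, not the paper's configuration (a real MAE uses a full ViT with patch embedding and positional encodings, omitted here).

```python
import torch
import torch.nn as nn


def random_masking(patches, mask_ratio=0.75):
    """Keep a random 25% of patches; return them plus a binary mask
    (1 = removed) and the indices needed to restore patch order."""
    B, N, D = patches.shape
    len_keep = int(N * (1 - mask_ratio))

    noise = torch.rand(B, N, device=patches.device)   # one random score per patch
    ids_shuffle = torch.argsort(noise, dim=1)         # random permutation
    ids_restore = torch.argsort(ids_shuffle, dim=1)   # its inverse

    ids_keep = ids_shuffle[:, :len_keep]
    kept = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))

    mask = torch.ones(B, N, device=patches.device)    # 1 = masked, 0 = visible
    mask[:, :len_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)         # mask in original patch order
    return kept, mask, ids_restore


class TinyMAE(nn.Module):
    """Illustrative asymmetric autoencoder: the encoder sees only the
    visible patches; the shallower, narrower decoder reconstructs all
    patches. Depths and widths here are toy values, not MAE's."""

    def __init__(self, patch_dim=768, enc_dim=768, dec_dim=512):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(enc_dim, nhead=8, batch_first=True)
        dec_layer = nn.TransformerEncoderLayer(dec_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=4)  # "strong" side
        self.decoder = nn.TransformerEncoder(dec_layer, num_layers=1)  # "small" side
        self.to_dec = nn.Linear(enc_dim, dec_dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dec_dim))
        self.pred = nn.Linear(dec_dim, patch_dim)     # per-patch pixel prediction

    def forward(self, patches, mask_ratio=0.75):
        kept, mask, ids_restore = random_masking(patches, mask_ratio)
        latent = self.encoder(kept)                   # encoder runs on ~25% of tokens
        x = self.to_dec(latent)
        B, N = mask.shape
        fill = self.mask_token.expand(B, N - x.shape[1], -1)
        x = torch.cat([x, fill], dim=1)               # append mask-token placeholders
        x = torch.gather(x, 1, ids_restore.unsqueeze(-1).expand(-1, -1, x.shape[-1]))
        return self.pred(self.decoder(x)), mask


# Usage: reconstruction loss is computed on the masked patches only.
model = TinyMAE()
patches = torch.randn(2, 196, 768)                    # (batch, patches, flattened pixels)
pred, mask = model(patches)
loss = (((pred - patches) ** 2).mean(-1) * mask).sum() / mask.sum()
```

Because the heavy encoder never sees the ~75% of masked tokens, most of the compute is spent on a quarter of the sequence, which is what makes this pretraining scheme cheap relative to its model size.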