<ul><li>Meta AI introduces Perception Encoder (PE), a family of vision models trained with a single contrastive vision-language objective and then refined with alignment techniques tailored to downstream tasks.</li><li>PE comes in three scales (PEcoreB, PEcoreL, and PEcoreG); the largest, G-scale model has 2B parameters and serves as a general-purpose encoder for both image and video inputs.</li><li>PE shows strong zero-shot generalization across a wide range of vision benchmarks, with competitive results on image classification and fine-grained datasets and state-of-the-art performance on video tasks.</li><li>The release of PE, together with its codebase and the PE Video Dataset, provides a foundation for building multimodal AI systems with more integrated and robust visual understanding.</li></ul>

Meta AI Introduces Perception Encoder: A Large-Scale Vision Encoder that Excels Across Several Vision Tasks for Images and Video
