Meta AI has released Apollo, a family of video-focused Large Multimodal Models (LMMs) designed to enhance video understanding.
Apollo addresses challenges in video-based models through efficient design choices, such as frames-per-second (fps) sampling, which draws frames at a fixed rate in time rather than a fixed count per clip, and dual vision encoders.
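
To make the fps sampling idea concrete, here is a minimal sketch that contrasts it with uniform sampling. It is an illustration only, not Apollo's actual code: the function names `fps_sample_indices` and `uniform_sample_indices`, the 30 fps clips, and the 2 fps target are all assumptions chosen for the example.

```python
from typing import List


def fps_sample_indices(num_frames: int, native_fps: float, target_fps: float) -> List[int]:
    """Pick frame indices at roughly `target_fps` from a clip recorded at `native_fps`.

    Hypothetical helper for illustration; not taken from the Apollo codebase.
    """
    if num_frames <= 0 or native_fps <= 0 or target_fps <= 0:
        raise ValueError("num_frames, native_fps and target_fps must be positive")
    # Number of source frames between consecutive samples; never below 1,
    # so a target rate above the native rate just returns every frame.
    step = max(1.0, native_fps / target_fps)
    indices = []
    k = 0
    while True:
        idx = int(round(k * step))
        if idx >= num_frames:
            break
        indices.append(idx)
        k += 1
    return indices


def uniform_sample_indices(num_frames: int, num_samples: int) -> List[int]:
    """Pick a fixed number of evenly spaced frame indices, regardless of clip length."""
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    return [int(i * num_frames / num_samples) for i in range(num_samples)]


if __name__ == "__main__":
    short_clip = 300  # 10 seconds at an assumed 30 fps
    long_clip = 600   # 20 seconds at an assumed 30 fps

    # fps sampling: the frame count grows with duration, the time gap stays fixed.
    print(len(fps_sample_indices(short_clip, 30.0, 2.0)))
    print(len(fps_sample_indices(long_clip, 30.0, 2.0)))

    # Uniform sampling: the frame count is fixed, so the time gap grows with duration.
    print(uniform_sample_indices(short_clip, 8))
    print(uniform_sample_indices(long_clip, 8))
```

With fps sampling the gap between sampled frames stays constant in time, so the model sees motion at a consistent speed across clips of different lengths. With uniform sampling the gap stretches as the clip gets longer.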
The Apollo models come in three sizes (1.5B, 3B, and 7B parameters), offering flexibility for different computational constraints and real-world needs.
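Apollo achieves strong performance on video-language tasks and introduces ideas such as Scaling Consistency, the observation that design decisions validated on smaller models and datasets carry over to larger ones, and token resampling, which compresses visual tokens before they reach the language model.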