Meta AI and Stanford have introduced Apollo, a family of video-based large multimodal models (LMMs) designed to efficiently and accurately understand video content.
Apollo models excel at video tasks by addressing key design challenges, including how videos are sampled and encoded and how the models are trained.
On the MLVU benchmark, Apollo-3B scores 68.7, outperforming larger 7B models, while Apollo-7B scores 70.9, surpassing some 30B models.
Apollo marks a significant leap in video AI, opening doors to applications like content analysis and autonomous systems.