- Meta introduced V-JEPA 2, a 1.2-billion-parameter world model trained on video for robotic systems.
- V-JEPA 2 helps robots understand, predict, and plan with limited task-specific training data.
- The model is trained in two stages without human annotation, learning from over 1 million hours of video.
- Meta tested V-JEPA 2 on robots in its labs, where it performed well on tasks such as pick-and-place.
- For complex tasks, the model plans against vision-based goal representations and visual subgoals (see the sketch after this list).
- In tests, V-JEPA 2 generalized well to new environments, with success rates of 65-80%.
- Despite these gains, Meta notes a remaining gap between model and human performance.
- Meta points to the need for models that operate across multiple timescales and modalities, such as audio or tactile information.
- Meta is releasing benchmarks for evaluating how well models understand the physical world from video.
- V-JEPA 2 code and model checkpoints are available for commercial and research use, to encourage exploration in robotics and AI.
- Other tech companies, including Google DeepMind and World Labs, are also developing their own world models.
- Google DeepMind's Genie simulates 3D environments, while World Labs raised $230 million for world model development.
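
Below is a minimal, self-contained sketch of what planning against a vision-based goal representation can look like: a latent world model rolls out candidate action sequences, and a cross-entropy-method loop keeps the sequences whose predicted final embedding lands closest to the goal image's embedding. The encoder, predictor, tensor shapes, and the CEM planner itself are all illustrative assumptions, not Meta's released implementation.

```python
# Sketch of goal-image planning with a latent world model, in the spirit of
# V-JEPA 2's vision-based goal representations. The encoder and predictor are
# random stand-ins, not Meta's released weights, and the cross-entropy-method
# loop is an assumed planner, not the official one.
import torch

EMB, ACT, HORIZON, POP, ELITE, ITERS = 64, 4, 5, 128, 16, 3

encoder = torch.nn.Linear(3 * 32 * 32, EMB)    # stand-in image encoder
predictor = torch.nn.Linear(EMB + ACT, EMB)    # stand-in latent dynamics model

def rollout(state, actions):
    """Roll a batch of latent states forward through action sequences."""
    for t in range(actions.shape[1]):
        state = predictor(torch.cat([state, actions[:, t]], dim=-1))
    return state

@torch.no_grad()
def plan(current_img, goal_img):
    """Return the first action of the sequence whose predicted final latent
    is closest (L1) to the goal image's latent, refined by CEM."""
    state = encoder(current_img.flatten()).expand(POP, EMB)
    goal = encoder(goal_img.flatten())
    mean = torch.zeros(HORIZON, ACT)
    std = torch.ones(HORIZON, ACT)
    for _ in range(ITERS):
        actions = mean + std * torch.randn(POP, HORIZON, ACT)
        final = rollout(state, actions)
        cost = (final - goal).abs().sum(dim=-1)   # distance to goal latent
        elite = actions[cost.topk(ELITE, largest=False).indices]
        mean, std = elite.mean(dim=0), elite.std(dim=0) + 1e-4
    return mean[0]                                # execute the first action

action = plan(torch.rand(3, 32, 32), torch.rand(3, 32, 32))
print(action)
```

The sketch only mirrors the control flow described in Meta's announcement; in the reported setup, the same idea would use V-JEPA 2's pretrained video encoder and an action-conditioned predictor rather than these toy modules.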