Meta has introduced V-JEPA 2, a world model trained on video to help AI agents understand the physical world and predict the outcomes of actions.
V-JEPA 2 is a step towards developing advanced machine intelligence (AMI) by enabling AI agents to think before they act.
Humans predict how the world reacts to actions by observing and internalizing a model of the world.
V-JEPA 2 aims to give AI agents a similar internal model, so they can anticipate how the physical world will respond before they act.
World models like V-JEPA 2 provide AI agents with three key capabilities: understanding, predicting, and planning.
V-JEPA 2 improves on its predecessor, V-JEPA, with stronger understanding and prediction abilities that help robots interact with unfamiliar objects and environments.
The model was trained using video data, learning patterns of interaction and movement in the physical world.
In lab demonstrations, robots using V-JEPA 2 performed tasks such as reaching for objects, picking them up, and moving them to new locations.
Meta also introduced three new benchmarks to help researchers evaluate how well their models understand and reason about the physical world using video data.
The goal is to provide researchers with better models and benchmarks to enhance AI systems and positively impact people's lives.
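
The "predict, then plan" loop that a world model makes possible can be sketched with a toy example. The code below is an illustration only and does not reproduce V-JEPA 2's architecture or training: `predict` is a hypothetical hand-written stand-in for a learned model, states and actions are simple 2D points and displacements, and planning is plain random sampling of candidate action sequences. All names and parameters are assumptions made for this sketch.

```python
import random


def predict(state, action):
    """Hypothetical stand-in for a learned world model.

    Given the current state and an action, return the predicted next state.
    Here the "physics" is just adding a 2D displacement to a 2D position.
    """
    return (state[0] + action[0], state[1] + action[1])


def distance(a, b):
    """Euclidean distance between two 2D points."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


def imagined_cost(state, actions, goal):
    """Roll a candidate action sequence through the model without acting.

    The cost is the summed distance to the goal over all predicted states.
    """
    total = 0.0
    for action in actions:
        state = predict(state, action)
        total += distance(state, goal)
    return total


def plan(state, goal, rng, horizon=3, candidates=200):
    """Sample random action sequences and return the first action of the best one."""
    best_actions, best_cost = None, float("inf")
    for _ in range(candidates):
        actions = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(horizon)]
        cost = imagined_cost(state, actions, goal)
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions[0]


def run(start, goal, steps=20, seed=0):
    """Repeatedly plan, execute one action, and replan from the new state."""
    rng = random.Random(seed)
    state = start
    for _ in range(steps):
        action = plan(state, goal, rng)
        # In this toy setting the "real world" behaves exactly like the model.
        state = predict(state, action)
    return state


if __name__ == "__main__":
    start, goal = (0.0, 0.0), (5.0, 5.0)
    final = run(start, goal)
    print("distance to goal before:", distance(start, goal))
    print("distance to goal after: ", distance(final, goal))
```

The agent never tries an action in the world until it has compared the imagined outcomes of many alternatives, which is the sense in which a world model lets an agent think before it acts. A real system would replace `predict` with a model learned from data and would typically use a more efficient search than random sampling.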