The AI Institute has introduced Theia, a vision foundation model that enables robots to interpret and interact with their surroundings effectively by distilling the expertise of multiple off-the-shelf vision foundation models (VFMs) into a single model.
Combining the strengths of multiple VFMs, Theia generates a richer, unified visual representation that can be used to improve robot learning performance.
Theia consolidates training into a single supervised stage that combines features from the different VFMs and uses a unified decoder to generate separate outputs corresponding to each of them.
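The following is a minimal sketch of how such single-stage, multi-teacher feature distillation can be set up in PyTorch; the module names, dimensions, and mean-squared-error loss are illustrative assumptions, not Theia's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistilledEncoder(nn.Module):
    """Shared student encoder with one lightweight head per teacher VFM (illustrative)."""
    def __init__(self, backbone: nn.Module, embed_dim: int, teacher_dims: dict):
        super().__init__()
        self.backbone = backbone  # shared student encoder, e.g. a compact ViT
        # separate heads reproduce each teacher's feature space from the shared representation
        self.heads = nn.ModuleDict({
            name: nn.Linear(embed_dim, dim) for name, dim in teacher_dims.items()
        })

    def forward(self, images):
        z = self.backbone(images)  # unified visual representation
        return z, {name: head(z) for name, head in self.heads.items()}

def distillation_step(student, teachers, images, optimizer):
    """One single-stage supervised step: match each frozen teacher's features."""
    _, preds = student(images)
    with torch.no_grad():  # teachers are frozen off-the-shelf VFMs
        targets = {name: teacher(images) for name, teacher in teachers.items()}
    loss = sum(F.mse_loss(preds[name], targets[name]) for name in preds) / len(preds)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a setup like this, only the shared backbone and its unified representation would be kept after training and fed to downstream robot-learning policies; the per-teacher heads exist solely to supervise distillation.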
The AI Institute invited researchers and developers to explore Theia and further evaluate its capabilities to improve how robots learn and interpret their environments.
AI Institute researchers tested Theia on several robot platforms, including Boston Dynamics' Spot and a WidowX robot arm, with encouraging results.
Training Theia requires roughly 150 GPU hours on datasets like ImageNet, making it more computationally efficient to train than comparable vision foundation models.
The new design, built around distilling multiple VFMs into one compact model, substantially improves the robustness and efficiency of visual learning systems for robots.
The model was designed to support visual vetting and prioritization of training data, to offer interpretability, and to handle datasets of differing sizes and complexity.
Theia enables robots to adapt more quickly and effectively, improving how they interpret their environments. While Theia is still at an early stage of research, the researchers expect significant improvements in robot visual learning capabilities to emerge from it in the years ahead.