MIT researchers have developed a technique that combines data from many sources to teach any robot a wide range of tasks. The method mixes simulation and real-world data and needs less task-specific data, so a robot can be trained quickly instead of from scratch.

At its core, the technique aligns data from varied domains and multiple modalities into a shared 'language' that generative AI models can process. The architecture, called Heterogeneous Pretrained Transformers (HPT), unifies this data across modalities and domains. Because proprioception data (the robot's sense of its own joint positions and movements) is key to enabling dexterous motion, the architecture gives it the same importance as vision data.

Compared with a robot trained from scratch, HPT improved performance by more than 20% on both simulated and real-world tasks.

In the future, the researchers want to study how data diversity could further boost HPT's performance. They also want to extend HPT so it can process unlabeled data, as GPT-4 and other large language models do. Their ultimate goal is a universal robot brain that anyone could use for their own robot without any training.
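As a rough illustration of how heterogeneous inputs might be mapped into one shared token space, the Python sketch below gives each robot its own small per-modality tokenizers (called "stems" here) that project vision and proprioception features into tokens of the same width, which a single shared trunk then processes. The stem/trunk split, the token counts, the feature sizes, and every function name are assumptions for illustration, not details from the description above. Random weights stand in for learned parameters, the trunk is reduced to one attention step, and giving vision and proprioception equal token budgets is just one hypothetical way to express "the same importance."

```python
import math
import random

D_MODEL = 8  # hypothetical width of the shared token space


def random_matrix(rng, rows, cols):
    """Random weights standing in for learned parameters (rows x cols)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


class Stem:
    """Hypothetical modality-specific tokenizer.

    Projects one raw feature vector of length in_dim into n_tokens tokens,
    each of width D_MODEL, so every robot and modality lands in the same space.
    """

    def __init__(self, rng, in_dim, n_tokens):
        self.in_dim = in_dim
        self.projections = [random_matrix(rng, D_MODEL, in_dim) for _ in range(n_tokens)]

    def __call__(self, features):
        if len(features) != self.in_dim:
            raise ValueError(f"expected {self.in_dim} features, got {len(features)}")
        return [matvec(p, features) for p in self.projections]


def self_attention(tokens):
    """Single-head scaled dot-product attention (no learned Q/K/V maps, for brevity)."""
    scale = math.sqrt(len(tokens[0]))
    mixed = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]  # numerically stable softmax
        total = sum(exps)
        weights = [e / total for e in exps]
        mixed.append([sum(w * tok[i] for w, tok in zip(weights, tokens))
                      for i in range(len(query))])
    return mixed


def shared_trunk(tokens):
    """Robot-agnostic trunk: attend over all tokens, then mean-pool to one embedding."""
    attended = self_attention(tokens)
    return [sum(tok[i] for tok in attended) / len(attended) for i in range(D_MODEL)]


def encode(stems, observation):
    """Tokenize each modality with its own stem and concatenate before the trunk."""
    tokens = []
    for modality, stem in stems.items():
        tokens.extend(stem(observation[modality]))
    return shared_trunk(tokens)


rng = random.Random(0)
# Two hypothetical robots with different sensor sizes. Vision and proprioception
# receive the same token budget (4 each), so neither dominates the sequence.
robot_a = {"vision": Stem(rng, 16, 4), "proprio": Stem(rng, 7, 4)}   # one 7-joint arm
robot_b = {"vision": Stem(rng, 32, 4), "proprio": Stem(rng, 14, 4)}  # two 7-joint arms

emb_a = encode(robot_a, {"vision": [0.1] * 16, "proprio": [0.5] * 7})
emb_b = encode(robot_b, {"vision": [0.2] * 32, "proprio": [-0.3] * 14})
print(len(emb_a), len(emb_b))  # both embeddings have width D_MODEL
```

Because both robots end up with embeddings of the same width, one trunk can be shared across them while only the small stems are robot-specific. In this toy setup, adapting to a new robot would mean fitting new stems rather than retraining the whole model, which loosely mirrors the idea of not starting from scratch; the real system is far larger and trained on data rather than randomly initialized.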