The company compared π0 with other robot foundation models and found that it performed best across all of the evaluated tasks. Despite these promising results, the company acknowledges that generalist robot policies are still in their infancy and face open challenges, including long-horizon reasoning and planning, autonomous self-improvement, robustness, and safety. Physical Intelligence is seeking collaborations with the robotics community to further refine the model and is also hiring.
Key takeaways:
- The team at Physical Intelligence has developed a general-purpose robot foundation model called π0 (pi-zero), which is a step towards artificial physical intelligence. The model is trained on broad and diverse data, can follow a range of text instructions, can control many different robots, and can be fine-tuned to specialize in challenging application scenarios.
- π0 combines Internet-scale vision-language pretraining, open-source robot manipulation datasets, and the team's own datasets of dexterous tasks collected on 8 distinct robots. The model can perform a wide variety of tasks, including folding laundry, bussing tables, and assembling boxes.
- The model inherits semantic knowledge and visual understanding from Internet-scale pretraining by starting from a pre-trained vision-language model (VLM). To enable high-frequency dexterous control, the team developed a novel method to augment pre-trained VLMs with continuous action outputs via flow matching, a variant of diffusion models.
- The team at Physical Intelligence wants to collaborate on autonomy with companies that are scaling up data collection with robots deployed in real-world applications. The team is also hiring and encourages interested candidates to get in touch.
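To give a rough idea of what "continuous action outputs via flow matching" means, here is a minimal, generic flow-matching sketch in plain Python. It is not π0's implementation. The linear noise-to-action path, the Euler sampler, and every name in it (`interpolate`, `flow_matching_loss`, `sample_action`, `oracle_velocity`) are hypothetical. In a real system the velocity function would be a neural network conditioned on the VLM's representation of the camera images and text instruction. Here, so that the example runs, an analytic stand-in that already knows the target action takes the network's place.

```python
import random
from typing import Callable, List

Vector = List[float]
# Hypothetical signature: (noisy_action, t, conditioning) -> predicted velocity.
VelocityFn = Callable[[Vector, float, Vector], Vector]


def interpolate(noise: Vector, action: Vector, t: float) -> Vector:
    # Linear path: t = 0 is pure Gaussian noise, t = 1 is the clean action.
    return [(1.0 - t) * n + t * a for n, a in zip(noise, action)]


def target_velocity(noise: Vector, action: Vector) -> Vector:
    # Time derivative of the linear path (constant along the path).
    return [a - n for n, a in zip(noise, action)]


def flow_matching_loss(velocity_fn: VelocityFn, action: Vector,
                       cond: Vector, rng: random.Random) -> float:
    # One training sample: draw noise and a time, then regress the
    # predicted velocity onto the path's true velocity (mean squared error).
    noise = [rng.gauss(0.0, 1.0) for _ in action]
    t = rng.random()  # in [0, 1)
    x_t = interpolate(noise, action, t)
    pred = velocity_fn(x_t, t, cond)
    target = target_velocity(noise, action)
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(action)


def sample_action(velocity_fn: VelocityFn, cond: Vector, dim: int,
                  steps: int, rng: random.Random) -> Vector:
    # Inference: start from noise and integrate dx/dt = v(x, t) from
    # t = 0 to t = 1 with fixed-step Euler updates.
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    h = 1.0 / steps
    for k in range(steps):
        t = k * h
        v = velocity_fn(x, t, cond)
        x = [xi + h * vi for xi, vi in zip(x, v)]
    return x


def oracle_velocity(x: Vector, t: float, cond: Vector) -> Vector:
    # Stand-in for a trained network: the exact velocity field when the
    # data distribution is a single known action (passed in as `cond`).
    return [(c - xi) / (1.0 - t) for xi, c in zip(x, cond)]


if __name__ == "__main__":
    rng = random.Random(0)
    desired = [0.5, -1.0, 2.0]
    print(flow_matching_loss(oracle_velocity, desired, desired, rng))
    print(sample_action(oracle_velocity, desired, dim=3, steps=10, rng=rng))
```

The oracle reaches (near) zero loss and the sampler lands on the desired action. A learned velocity field would only approximate this. The appeal of this design for robot control is that the output is a continuous vector produced in a handful of integration steps, rather than a sequence of discrete action tokens.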