JARVIS-1 can self-improve following a lifelong learning paradigm, thanks to its growing multimodal memory. This capability sparks more general intelligence and improved autonomy. The article demonstrates JARVIS-1's performance at different learning stages on the same task, and shows that the agent can execute human instructions in diverse environments. It concludes by sharing additional results of JARVIS-1 in Minecraft and introducing some related projects.
Key takeaways:
- JARVIS-1 is an open-ended agent that can perceive multimodal input, generate sophisticated plans, and perform embodied control in the Minecraft universe. It is built on top of pre-trained multimodal language models and is equipped with a multimodal memory (a minimal sketch of this perceive-plan-act loop appears after this list).
- The agent can self-improve following a lifelong learning paradigm, demonstrating improved performance over time; for example, by the third epoch of one task it had learned to mine an extra log for fuel.
- JARVIS-1 shows nearly perfect performance across more than 200 varied tasks in Minecraft, ranging from entry-level to intermediate. It achieved a 12.5% completion rate on the long-horizon diamond pickaxe task, a significant improvement over previous records.
- The agent can execute human instructions in diverse environments, demonstrating its ability to adapt to different biomes in the Minecraft universe.
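To make the perceive-plan-act loop and the role of the growing multimodal memory more concrete, here is a minimal, hypothetical sketch. None of the class or method names below come from the JARVIS-1 codebase; they are illustrative stand-ins for the ideas of multimodal perception, memory-conditioned planning, and a memory that grows across episodes to enable self-improvement.

```python
from dataclasses import dataclass, field


@dataclass
class Experience:
    """One stored episode: the task, a state summary, the plan used, and the outcome."""
    task: str
    state_summary: str
    plan: list[str]
    success: bool


@dataclass
class MultimodalMemory:
    """Grows over the agent's lifetime; retrieved entries condition future plans."""
    experiences: list[Experience] = field(default_factory=list)

    def add(self, exp: Experience) -> None:
        self.experiences.append(exp)

    def retrieve(self, task: str, k: int = 3) -> list[Experience]:
        # Toy retrieval: return recent successful experiences on the same task.
        # A real system would use multimodal (text + visual) similarity search.
        relevant = [e for e in self.experiences if e.task == task and e.success]
        return relevant[-k:]


class Agent:
    def __init__(self) -> None:
        self.memory = MultimodalMemory()

    def perceive(self, observation: str) -> str:
        # Stand-in for a multimodal model summarizing the current game state.
        return f"state: {observation}"

    def plan(self, task: str, state: str) -> list[str]:
        # Stand-in for an LLM planner prompted with retrieved past experiences.
        hints = self.memory.retrieve(task)
        plan = ["gather logs", "craft table", "craft pickaxe"]
        if hints:
            # Self-improvement: past successes refine the plan
            # (e.g. mining an extra log for fuel, as in the example above).
            plan.insert(1, "mine extra log for fuel")
        return plan

    def act(self, plan: list[str]) -> bool:
        # Stand-in for a low-level controller executing each sub-goal.
        return all(isinstance(step, str) for step in plan)

    def run_episode(self, task: str, observation: str) -> bool:
        state = self.perceive(observation)
        plan = self.plan(task, state)
        success = self.act(plan)
        # Every episode, successful or not, grows the multimodal memory.
        self.memory.add(Experience(task, state, plan, success))
        return success


if __name__ == "__main__":
    agent = Agent()
    for epoch in range(3):
        agent.run_episode("craft wooden pickaxe", "spawned in a forest biome")
    # By later epochs the plan is conditioned on retrieved successes.
    print(agent.plan("craft wooden pickaxe", "state: forest"))
```

The key design point this sketch tries to convey is that planning and memory form a closed loop: each completed episode is written back into the memory, so later plans for the same task are conditioned on accumulated experience rather than generated from scratch.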