The Lumos architecture consists of three modules: a planning module that decomposes complex tasks into high-level subgoals, a grounding module that converts these subgoals into executable low-level actions, and an execution module that carries out those actions with external tools and environments. The article also discusses two Lumos formulations. Lumos-Onetime generates all subgoals and their actions in a single pass. Lumos-Iterative plans one subgoal at a time and conditions each new subgoal on the results of executing the previous ones. It further describes using Large Language Models (LLMs) to convert ground-truth intermediate reasoning steps from existing benchmarks into high-quality subgoal and action annotations. Lumos outperforms baseline formulations and generalizes well to unseen tasks.
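The control flow of the two formulations can be illustrated with a toy sketch. In Lumos each module is a fine-tuned LLM. Here the planner and grounder are hypothetical scripted stubs for the single toy task "compute (3 + 4) * 5", and the `Add`/`Multiply` tools, the `R1`/`R2` result variables, and all function names are assumptions made for illustration. The sketch only shows how the onetime loop (plan everything, ground everything, then execute) differs from the iterative loop (plan, ground, and execute one subgoal at a time).

```python
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical action format: (output_var, tool_name, arg1, arg2).
# An argument is either a literal number or the name of an earlier result.
Arg = Union[float, str]
Action = Tuple[str, str, Arg, Arg]

# Toy tools standing in for the external tools the execution module calls.
TOOLS: Dict[str, Callable[[float, float], float]] = {
    "Add": lambda a, b: a + b,
    "Multiply": lambda a, b: a * b,
}

# Scripted (hypothetical) module outputs for the task "compute (3 + 4) * 5".
SUBGOALS = ["Add 3 and 4", "Multiply the previous result by 5"]
GROUNDING: Dict[str, Action] = {
    "Add 3 and 4": ("R1", "Add", 3.0, 4.0),
    "Multiply the previous result by 5": ("R2", "Multiply", "R1", 5.0),
}


def plan(task: str, history: List[Tuple[str, float]]) -> List[str]:
    """Planning stub: return the subgoals not yet completed.

    A real planner would read the (subgoal, result) pairs in `history`;
    this stub only uses their count.
    """
    return SUBGOALS[len(history):]


def ground(subgoal: str) -> Action:
    """Grounding stub: convert one subgoal into an executable action."""
    return GROUNDING[subgoal]


def execute(action: Action, memory: Dict[str, float]) -> float:
    """Execution module: resolve variable references, call the tool, store the result."""
    out, tool, a, b = action

    def resolve(x: Arg) -> float:
        return memory[x] if isinstance(x, str) else x

    memory[out] = TOOLS[tool](resolve(a), resolve(b))
    return memory[out]


def run_onetime(task: str) -> float:
    """Lumos-Onetime style: plan all subgoals, ground them all, then execute."""
    actions = [ground(s) for s in plan(task, [])]
    memory: Dict[str, float] = {}
    result = 0.0
    for action in actions:
        result = execute(action, memory)
    return result


def run_iterative(task: str) -> float:
    """Lumos-Iterative style: plan one subgoal, ground and execute it, then re-plan."""
    memory: Dict[str, float] = {}
    history: List[Tuple[str, float]] = []
    result = 0.0
    while True:
        remaining = plan(task, history)
        if not remaining:
            return result
        subgoal = remaining[0]
        result = execute(ground(subgoal), memory)
        history.append((subgoal, result))  # feed the result back to the planner


print(run_onetime("compute (3 + 4) * 5"))    # 35.0
print(run_iterative("compute (3 + 4) * 5"))  # 35.0
```

Both loops reach the same answer on this toy task. The practical difference is that only the iterative loop lets the planner see intermediate results before committing to the next subgoal.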
Key takeaways:
- Lumos is a language agent that unifies a suite of complex interactive tasks and achieves performance competitive with GPT-4/3.5-based agents and larger open-source agents. It consists of planning, grounding, and execution modules.
- Lumos is trained on approximately 40K diverse, high-quality subgoal/action annotations, which GPT-4 generated from the ground-truth reasoning steps in existing benchmarks.
- Lumos outperforms GPT-4/3.5-based agents on complex QA and web tasks, and larger language agents on math tasks. It also surpasses larger open LLM agents and domain-specific agents by a large margin on WebShop, a task unseen during training.
- The Lumos training annotations are one of the largest resources for language agent fine-tuning, covering web, complex QA, and math tasks. They yield better performance than annotations produced by the Self-Instruct method, even when the Self-Instruct annotations have passed rigorous execution sanity checks.
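The summary does not spell out what an execution sanity check involves. A minimal sketch, assuming it means "run the annotated actions and keep the example only if they execute cleanly and reproduce the benchmark's ground-truth answer", could look like the following. The `Annotation` record, the toy `Add`/`Multiply` tools, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical action format: (output_var, tool_name, arg1, arg2).
Arg = Union[float, str]
Action = Tuple[str, str, Arg, Arg]

# Toy tools; a real check would call the actual tools and environments.
TOOLS: Dict[str, Callable[[float, float], float]] = {
    "Add": lambda a, b: a + b,
    "Multiply": lambda a, b: a * b,
}


@dataclass
class Annotation:
    """Hypothetical training example: subgoals, grounded actions, gold answer."""
    subgoals: List[str]
    actions: List[Action]
    gold_answer: float


def passes_sanity_check(ann: Annotation, tol: float = 1e-6) -> bool:
    """Execute the actions; pass only if they run and match the gold answer."""
    if not ann.actions:
        return False
    memory: Dict[str, float] = {}
    try:
        for out, tool, a, b in ann.actions:
            # Resolve references like "R1" to earlier results.
            args = [memory[x] if isinstance(x, str) else x for x in (a, b)]
            memory[out] = TOOLS[tool](*args)
    except (KeyError, TypeError, ZeroDivisionError):
        return False  # unknown tool, undefined variable, or bad arguments
    final = memory[ann.actions[-1][0]]
    return abs(final - ann.gold_answer) <= tol


def filter_annotations(anns: List[Annotation]) -> List[Annotation]:
    """Keep only the annotations that pass the execution sanity check."""
    return [a for a in anns if passes_sanity_check(a)]


good = Annotation(["Add 3 and 4", "Multiply the previous result by 5"],
                  [("R1", "Add", 3.0, 4.0), ("R2", "Multiply", "R1", 5.0)], 35.0)
broken = Annotation(["Multiply an undefined result by 5"],
                    [("R1", "Multiply", "R9", 5.0)], 35.0)
wrong = Annotation(["Add 3 and 4"], [("R1", "Add", 3.0, 4.0)], 8.0)
print(len(filter_annotations([good, broken, wrong])))  # 1
```

A check like this catches annotations that cannot execute or that reach the wrong answer. It cannot tell whether the intermediate subgoals are sensible, which is one reason executable data can still train a weaker agent.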