WebRL has been applied to transform open Llama-3.1 and GLM-4 models into proficient web agents, raising the success rate of Llama-3.1-8B from 4.8% to 42.4% and of GLM-4-9B from 6.1% to 43% on WebArena-Lite. The open models trained with WebRL outperformed GPT-4-Turbo (17.6%) and GPT-4o (13.9%), as well as previous state-of-the-art web agents trained on open LLMs. The study demonstrates WebRL's effectiveness in bridging the gap between open and proprietary LLM-based web agents, suggesting a path toward more accessible and powerful autonomous web interaction systems.
Key takeaways:
- The paper introduces WebRL, a self-evolving online curriculum reinforcement learning framework for training high-performance web agents using open Large Language Models (LLMs).
- WebRL addresses key challenges in building LLM web agents: it counters the scarcity of training tasks with a self-evolving curriculum that generates new tasks from the agent's failed attempts, handles sparse feedback signals with an outcome-supervised reward model (ORM) that scores whole trajectories, and limits policy distribution drift during online learning with KL-constrained policy updates (a minimal sketch of this loop follows the list).
- Applied to open Llama-3.1 and GLM-4 models, WebRL produced proficient web agents whose WebArena-Lite success rates surpass both proprietary models and previous state-of-the-art web agents trained on open LLMs.
- The findings demonstrate WebRL's effectiveness in bridging the gap between open and proprietary LLM-based web agents, suggesting potential for more accessible and powerful autonomous web interaction systems.
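To make the interplay of these pieces concrete, here is a minimal Python sketch of a WebRL-style training loop. It is an illustration under stated assumptions, not the paper's implementation: `Policy`, `RewardModel`, `mutate_failed_task`, and the toy success dynamics are hypothetical stand-ins (a real setup would wrap an open LLM and a browser environment).

```python
"""Minimal sketch of a WebRL-style self-evolving curriculum RL loop.

All names here (Policy, RewardModel, mutate_failed_task) are illustrative
placeholders, not the paper's actual code.
"""
import random
from dataclasses import dataclass


@dataclass
class Trajectory:
    task: str
    actions: list
    success: bool


class RewardModel:
    """Stand-in for the outcome-supervised reward model (ORM): scores a
    whole trajectory 0/1 instead of relying on sparse per-step signals."""

    def score(self, traj: Trajectory) -> float:
        return 1.0 if traj.success else 0.0


class Policy:
    """Stand-in agent; a real implementation would wrap an open LLM
    interacting with a browser environment."""

    def rollout(self, task: str) -> Trajectory:
        # Toy dynamics: longer instructions are harder to complete.
        success = random.random() > min(0.9, 0.1 * len(task.split()))
        return Trajectory(task, actions=["click", "type"], success=success)

    def update(self, batch: list, rewards: list, kl_coef: float) -> None:
        # Placeholder for a KL-constrained policy-gradient step; the KL term
        # keeps the updated policy close to the previous one to limit
        # distribution drift during online learning.
        pass


def mutate_failed_task(task: str) -> str:
    """Illustrative task generator: derive a new instruction from a failure.
    The paper prompts an LLM for this; here we just simplify the task."""
    words = task.split()
    return " ".join(words[: max(1, len(words) - 1)])


def train(policy, orm, seed_tasks, phases=3, kl_coef=0.1):
    tasks, replay = list(seed_tasks), []  # replay buffer keeps past successes
    for phase in range(phases):
        trajs = [policy.rollout(t) for t in tasks]
        rewards = [orm.score(tr) for tr in trajs]
        replay += [tr for tr in trajs if tr.success]
        # Self-evolving curriculum: spawn new tasks from failed attempts.
        tasks += [mutate_failed_task(tr.task) for tr in trajs if not tr.success]
        policy.update(trajs + replay, rewards, kl_coef)
        print(f"phase {phase}: {sum(rewards):.0f}/{len(trajs)} solved, "
              f"{len(tasks)} tasks in pool")


train(Policy(), RewardModel(), ["book a flight from SFO to JFK next friday"])
```

The two stabilizers in this sketch mirror the paper's design motivation: the replay buffer re-exposes the policy to earlier successes so the growing task pool does not wash them out, and the KL penalty in the update step is what counteracts policy distribution drift across online phases.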