WebRL has been applied to transform open Llama-3.1 and GLM-4 models into proficient web agents, raising the success rate of Llama-3.1-8B from 4.8% to 42.4% and of GLM-4-9B from 6.1% to 43% on WebArena-Lite. The open models trained with WebRL outperformed GPT-4-Turbo (17.6%) and GPT-4o (13.9%), as well as previous state-of-the-art web agents trained on open LLMs. The study demonstrates WebRL's effectiveness in bridging the gap between open and proprietary LLM-based web agents, suggesting a path toward more accessible and powerful autonomous web interaction systems.
Key takeaways:
- The paper introduces WebRL, a self-evolving online curriculum reinforcement learning framework for training high-performance web agents using open Large Language Models (LLMs).
- WebRL addresses key challenges in building LLM web agents: it counters the scarcity of training tasks with a self-evolving curriculum that generates new tasks from the agent's failed attempts, handles sparse feedback signals with an outcome-supervised reward model (ORM) that scores whole trajectories, and limits policy distribution drift during online learning with KL-constrained policy updates (a minimal sketch of this loop follows the list).
- Applied to open Llama-3.1 and GLM-4 models, WebRL produced proficient web agents whose WebArena-Lite success rates surpass both proprietary models and previous state-of-the-art web agents trained on open LLMs.
- The findings demonstrate WebRL's effectiveness in bridging the gap between open and proprietary LLM-based web agents, suggesting potential for more accessible and powerful autonomous web interaction systems.
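To make the interplay of these pieces concrete, here is a minimal Python sketch of a WebRL-style training loop. It is an illustration under stated assumptions, not the paper's implementation: `Policy`, `RewardModel`, `mutate_failed_task`, and the toy success dynamics are hypothetical stand-ins (a real setup would wrap an open LLM and a browser environment).

```python
"""Minimal sketch of a WebRL-style self-evolving curriculum RL loop.

All names here (Policy, RewardModel, mutate_failed_task) are illustrative
placeholders, not the paper's actual code.
"""
import random
from dataclasses import dataclass


@dataclass
class Trajectory:
    task: str
    actions: list
    success: bool


class RewardModel:
    """Stand-in for the outcome-supervised reward model (ORM): scores a
    whole trajectory 0/1 instead of relying on sparse per-step signals."""

    def score(self, traj: Trajectory) -> float:
        return 1.0 if traj.success else 0.0


class Policy:
    """Stand-in agent; a real implementation would wrap an open LLM
    interacting with a browser environment."""

    def rollout(self, task: str) -> Trajectory:
        # Toy dynamics: longer instructions are harder to complete.
        success = random.random() > min(0.9, 0.1 * len(task.split()))
        return Trajectory(task, actions=["click", "type"], success=success)

    def update(self, batch: list, rewards: list, kl_coef: float) -> None:
        # Placeholder for a KL-constrained policy-gradient step; the KL term
        # keeps the updated policy close to the previous one to limit
        # distribution drift during online learning.
        pass


def mutate_failed_task(task: str) -> str:
    """Illustrative task generator: derive a new instruction from a failure.
    The paper prompts an LLM for this; here we just simplify the task."""
    words = task.split()
    return " ".join(words[: max(1, len(words) - 1)])


def train(policy, orm, seed_tasks, phases=3, kl_coef=0.1):
    tasks, replay = list(seed_tasks), []  # replay buffer keeps past successes
    for phase in range(phases):
        trajs = [policy.rollout(t) for t in tasks]
        rewards = [orm.score(tr) for tr in trajs]
        replay += [tr for tr in trajs if tr.success]
        # Self-evolving curriculum: spawn new tasks from failed attempts.
        tasks += [mutate_failed_task(tr.task) for tr in trajs if not tr.success]
        policy.update(trajs + replay, rewards, kl_coef)
        print(f"phase {phase}: {sum(rewards):.0f}/{len(trajs)} solved, "
              f"{len(tasks)} tasks in pool")


train(Policy(), RewardModel(), ["book a flight from SFO to JFK next friday"])
```

The two stabilizers in this sketch mirror the paper's design motivation: the replay buffer re-exposes the policy to earlier successes so the growing task pool does not wash them out, and the KL penalty in the update step is what counteracts policy distribution drift across online phases.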