The authors introduce HPTSA, a system built around a team of LLM agents, including a planning agent that can launch subagents. The planning agent explores the target system and decides which subagents to call, which addresses the long-term planning problems that arise when trying different vulnerabilities. The authors tested HPTSA on a benchmark of 15 real-world vulnerabilities and report that the team of agents improves on previous work by up to 4.5 times.
Key takeaways:
- LLM agents have shown potential for exploiting real-world cybersecurity vulnerabilities, but they struggle with unknown, zero-day vulnerabilities.
- The study introduces a new system, HPTSA, which uses a team of LLM agents including a planning agent that can launch subagents.
- The planning agent explores the target system and decides which subagents to call, which addresses the long-term planning problems that arise when trying different vulnerabilities.
- The team of agents was tested on a benchmark of 15 real-world vulnerabilities, showing an improvement of up to 4.5 times over previous work.
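
The planner-and-subagents arrangement described above can be pictured with a small control-flow sketch. This is not the authors' implementation: the class name `PlanningAgent`, the `explore` and `run` methods, the `make_stub` helper, and the `expert_a`/`expert_b`/`expert_c` labels are all hypothetical, and the subagents are stubs that only return a fixed success flag instead of calling an LLM or touching any system. It shows only the dispatch pattern, where a planner picks candidate subagents and tries them in turn.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical type for illustration: a subagent takes a target label and
# reports whether its attempt succeeded. None of these names come from the paper.
Subagent = Callable[[str], bool]


@dataclass
class PlanningAgent:
    subagents: Dict[str, Subagent]
    log: List[str] = field(default_factory=list)

    def explore(self, target: str) -> List[str]:
        # Placeholder for exploration. A real planner would presumably use an
        # LLM to decide which subagents are worth calling; this stub simply
        # returns every registered subagent name in insertion order.
        return list(self.subagents)

    def run(self, target: str) -> Optional[str]:
        # Call each candidate subagent in turn and stop at the first success.
        for name in self.explore(target):
            self.log.append(name)
            if self.subagents[name](target):
                return name
        return None


def make_stub(succeeds: bool) -> Subagent:
    # Stand-in subagent that ignores the target and returns a fixed result.
    return lambda target: succeeds


planner = PlanningAgent(subagents={
    "expert_a": make_stub(False),
    "expert_b": make_stub(True),
    "expert_c": make_stub(False),
})
result = planner.run("toy-target")
print(result, planner.log)
```

Keeping the planner separate from the subagents means each subagent handles one narrow task while the planner keeps track of what has already been tried, which is one plausible reading of how such a design eases long-term planning.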