The author further tested this hypothesis on SWE-bench instances, organizing AI agents as if they worked at different corporations and evaluating six different organizational structures. The results showed that competitive teams increase the chances of success, while structures with single points of failure underperformed. The author concluded that while increasing the number of agents or changing how they are organized can yield marginal performance improvements, bigger jumps in progress require a change in the agents' actual logical reasoning capability or in the strategies and methods they can employ to solve software issues.
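To make the contrast concrete, here is a minimal sketch of two of the organizational shapes being compared: competing teams versus a centralized hierarchy. Everything in it is hypothetical; `run_agent`, `judge`, and the placeholder return values stand in for real LLM-backed agent calls and are not the author's implementation.

```python
import random

# Hypothetical stand-ins: in a real system these would call an LLM-backed
# agent framework; here they are placeholders to show the structure only.
def run_agent(role: str, issue: str) -> str:
    """Ask one agent to propose a candidate patch for a SWE-bench issue."""
    return f"patch proposed by {role} for: {issue[:40]}"

def judge(candidates: list[str]) -> str:
    """Pick the most promising candidate (e.g. via tests or an LLM judge)."""
    return random.choice(candidates)  # placeholder selection

def competing_teams(issue: str, n_teams: int = 3, team_size: int = 2) -> str:
    """Competitive structure: several teams attack the same issue in
    parallel, then a judge selects the best result. No single agent is a
    single point of failure."""
    team_results = []
    for t in range(n_teams):
        drafts = [run_agent(f"team{t}-eng{i}", issue) for i in range(team_size)]
        team_results.append(judge(drafts))  # team-internal review
    return judge(team_results)              # cross-team competition

def centralized_hierarchy(issue: str, team_size: int = 6) -> str:
    """Centralized structure: one manager delegates and makes the final
    call alone, so that decision point is a single point of failure."""
    drafts = [run_agent(f"eng{i}", issue) for i in range(team_size)]
    return judge(drafts)  # one decision point, no competing review

print(competing_teams("Fix off-by-one error in pagination logic"))
```

The structural point is that the competitive variant gives every issue several independent shots plus a selection step, whereas the hierarchy funnels everything through one decision point.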
Key takeaways:
- AI systems with multiple agents outperform a single LLM call on almost any task, reducing hallucinations and improving accuracy (see the voting sketch after this list). However, simply increasing the number of agents does not lead to dramatic improvements.
- Organizing AI agents into structures modeled on big tech companies such as Apple, Microsoft, Google, Amazon, and Oracle shows that companies with multiple competing teams outperform centralized hierarchies. Systems with single points of failure underperform.
- Competitive teams increase the chances of success in problem-solving, while structures with single points of failure underperform. Notably, the top two performers, Microsoft and Apple, are also the world's two largest tech companies by market cap.
- Bigger improvements in AI agent performance may require a change in the agents' actual logical reasoning capability, or in the strategies and methods they can employ to solve software issues. This could mean more powerful base models or a wider array of tools given to the agents.
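As a companion to the first takeaway, below is a minimal sketch of one reason multiple agents can reduce hallucinations. It uses majority voting over independent samples (self-consistency), a common multi-agent technique rather than necessarily the author's exact method; `ask_llm` is a hypothetical, deterministic stand-in for a real LLM call so the example runs offline.

```python
from collections import Counter

# Hypothetical stand-in for a real LLM API call; deterministic here so the
# example runs without network access. The occasional wrong answers mimic
# hallucinations.
def ask_llm(prompt: str, seed: int) -> str:
    answers = ["42", "42", "41", "42", "40"]
    return answers[seed % len(answers)]

def single_call(prompt: str) -> str:
    """Baseline: one LLM call, one chance to hallucinate."""
    return ask_llm(prompt, seed=0)

def multi_agent_vote(prompt: str, n_agents: int = 5) -> str:
    """Several independent agents answer the same prompt; majority voting
    filters out answers that only a minority of agents produce."""
    votes = Counter(ask_llm(prompt, seed=i) for i in range(n_agents))
    return votes.most_common(1)[0][0]

print(single_call("What is 6 * 7?"))       # one sample, no error correction
print(multi_agent_vote("What is 6 * 7?"))  # "42" wins 3 of 5 votes
```

The same intuition explains the diminishing returns noted above: once the majority answer is stable, adding more voters barely changes the outcome, which is why bigger gains require better reasoning or better tools rather than more agents.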