
“The king is dead”—Claude 3 surpasses GPT-4 on Chatbot Arena for the first time

Mar 27, 2024 - arstechnica.com
Anthropic's Claude 3 Opus language model has surpassed OpenAI's GPT-4 on the Chatbot Arena leaderboard, the first time GPT-4 has been unseated since it was added in May 2023. The leaderboard, run by the Large Model Systems Organization (LMSYS ORG), is widely used by AI researchers to compare the capabilities of AI language models. The rise of Claude 3 Opus and of Haiku, a smaller model in the Claude 3 family, marks a shift in the AI landscape: the best models no longer come exclusively from OpenAI.

The Chatbot Arena leaderboard is updated based on user ratings of chatbot outputs, which gives a measure of performance in a field where output quality is hard to quantify. Claude 3's rise has led some users to replace ChatGPT in their daily workflows, which could affect ChatGPT's market share. However, OpenAI is expected to release a successor to GPT-4 Turbo later this year, so competition in the large language model space is likely to continue.

Key takeaways:

  • Anthropic's Claude 3 Opus large language model (LLM) has surpassed OpenAI's GPT-4 for the first time on Chatbot Arena, a popular leaderboard used by AI researchers to gauge the capabilities of AI language models.
  • Since its inclusion in May 2023, GPT-4 has consistently topped the leaderboard, making its defeat a significant moment in the history of AI language models.
  • Chatbot Arena is run by the Large Model Systems Organization (LMSYS ORG), a research organization that operates as a collaboration between students and faculty at the University of California, Berkeley, UC San Diego, and Carnegie Mellon University.
  • OpenAI is expected to release a major new successor to GPT-4 Turbo, possibly named GPT-4.5 or GPT-5, sometime this year, indicating that the LLM space will continue to be competitive.