Claude 3 beats GPT-4 on Aider’s code editing benchmark

Anthropic has released its new Claude 3 models, which show improved performance on coding tasks in Aider's code editing benchmark. Claude 3 Opus outperformed all of OpenAI's models, making it the best available model for pair programming with AI. However, Opus scored only slightly higher than GPT-4 Turbo, and given its higher cost and slower response times, it is unclear which model is more practical for daily use.

Claude 3 Sonnet performed similarly to OpenAI's GPT-3.5 Turbo models. Both Claude 3 Opus and Sonnet are slower and more expensive than OpenAI's models, but Claude 3 has a context window twice as large as that of the latest GPT-4 Turbo, which could be an advantage for larger code bases. However, the Claude models refused to perform certain coding tasks, and the Claude APIs returned errors, which suggests possible instability in the service.

Key takeaways:

The new Claude 3 Opus model by Anthropic outperforms all of OpenAI’s models in coding tasks, making it the best available model for pair programming with AI.
The Claude 3 Opus model achieved the highest score ever on Aider's code editing benchmark, completing 68.4% of the tasks within two tries.
Despite its superior performance, Claude 3 Opus is slower and more expensive than OpenAI’s models, which offer almost the same coding skill faster and cheaper.
The Claude models have shown some instability, refusing to perform certain coding tasks and returning HTTP 5xx errors, indicating that Anthropic may be struggling with high demand.
