The Claude 3 Sonnet model performed similarly to OpenAI’s GPT-3.5 Turbo models. It's worth noting that both Claude 3 Opus and Sonnet are slower and more expensive than OpenAI’s models, but Claude 3 has a 2X larger context window than the latest GPT-4 Turbo, which could be advantageous for larger code bases. However, the Claude models refused to perform certain coding tasks and returned errors, indicating possible instability in the Claude APIs.
Key takeaways:
- The new Claude 3 Opus model by Anthropic outperforms all of OpenAI’s models in coding tasks, making it the best available model for pair programming with AI.
- The Claude 3 Opus model got the highest score ever on Aider's code editing benchmark, completing 68.4% of the tasks with two tries.
- Despite its superior performance, Claude 3 Opus is slower and more expensive than OpenAI’s models, which offer almost the same coding skill faster and cheaper.
- The Claude models have shown some instability, refusing to perform certain coding tasks and returning HTTP 5xx errors, indicating that Anthropic may be struggling with high demand.