Despite not being the most powerful model, o3-mini shows competitive performance against rivals like DeepSeek's R1, especially with high reasoning effort. It excels in certain benchmarks, such as AIME 2024 and SWE-bench Verified, but lags behind in others like GPQA Diamond. OpenAI emphasizes the model's cost-effectiveness and safety, claiming it surpasses previous models in safety evaluations. The release of o3-mini is part of OpenAI's broader mission to advance cost-effective intelligence while addressing challenges in the AI landscape.
Key takeaways:
- OpenAI launched o3-mini, a new AI reasoning model, which is positioned as both powerful and affordable, and is aimed at broadening accessibility to advanced AI.
- o3-mini is fine-tuned for STEM problems and is claimed to be more reliable than previous models, with external testers preferring its answers over o1-mini's more than half the time.
- The model is available via ChatGPT and OpenAI's API, with pricing set at $1.10 per million cached input tokens and $4.40 per million output tokens, making it 63% cheaper than o1-mini.
- o3-mini is not the most powerful model and does not surpass DeepSeek's R1 in every benchmark, but it offers competitive performance at lower cost and latency, especially with high reasoning effort.
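To make the pricing concrete, here is a minimal back-of-the-envelope cost sketch using the per-token rates quoted above ($1.10 per million cached input tokens, $4.40 per million output tokens); the function name and example token counts are illustrative, not part of OpenAI's API.

```python
# Rates quoted in the article (USD per token).
CACHED_INPUT_RATE = 1.10 / 1_000_000  # $1.10 per million cached input tokens
OUTPUT_RATE = 4.40 / 1_000_000        # $4.40 per million output tokens

def estimate_cost(cached_input_tokens: int, output_tokens: int) -> float:
    """Estimate a single request's cost in USD from its token counts."""
    return cached_input_tokens * CACHED_INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical example: 10,000 cached input tokens and 2,000 output tokens.
print(round(estimate_cost(10_000, 2_000), 4))  # 0.011 + 0.0088 = 0.0198
```

At these rates, even a fairly long exchange costs a fraction of a cent, which is the practical meaning of the "63% cheaper than o1-mini" claim.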