Feature Story
Ai2 says its new AI model beats one of DeepSeek's best | TechCrunch
Jan 30, 2025 · techcrunch.com
Tulu3-405B excelled on benchmarks such as PopQA, outperforming DeepSeek V3, GPT-4o, and Meta’s Llama 3.1 405B model, and achieved the highest performance in its class on GSM8K, a test of grade school-level math word problems. The model is accessible for testing via Ai2’s chatbot web app, and the training code is available on GitHub and Hugging Face. According to an Ai2 spokesperson, the model marks a pivotal moment in AI development and shows that the U.S. can lead with competitive, open-source AI developed independently of major tech companies.
Key takeaways
- Ai2 released Tulu3-405B, an open-source AI model that outperforms DeepSeek V3 and GPT-4o on certain benchmarks.
- Tulu3-405B contains 405 billion parameters and required 256 GPUs to train.
- Ai2 credits reinforcement learning with verifiable rewards (RLVR) for much of the model's competitive performance; RLVR trains models on tasks with verifiable outcomes, such as math problem solving and instruction following.
- Tulu3-405B is available for testing via Ai2’s chatbot web app, and its training code is accessible on GitHub and Hugging Face.
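The idea behind RLVR is that, on tasks like GSM8K math word problems, the reward can be computed by checking the model's answer against a known correct answer. A grader model is not needed for this. Below is a minimal sketch of such a check, assuming a binary reward (1.0 for a correct final answer, 0.0 otherwise) and assuming the final answer is the last number in the model's output. The function names and the extraction rule are hypothetical illustrations, not Ai2's training code.

```python
import re
from fractions import Fraction
from typing import Optional

# Hypothetical extraction rule: treat the last number in the completion as the answer.
# Matches integers, comma-grouped thousands (e.g. "1,250"), and decimals (e.g. "3.50").
_NUMBER = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def extract_final_answer(completion: str) -> Optional[Fraction]:
    """Return the last number in the text as an exact Fraction, or None if there is none."""
    matches = _NUMBER.findall(completion)
    if not matches:
        return None
    # Drop thousands separators and parse exactly, so "3.50" equals "3.5" without float error.
    return Fraction(matches[-1].replace(",", ""))


def verifiable_reward(completion: str, reference: str) -> float:
    """Assumed binary reward: 1.0 if the extracted answer equals the reference, else 0.0."""
    predicted = extract_final_answer(completion)
    if predicted is None:
        return 0.0
    return 1.0 if predicted == Fraction(reference.replace(",", "")) else 0.0


if __name__ == "__main__":
    completion = "She sells 16 - 3 - 4 = 9 eggs, earning 9 * $2 = $18. The answer is 18."
    print(verifiable_reward(completion, "18"))  # 1.0
    print(verifiable_reward(completion, "20"))  # 0.0
```

Because the reward depends only on whether the answer can be verified as correct, it cannot be gamed by output that merely looks plausible. It also means RLVR only applies to tasks where such a check exists.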