Feature Story
New LLM developed for under $50 outperforms OpenAI’s o1-preview - SiliconANGLE
Feb 06, 2025 · siliconangle.com
The researchers tested s1-32B against OpenAI's o1-preview on the MATH and AIME24 benchmarks, where it achieved scores up to 27% higher. The model also improved with additional test-time compute, raising its score on math questions from 50% to 57%. The development of s1-32B was notably cost-effective, requiring only about $20 worth of compute: 26 minutes of training on 16 Nvidia H100 graphics cards.
Key takeaways
- Researchers from Stanford and the University of Washington developed a large language model named s1-32B that outperforms OpenAI's o1-preview at a fraction of the cost.
- s1-32B was created by fine-tuning Alibaba's Qwen2.5-32B-Instruct on a dataset of 1,000 prompts paired with AI-generated answers, incorporating a method called budget forcing.
- Budget forcing helps s1-32B manage its reasoning time effectively, addressing issues of spending too little or too much time on tasks, leading to improved accuracy.
- s1-32B achieved scores up to 27% higher than o1-preview on math benchmarks and was trained in 26 minutes on 16 Nvidia H100 graphics cards, roughly $20 worth of compute.
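
The budget-forcing idea described above can be sketched as a simple decoding loop: if the model tries to end its reasoning before a minimum token budget, the end-of-thinking delimiter is suppressed and a word like "Wait" is appended to prompt further reasoning; if reasoning runs past a maximum budget, generation is cut off. The function names, the delimiter string, and the toy model below are hypothetical stand-ins for illustration, not the researchers' actual code.

```python
# Hedged sketch of budget forcing, assuming a streaming token
# interface. END_OF_THINKING and the toy model are illustrative
# placeholders, not the real s1-32B implementation.
END_OF_THINKING = "</think>"

def budget_forced_generate(step_fn, min_tokens, max_tokens):
    """step_fn() yields the model's next token as a string."""
    tokens = []
    while True:
        tok = step_fn()
        if tok == END_OF_THINKING:
            if len(tokens) < min_tokens:
                # Too little thinking: suppress the delimiter and
                # append "Wait" to encourage further reasoning.
                tokens.append("Wait")
                continue
            break  # Budget satisfied: allow the model to stop.
        tokens.append(tok)
        if len(tokens) >= max_tokens:
            # Too much thinking: force the reasoning phase to end.
            break
    return tokens

# Toy "model" that tries to stop after three reasoning steps.
def make_toy_model():
    script = ["step1", "step2", "step3", END_OF_THINKING,
              "step4", "step5", END_OF_THINKING]
    it = iter(script)
    return lambda: next(it)

out = budget_forced_generate(make_toy_model(), min_tokens=5, max_tokens=10)
print(out)  # The premature stop is replaced with "Wait"
```

In this sketch the toy model's first attempt to stop is overridden because only three tokens have been produced, so the loop inserts "Wait" and continues until the minimum budget is met, mirroring how budget forcing steers how long the model reasons.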