Feature Story
New LLM developed for under $50 outperforms OpenAI’s o1-preview - SiliconANGLE
Feb 06, 2025 · siliconangle.com
The researchers tested s1-32B against OpenAI's o1-preview on the MATH and AIME24 benchmarks, where it achieved scores up to 27% higher. The model also improved with additional test-time compute, raising its score on math questions from 50% to 57%. The development of s1-32B was notably cost-effective, requiring only about $20 worth of compute: 26 minutes of training on 16 Nvidia H100 graphics cards.
Key takeaways
- Researchers from Stanford and the University of Washington developed a large language model named s1-32B that outperforms OpenAI's o1-preview at a fraction of the cost.
- s1-32B was created by fine-tuning Alibaba's Qwen2.5-32B-Instruct on a dataset of 1,000 prompts paired with AI-generated answers, incorporating a method called budget forcing.
- Budget forcing helps s1-32B manage its reasoning time effectively, addressing issues of spending too little or too much time on tasks, leading to improved accuracy.
- s1-32B achieved scores up to 27% higher than o1-preview on math benchmarks and was trained in 26 minutes on 16 Nvidia H100 graphics cards, roughly $20 worth of compute.
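
The budget-forcing idea described above can be sketched as a simple decoding loop: if the model tries to end its reasoning before a minimum token budget, the end-of-thinking delimiter is suppressed and a word like "Wait" is appended to prompt further reasoning; if reasoning runs past a maximum budget, generation is cut off. The function names, the delimiter string, and the toy model below are hypothetical stand-ins for illustration, not the researchers' actual code.

```python
# Hedged sketch of budget forcing, assuming a streaming token
# interface. END_OF_THINKING and the toy model are illustrative
# placeholders, not the real s1-32B implementation.
END_OF_THINKING = "</think>"

def budget_forced_generate(step_fn, min_tokens, max_tokens):
    """step_fn() yields the model's next token as a string."""
    tokens = []
    while True:
        tok = step_fn()
        if tok == END_OF_THINKING:
            if len(tokens) < min_tokens:
                # Too little thinking: suppress the delimiter and
                # append "Wait" to encourage further reasoning.
                tokens.append("Wait")
                continue
            break  # Budget satisfied: allow the model to stop.
        tokens.append(tok)
        if len(tokens) >= max_tokens:
            # Too much thinking: force the reasoning phase to end.
            break
    return tokens

# Toy "model" that tries to stop after three reasoning steps.
def make_toy_model():
    script = ["step1", "step2", "step3", END_OF_THINKING,
              "step4", "step5", END_OF_THINKING]
    it = iter(script)
    return lambda: next(it)

out = budget_forced_generate(make_toy_model(), min_tokens=5, max_tokens=10)
print(out)  # The premature stop is replaced with "Wait"
```

In this sketch the toy model's first attempt to stop is overridden because only three tokens have been produced, so the loop inserts "Wait" and continues until the minimum budget is met, mirroring how budget forcing steers how long the model reasons.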