DeepSeek's distilled new R1 AI model can run on a single GPU

May 29, 2025 - techcrunch.com
DeepSeek has released a smaller, distilled version of its updated R1 reasoning AI model, named DeepSeek-R1-0528-Qwen3-8B. Built on Alibaba's Qwen3-8B, the model outperforms similarly sized models such as Google's Gemini 2.5 Flash on the AIME 2025 math benchmark and nearly matches Microsoft's Phi 4 reasoning plus model on the HMMT math skills test. While distilled models are generally less capable than their full-sized counterparts, they require far less computational power: DeepSeek-R1-0528-Qwen3-8B can run on a single GPU with 40GB-80GB of RAM, whereas the full-sized R1 needs around a dozen 80GB GPUs.

DeepSeek trained this model by fine-tuning Qwen3-8B with text generated by the updated R1. The model is available on the AI dev platform Hugging Face and is intended for both academic research and industrial development of small-scale models. It is released under a permissive MIT license, allowing unrestricted commercial use. Several platforms, including LM Studio, offer the model through an API.
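For readers who want to try the checkpoint locally, below is a minimal sketch of loading it with Hugging Face's transformers library. The repo id is assumed from the model name given in the article, and the dtype, device placement, and generation settings are illustrative choices rather than DeepSeek's documented configuration.

```python
# Minimal sketch: load DeepSeek-R1-0528-Qwen3-8B from Hugging Face and run one prompt.
# Assumes `transformers`, `torch`, and `accelerate` are installed and a GPU in the
# 40GB-80GB range is available, per the article's stated requirements.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"  # repo id assumed from the model name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # place weights on the available GPU
)

# Reasoning models typically emit a chain of thought before the final answer,
# so leave generous room in max_new_tokens.
messages = [{"role": "user", "content": "How many primes are there below 100?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Hosted options such as LM Studio expose the same model through an API, which sidesteps the local GPU requirement entirely.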

Key takeaways:

  • DeepSeek released a smaller, distilled version of its R1 model, called DeepSeek-R1-0528-Qwen3-8B, which outperforms comparably sized models on certain benchmarks.
  • The distilled model performs better than Google's Gemini 2.5 Flash on the AIME 2025 math test and nearly matches Microsoft's Phi 4 reasoning plus model on the HMMT test.
  • DeepSeek-R1-0528-Qwen3-8B is less computationally demanding than the full-sized R1 model, requiring a single GPU with 40GB-80GB of RAM.
  • The model is available under a permissive MIT license and can be accessed through various hosts, including LM Studio, via an API.