DeepSeek trained the model, DeepSeek-R1-0528-Qwen3-8B, by fine-tuning Qwen3-8B on text generated by the updated R1. The model is available on the AI dev platform Hugging Face and is intended for both academic research and industrial development of small-scale models. It is released under a permissive MIT license, allowing unrestricted commercial use. Several platforms, including LM Studio, offer the model through an API.
Key takeaways:
- DeepSeek released a smaller, distilled version of its R1 model, called DeepSeek-R1-0528-Qwen3-8B, which outperforms comparably sized models on certain benchmarks.
- The distilled model performs better than Google's Gemini 2.5 Flash on the AIME 2025 math test and nearly matches Microsoft's Phi 4 reasoning plus model on the HMMT test.
- DeepSeek-R1-0528-Qwen3-8B is far less computationally demanding than the full-sized R1 model, requiring only a single GPU with 40GB–80GB of memory.
- The model is available under a permissive MIT license and can be accessed through various hosts, including LM Studio, via an API.
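The distillation recipe described above, fine-tuning a small "student" model on outputs generated by a larger "teacher", can be illustrated at toy scale. The sketch below is purely hypothetical: it stands in a linear scorer for R1 and a logistic-regression model for Qwen3-8B, and is not DeepSeek's actual training code; it only shows the general shape of the technique.

```python
# Toy sketch of distillation-style fine-tuning: a small "student" is
# trained on labels produced by a fixed, larger "teacher". Hypothetical
# stand-ins only -- not DeepSeek's pipeline.
import numpy as np

rng = np.random.default_rng(0)

# "Teacher": a fixed linear scorer standing in for the large R1 model.
teacher_w = rng.normal(size=4)

def teacher_label(x):
    # The real teacher generates text; this toy one emits a binary label.
    return 1.0 if x @ teacher_w > 0 else 0.0

# Synthetic "prompts" and teacher-generated targets.
X = rng.normal(size=(500, 4))
y = np.array([teacher_label(x) for x in X])

# "Student": a small logistic-regression model fine-tuned on the
# teacher's outputs via plain gradient descent on cross-entropy loss.
w = np.zeros(4)
lr = 0.5
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # student predictions
    w -= lr * (X.T @ (p - y)) / len(X)   # gradient step

preds = (1.0 / (1.0 + np.exp(-(X @ w))) > 0.5).astype(float)
agreement = (preds == y).mean()
print(f"student agrees with teacher on {agreement:.0%} of inputs")
```

The design point is the same one DeepSeek exploits at scale: the student never sees ground-truth data, only the teacher's outputs, so a much smaller model can absorb much of the larger model's behavior at a fraction of the compute cost.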