DeepSeek-V3, ultra-large open-source AI, outperforms Llama and Qwen on launch

Chinese AI startup DeepSeek has released DeepSeek-V3, an ultra-large open-source model with 671 billion parameters. It uses a mixture-of-experts architecture that activates only a selected subset of those parameters for each task, which keeps computation efficient. Available on Hugging Face, the model outperforms leading open-source models such as Meta’s Llama 3.1-405B and rivals closed models from Anthropic and OpenAI. Innovations include an auxiliary-loss-free load-balancing strategy and multi-token prediction, which improve training efficiency and speed. Trained at a cost of $5.57 million, DeepSeek-V3 is the strongest open-source model at launch and excels on Chinese and math benchmarks, though Anthropic’s Claude 3.5 Sonnet surpasses it in some areas.
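To make the mixture-of-experts and load-balancing ideas concrete, here is a minimal Python sketch of top-k expert routing with a per-expert bias. It is an illustration under stated assumptions, not DeepSeek's implementation: the function names `route_token` and `update_bias`, the step size, and the toy random scores are all hypothetical. The assumption is that the bias influences only which experts are selected, while the mixing weights come from the raw scores, so no extra loss term is needed to even out the load.

```python
import random


def route_token(scores, bias, k):
    """Select the k experts with the highest (score + bias).

    The bias affects selection only; the returned mixing weights are the
    raw scores of the chosen experts, normalized to sum to 1.
    Assumes every score is positive.
    """
    ranked = sorted(range(len(scores)),
                    key=lambda i: scores[i] + bias[i],
                    reverse=True)
    chosen = ranked[:k]
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}


def update_bias(bias, load, step=0.01):
    """Lower the bias of overloaded experts and raise it for underloaded ones.

    `load` holds how many tokens each expert received in the last batch.
    The step size is an arbitrary illustrative value.
    """
    mean_load = sum(load) / len(load)
    new_bias = []
    for b, n in zip(bias, load):
        if n > mean_load:
            new_bias.append(b - step)
        elif n < mean_load:
            new_bias.append(b + step)
        else:
            new_bias.append(b)
    return new_bias


if __name__ == "__main__":
    random.seed(0)
    num_experts, k = 4, 2
    bias = [0.0] * num_experts
    load = [0] * num_experts
    for _ in range(200):                      # 200 toy training steps
        load = [0] * num_experts
        for _ in range(64):                   # 64 tokens per batch
            # Toy scores: expert 0 is artificially favored by the router.
            scores = [0.05 + random.random() + (0.5 if e == 0 else 0.0)
                      for e in range(num_experts)]
            for expert in route_token(scores, bias, k):
                load[expert] += 1
        bias = update_bias(bias, load)
    print("tokens per expert in last batch:", load)
    print("learned biases:", [round(b, 2) for b in bias])
```

In this toy setup only `k` of the experts run for each token, which is the sense in which a mixture-of-experts model activates a subset of its parameters. The bias update gradually steers tokens away from the favored expert without adding a balancing term to the training loss.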

DeepSeek-V3’s development highlights the narrowing gap between open-source and closed-source AI, giving enterprises more options for AI integration. The model’s code is available on GitHub under an MIT license, while the model itself is provided under DeepSeek’s own model license. Enterprises can test it via DeepSeek Chat and access the API for commercial use, with pricing set to change after February 8. The release underscores the potential for open-source models to compete with closed-source counterparts, promoting a more competitive and diverse AI landscape.

Key takeaways:

DeepSeek has released a new ultra-large AI model, DeepSeek-V3, with 671B parameters, using a mixture-of-experts architecture for efficient task handling.
DeepSeek-V3 outperforms leading open-source models and closely matches the performance of closed models, a significant step in closing the gap between open-source and closed-source AI.
The model introduces innovations such as an auxiliary-loss-free load-balancing strategy and multi-token prediction (MTP), which improve training efficiency and speed.
DeepSeek-V3 is available under the company's model license on Hugging Face, with the code accessible on GitHub under an MIT license, and enterprises can test it via DeepSeek Chat.
