tencent/Tencent-Hunyuan-Large · Hugging Face

Nov 05, 2024 - huggingface.co
The article introduces Hunyuan-Large (Hunyuan-MoE-A52B), the largest open-source Transformer-based Mixture of Experts (MoE) model in the industry, with 389 billion total parameters and 52 billion active parameters. Developed to address the challenge of keeping resource consumption in check while maintaining high performance in large language models (LLMs), Hunyuan-Large combines high-quality synthetic training data, KV cache compression, expert-specific learning rate scaling, long-context processing capability, and extensive benchmarking. The model is released to inspire researchers with innovative ideas and to advance the progress and application of AI technology.
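The 389B-total versus 52B-active gap is the defining property of the MoE design: a router sends each token to only a few expert sub-networks, so most parameters sit idle on any given forward pass. Below is a minimal PyTorch sketch of top-k expert routing; the class, dimensions, and expert count are illustrative assumptions, not Hunyuan-Large's actual architecture.

```python
import torch
import torch.nn as nn

class TopKMoELayer(nn.Module):
    """Toy top-k-routed mixture-of-experts feed-forward layer."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int, k: int = 1):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts)  # scores each token against each expert
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)  # pick k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e  # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Only k of num_experts expert MLPs run per token, so the parameters that are
# "active" per forward pass are a small fraction of the total parameter count.
layer = TopKMoELayer(d_model=64, d_ff=256, num_experts=16, k=1)
y = layer(torch.randn(8, 64))
```

With top-1 routing over 16 experts, for example, roughly one sixteenth of the expert parameters participate in any forward pass; this kind of sparsity is the mechanism behind a large total parameter count with a much smaller active one.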

In benchmark evaluations, the Hunyuan-Large pre-trained model outperformed both dense and MoE-based competitors in overall performance. It showed superior results in commonsense understanding and reasoning and in classical NLP tasks such as QA and reading comprehension. The model also excelled in mathematics, outperforming all baselines on math datasets. The Hunyuan-Large-Instruct model likewise demonstrated significant improvements on most task types compared to LLMs with similar numbers of activated parameters, indicating the effectiveness of its post-training.

Key takeaways:

  • The Hunyuan-Large (Hunyuan-MoE-A52B) model is the largest open-source Transformer-based Mixture of Experts (MoE) model in the industry, featuring 389 billion total parameters and 52 billion active parameters.
  • The model is optimized to handle long-context inputs, reduce memory usage and computational overhead, and ensure each sub-model effectively learns from the data and contributes to overall performance (see the KV cache sketch after this list).
  • Hunyuan-Large outperforms all baselines on the GSM8K and MATH math datasets and achieves the best result on the Chinese CMATH benchmark, along with the overall best performance across all Chinese tasks.
  • The Hunyuan-Large-Instruct model demonstrates superior understanding and reasoning capabilities across a wide array of language understanding tasks and achieves the best performance on the MMLU and MATH datasets.
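On the memory point above: in long-context inference the key/value cache, rather than the weights, often dominates memory, which is why KV cache compression matters. The sketch below shows the basic cache-size arithmetic and how sharing key/value heads across query heads (the grouped-query-attention idea) shrinks it; the layer count, head counts, and context length here are hypothetical placeholders, not Hunyuan-Large's published configuration.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    """Size of the key/value cache: 2 tensors (K and V) per layer,
    each of shape (batch, kv_heads, seq_len, head_dim)."""
    return 2 * layers * batch * kv_heads * seq_len * head_dim * bytes_per_elem

# Hypothetical 64-layer model, fp16 cache, 256K-token context.
full = kv_cache_bytes(layers=64, kv_heads=64, head_dim=128, seq_len=256_000, batch=1)
gqa  = kv_cache_bytes(layers=64, kv_heads=8,  head_dim=128, seq_len=256_000, batch=1)
print(f"one KV head per query head: {full / 2**30:.1f} GiB")  # ~500 GiB
print(f"8 shared KV heads:          {gqa  / 2**30:.1f} GiB")  # 8x smaller, ~62.5 GiB
```

The cache grows linearly with context length, so any reduction in the number of cached key/value heads translates directly into longer feasible contexts at a fixed memory budget.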