The article also reviews the model's results across a range of evaluations, highlighting strong showings on multilingual tasks and common-sense reasoning tests. It also acknowledges areas where the model performed poorly, such as arithmetic evaluations, due to a dataset that was missed during training. The article concludes with future goals for model training, including completing the 2T Eagle 7B models and training the v6 "Finch" line of models.
Key takeaways:
- The EagleX 1.7T model, a major checkpoint that surpasses LLaMA2 7B, has been released for research purposes. It is part of a larger 2T-token training run and is built on the RWKV-v5 architecture.
- The model was trained on 1.7 trillion tokens spanning 100+ languages and outperforms all 7B-class models on multilingual benchmarks. It also surpasses LLaMA2 on multiple English evaluations.
- The EagleX 1.7T model is released under the Apache 2.0 license, meaning it can be used personally or commercially without restriction. It can be downloaded from HuggingFace and used on their new cloud platform.
- The model performs strongly on English and multilingual evaluations but is weak in areas such as arithmetic, owing to datasets missed during training. Further improvements are expected as training continues.