
Yi: Open Foundation Models by 01.AI

Mar 10, 2024 - news.bensbites.co
The article introduces the Yi model family, a series of language and multimodal models that demonstrate strong multi-dimensional capabilities. These models are built on 6B and 34B pretrained language models and are extended to chat models, 200K long-context models, depth-upscaled models, and vision-language models. They perform well on benchmarks like MMLU and achieve strong human preference rates on evaluation platforms like AlpacaEval and Chatbot Arena. This performance is attributed to high-quality data obtained through extensive data-engineering efforts, including the construction of 3.1 trillion tokens of English and Chinese corpora.

For finetuning, a small-scale instruction dataset is polished over multiple iterations, with each instance verified by machine learning engineers. For the vision-language model, the chat language model is combined with a vision transformer encoder, and the model is trained to align visual representations with the semantic space of the language model (sketched below). The context length is extended to 200K through lightweight continual pretraining, which improves needle-in-a-haystack retrieval performance. The depth of the pretrained checkpoint is also extended through continual pretraining to further improve performance. The authors believe that scaling up model parameters with thoroughly optimized data will lead to even stronger frontier models.
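To make the vision-language alignment concrete, here is a minimal PyTorch sketch of the general idea: image patch features from a vision transformer are projected into the language model's embedding space and consumed as ordinary tokens alongside the text. The module name, projector design, and dimensions below are illustrative assumptions, not the exact Yi-VL architecture.

```python
import torch
import torch.nn as nn

class VisionLanguageConnector(nn.Module):
    """Hypothetical projector mapping ViT patch features into the LM's embedding space."""

    def __init__(self, vit_dim: int = 1024, lm_dim: int = 4096):
        super().__init__()
        # Two-layer MLP projector; the actual projector used by Yi-VL may differ.
        self.proj = nn.Sequential(
            nn.Linear(vit_dim, lm_dim),
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vit_dim) from the vision encoder
        # returns:        (batch, num_patches, lm_dim) visual "tokens"
        return self.proj(patch_features)

# Usage sketch: concatenate projected visual tokens with text embeddings and
# feed the combined sequence to the chat language model; training this stack
# aligns the visual representations with the model's semantic space.
connector = VisionLanguageConnector()
visual_tokens = connector(torch.randn(1, 576, 1024))  # dummy ViT output
text_embeds = torch.randn(1, 32, 4096)                # dummy text embeddings
lm_input = torch.cat([visual_tokens, text_embeds], dim=1)
```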
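The summary does not spell out how the model's depth is extended before continual pretraining. A common way to depth-upscale a pretrained transformer is to duplicate a contiguous block of existing layers and then continue training the deeper stack; the sketch below shows that mechanism only, with illustrative layer indices rather than Yi's actual configuration.

```python
import copy
import torch.nn as nn

def upscale_depth(layers: nn.ModuleList, start: int, end: int) -> nn.ModuleList:
    """Return a deeper stack with layers[start:end] duplicated in place."""
    duplicated = [copy.deepcopy(layer) for layer in layers[start:end]]
    new_layers = list(layers[:end]) + duplicated + list(layers[end:])
    return nn.ModuleList(new_layers)

# Toy example: grow a 32-layer stack to 48 layers by repeating layers 8..24,
# then (in practice) continually pretrain the resulting model.
toy_layers = nn.ModuleList([nn.Linear(16, 16) for _ in range(32)])
deeper = upscale_depth(toy_layers, start=8, end=24)
assert len(deeper) == 48
```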

Key takeaways:

  • The Yi model family is a series of language and multimodal models that demonstrate strong multi-dimensional capabilities, based on 6B and 34B pretrained language models.
  • The models are extended to chat models, 200K long-context models, depth-upscaled models, and vision-language models, showing strong performance on various benchmarks.
  • The performance of Yi models is attributed to data quality resulting from extensive data-engineering efforts, including the construction of 3.1 trillion tokens of English and Chinese corpora.
  • The authors believe that scaling up model parameters using thoroughly optimized data will lead to even stronger frontier models in the future.
