The company has also shared its recent learnings and explorations in building high-quality, efficient long-context models. It has extended LLaMA-2-7B to a 32K context using Meta's Position Interpolation recipe combined with continued pre-training, and it has updated both the inference and training stacks to support efficient inference and fine-tuning at 32K context, using the recently released FlashAttention-2 along with a range of other optimizations. This allows users to build their own 32K context models and run inference efficiently.
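To make the interpolation step concrete: rather than asking rotary position embeddings to extrapolate past the 4K window seen in pre-training, positions are scaled down linearly so the full 32K range maps back into that window, and the model is then continued-pre-trained on long sequences. The snippet below is an illustrative sketch of that idea, not Together's code; the `rope_angles` helper and the head dimension of 128 are our own assumptions for the example.

```python
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0) -> torch.Tensor:
    # Standard RoPE: one rotation frequency per pair of channels.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return torch.outer(positions, inv_freq)  # shape: (len(positions), dim // 2)

orig_ctx, new_ctx = 4096, 32768   # LLaMA-2 pre-training window vs. extended window
scale = orig_ctx / new_ctx        # 1/8 for a 32K extension

positions = torch.arange(new_ctx, dtype=torch.float32)
# Naive extrapolation: positions beyond 4096 were never seen during pre-training.
angles_extrapolated = rope_angles(positions, dim=128)
# Position Interpolation: scale positions down so they all fall inside [0, 4096).
angles_interpolated = rope_angles(positions * scale, dim=128)
```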
Key takeaways:
- Together AI has released LLaMA-2-7B-32K, a 32K context model built using Position Interpolation and system optimizations, including FlashAttention-2. The model can be fine-tuned for long-context tasks like multi-document understanding, summarization, and QA.
- The company has shared its data recipe for building long-context models and provided examples of how to fine-tune LLaMA-2-7B-32K for specific applications, such as book summarization and long-context question answering (see the loading sketch after this list).
- They have updated both the inference and training stacks to support efficient inference and fine-tuning with 32K context, letting users build their own 32K context models and run inference efficiently.
- Together AI believes the key opportunity for open-source models is to extend their context length into the 32K-128K range, matching that of state-of-the-art closed-source models.
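As a starting point for the long-context applications mentioned above, here is a minimal sketch of loading the released checkpoint for long-context inference with Hugging Face transformers. The model id `togethercomputer/LLaMA-2-7B-32K`, the use of bfloat16, and the prompt format are assumptions to adapt to your setup, not Together's official recipe.

```python
# Minimal sketch: load the 32K checkpoint and run a long-context QA style prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "togethercomputer/LLaMA-2-7B-32K"  # assumed Hugging Face model id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 32K KV cache manageable
    device_map="auto",
)

# Long-context QA: a (long) document followed by a question.
prompt = "<paste a long document here>\n\nQuestion: What is the main finding?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```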