Preparing for the era of 32K context: Early learnings and explorations — TOGETHER

Jul 29, 2023 - news.bensbites.co
Together AI has released LLaMA-2-7B-32K, a 32K context model built using Position Interpolation and system optimizations, including FlashAttention-2. The model can be fine-tuned for targeted, long-context tasks such as multi-document understanding, summarization, and QA. The release follows rapid progress in the open-source LLM ecosystem, with open models catching up to closed-source ones. The company believes the next opportunity for open source is to extend the context length of open models to the 32K-128K regime, matching state-of-the-art closed-source models.

The company has also shared recent learnings and explorations in building long-context models with high quality and efficiency. They have extended LLaMA-2-7B to a 32K context using Meta's recipe of Position Interpolation and continued pre-training. They have also updated both the inference and training stacks to allow efficient inference and fine-tuning with 32K context, using the recently released FlashAttention-2 and a range of other optimizations. This lets users build their own 32K-context models and run inference efficiently.
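To make the interpolation step concrete, the sketch below shows the core idea behind Position Interpolation applied to rotary embeddings (RoPE): position indices for a 32K sequence are linearly scaled down so they fall within the positional range the base model was pre-trained on. This is an illustrative PyTorch sketch, not Together's implementation; the function names and the 4096/32768 context sizes are assumptions based on LLaMA-2's original context length.

```python
import torch

def rope_inverse_frequencies(head_dim: int, base: float = 10000.0) -> torch.Tensor:
    # Standard RoPE inverse frequencies for one attention head.
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

def interpolated_rope_angles(seq_len: int,
                             head_dim: int,
                             original_ctx: int = 4096,    # assumed base context length
                             extended_ctx: int = 32768) -> torch.Tensor:
    # Position Interpolation: scale positions by original_ctx / extended_ctx
    # (here 0.125) so a 32K sequence maps into the original 4K position range.
    scale = original_ctx / extended_ctx
    positions = torch.arange(seq_len).float() * scale
    inv_freq = rope_inverse_frequencies(head_dim)
    return torch.outer(positions, inv_freq)   # shape: (seq_len, head_dim // 2)

# Example: rotation angles for a 32K-token sequence with 128-dim heads.
angles = interpolated_rope_angles(seq_len=32768, head_dim=128)
```

After this rescaling, continued pre-training on long sequences lets the model adapt to the compressed position spacing.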

Key takeaways:

  • Together AI has released LLaMA-2-7B-32K, a 32K context model built using Position Interpolation and system optimizations, including FlashAttention-2. The model can be fine-tuned for long-context tasks like multi-document understanding, summarization, and QA.
  • The company has shared its data recipe for building long-context models and provided examples of how to fine-tune LLaMA-2-7B-32K for specific applications, such as book summarization and long-context question answering.
  • They have updated both the inference and training stacks to allow efficient inference and fine-tuning with 32K context, so users can build their own 32K-context models and run inference efficiently (a usage sketch follows this list).
  • Together AI believes that the future opportunity for open-source models is to extend the context length of open models to the regime of 32K-128K, matching that of state-of-the-art closed-source models.
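For readers who want to try the released model, here is a minimal usage sketch. It assumes the checkpoint is published on Hugging Face as "togethercomputer/LLaMA-2-7B-32K" and that the installed transformers and flash-attn versions support the flash_attention_2 backend; treat the repository name, the attention flag, and the prompt format as assumptions and defer to the model card for the exact loading instructions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "togethercomputer/LLaMA-2-7B-32K"  # assumed Hugging Face repo name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    attn_implementation="flash_attention_2",  # FlashAttention-2 backend, if installed
)

# Long-context QA style prompt: a (truncated here) document followed by a question.
prompt = "<long document text>\n\nQuestion: What are the key findings?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The same checkpoint can be fine-tuned on long-context data, such as book summarization or long-document QA pairs, using a standard causal-language-modeling training loop, provided the training stack supports 32K sequence lengths.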
