The effectiveness of LongRoPE is demonstrated through extensive experiments on LLaMA2 and Mistral across a range of tasks. The method addresses the main obstacles to extending the context window of LLMs: high fine-tuning costs, the scarcity of long training texts, and the catastrophic values introduced by new token positions.
Key takeaways:
- The paper introduces LongRoPE, a method that extends the context window of pre-trained large language models (LLMs) to 2048k tokens, a significant increase from the current limit of around 128k tokens.
- LongRoPE achieves this extension with at most 1k fine-tuning steps at training lengths no longer than 256k tokens, while maintaining performance at the original short context window.
- Three key innovations are introduced: exploiting non-uniformities in positional interpolation via an efficient search (see the sketch after this list), a progressive extension strategy that first fine-tunes at 256k and then interpolates again to reach 2048k, and a readjustment of LongRoPE on 8k lengths to recover the original short-context-window performance.
- The models extended via LongRoPE retain the original architecture with minor modifications to the positional embedding, and can reuse most pre-existing optimizations.
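To make the first innovation concrete, here is a minimal sketch of non-uniform positional interpolation for RoPE: instead of dividing every rotary frequency by a single scale, each dimension gets its own rescale factor, and the earliest positions can be left un-interpolated. The function name, the `linspace` placeholder factors, and the `n_keep` cutoff below are illustrative assumptions, not the paper's searched values or official code.

```python
import torch

def longrope_style_angles(positions, head_dim, rescale_factors, n_keep=0, base=10000.0):
    """Hypothetical sketch (not the official implementation) of non-uniform
    RoPE interpolation in the spirit of LongRoPE.

    - Standard RoPE: angle(m, i) = m * base^(-2i/d).
    - Uniform positional interpolation divides every angle by one shared scale.
    - Here each rotary dimension i gets its own rescale factor, and the first
      `n_keep` positions stay on the original frequencies (the second
      non-uniformity the paper exploits).
    `rescale_factors` stands in for values the paper finds by search; the
    placeholders used below are NOT the searched values.
    """
    inv_freq = base ** (-torch.arange(0, head_dim, 2).float() / head_dim)  # (d/2,)
    scaled = inv_freq / rescale_factors                                    # per-dimension interpolation
    m = positions.float().unsqueeze(-1)                                    # (L, 1)
    angles = m * scaled                                                    # (L, d/2)
    if n_keep > 0:
        # Keep the earliest positions un-interpolated.
        keep = positions < n_keep
        angles[keep] = m[keep] * inv_freq
    return angles.cos(), angles.sin()  # fed into the usual RoPE rotation

# Illustrative usage: a 128-dim head, placeholder per-dimension factors,
# and the first 64 positions left untouched.
head_dim = 128
rescale_factors = torch.linspace(1.0, 16.0, head_dim // 2)  # placeholder, not searched
cos, sin = longrope_style_angles(torch.arange(4096), head_dim, rescale_factors, n_keep=64)
```

Because only these rotary frequencies change, the rest of the attention stack is untouched, which is why extended models keep the original architecture and can reuse most pre-existing optimizations.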