The authors also observe an interesting phenomenon, termed attention sink: keeping the Key and Value states (KV) of the initial tokens largely recovers the performance of window attention. They further find that adding a placeholder token as a dedicated attention sink during pre-training improves streaming deployment. In streaming settings, StreamingLLM outperforms the sliding window with recomputation baseline by up to a 22.2x speedup. The authors plan to release the core StreamingLLM code, perplexity evaluation code, a Streaming Llama Chatbot demo, and the StreamEval dataset with its evaluation code.
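To make the cache policy concrete, here is a minimal sketch (not the authors' released code) of the KV eviction rule: keep the KV of a few initial attention-sink tokens plus a rolling window of the most recent tokens, and drop everything in between. The function name `evict_kv`, the parameters `sink_size` and `window_size`, and the Hugging Face-style cache layout are illustrative assumptions.

```python
import torch

def evict_kv(past_key_values, sink_size=4, window_size=1020):
    """Keep the KV of the first `sink_size` tokens (attention sinks)
    plus the most recent `window_size` tokens; evict the middle.

    `past_key_values` is assumed to be a list of (key, value) tensor
    pairs of shape [batch, num_heads, seq_len, head_dim].
    """
    seq_len = past_key_values[0][0].size(2)
    if seq_len <= sink_size + window_size:
        return past_key_values  # cache still fits; nothing to evict
    kept = []
    for k, v in past_key_values:
        k = torch.cat([k[:, :, :sink_size], k[:, :, -window_size:]], dim=2)
        v = torch.cat([v[:, :, :sink_size], v[:, :, -window_size:]], dim=2)
        kept.append((k, v))
    return kept
```

Note that, per the paper, positional information is assigned relative to positions within the cache rather than in the original text, so the retained entries remain contiguous from the model's perspective.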
Key takeaways:
- The paper introduces StreamingLLM, an efficient framework that enables Large Language Models (LLMs) to generalize to infinite sequence length without any fine-tuning.
- StreamingLLM enables Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling on sequences of up to 4 million tokens and beyond.
- The authors observed an interesting phenomenon, attention sink: keeping the KV of the initial tokens largely recovers the performance of window attention (see the sketch above).
- In streaming settings, StreamingLLM outperforms the sliding window with recomputation baseline by up to a 22.2x speedup.