The Hierarchical Memory Transformer (HMT) was evaluated on general language modeling and question-answering tasks, showing consistent improvements in the long-context processing ability of both context-constrained and long-context models. The authors suggest that with an additional 0.5% to 2% of parameters, HMT can be easily incorporated into future LLMs to handle long contexts effectively. The code for the model has been open-sourced on GitHub.
Key takeaways:
- The paper proposes a new framework called Hierarchical Memory Transformer (HMT) that improves models' long-context processing ability by imitating human memorization behavior.
- HMT uses memory-augmented, segment-level recurrence and organizes its memory hierarchy by preserving tokens from early input segments, passing memory embeddings along the sequence, and recalling relevant information from history (a minimal sketch of this recurrence appears after this list).
- Evaluated on general language modeling and question-answering tasks, HMT improved the long-context processing ability of both context-constrained and long-context models.
- With an additional 0.5% to 2% of parameters, HMT can be easily integrated into future large language models to handle long contexts effectively.
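
To make the recurrence in the second takeaway concrete, here is a minimal sketch of memory-augmented, segment-level recurrence with a simple recall step over cached memory embeddings. It is not the authors' implementation (that lives in their GitHub repository); the class name `SegmentRecurrentLM`, the generic `backbone` interface, and parameters such as `mem_dim` are illustrative assumptions.

```python
# A minimal sketch (not the authors' implementation) of memory-augmented,
# segment-level recurrence in the spirit of HMT. All names here
# (SegmentRecurrentLM, mem_dim, backbone, ...) are hypothetical.
import torch
import torch.nn as nn


class SegmentRecurrentLM(nn.Module):
    """Wraps a fixed-context backbone with a small recurrent memory.

    The input is split into segments that fit the backbone's context
    window. A memory embedding is passed from one segment to the next,
    and an attention step recalls relevant earlier memories (a stand-in
    for HMT's hierarchical memory recall).
    """

    def __init__(self, backbone: nn.Module, hidden_dim: int, mem_dim: int = 256):
        super().__init__()
        self.backbone = backbone                            # any module mapping embeddings -> hidden states
        self.to_memory = nn.Linear(hidden_dim, mem_dim)     # summarize a segment into a memory embedding
        self.from_memory = nn.Linear(mem_dim, hidden_dim)   # inject recalled memory as a prefix embedding
        self.recall = nn.MultiheadAttention(mem_dim, num_heads=4, batch_first=True)

    def forward(self, segment_embeds: list[torch.Tensor]):
        """segment_embeds: list of (batch, seg_len, hidden_dim) tensors."""
        memories = []   # cache of per-segment memory embeddings ("history")
        outputs = []
        for seg in segment_embeds:
            if memories:
                # Recall: attend over cached memories, using the most recent one as the query.
                mem_bank = torch.stack(memories, dim=1)          # (batch, n_seg, mem_dim)
                query = memories[-1].unsqueeze(1)                # (batch, 1, mem_dim)
                recalled, _ = self.recall(query, mem_bank, mem_bank)
                prefix = self.from_memory(recalled)              # (batch, 1, hidden_dim)
                seg = torch.cat([prefix, seg], dim=1)            # prepend memory token to the segment
            hidden = self.backbone(seg)                          # (batch, seg_len(+1), hidden_dim)
            outputs.append(hidden)
            # Summarize this segment into a new memory embedding and cache it.
            memories.append(self.to_memory(hidden.mean(dim=1)))  # (batch, mem_dim)
        return outputs, memories
```

The memory bank in this sketch is flat; HMT additionally organizes memory hierarchically and preserves tokens from early segments, which the sketch omits for brevity.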