
HMT: Hierarchical Memory Transformer for Long Context Language Processing

May 17, 2024 - news.bensbites.com
The article discusses the limitations of transformer-based large language models (LLMs) in language processing applications, particularly the fixed context window that restricts how many input tokens the model can attend to at once. To address this, the authors propose a novel framework called the Hierarchical Memory Transformer (HMT), which enhances models' long-context processing ability by imitating human memorization behavior. It achieves this through memory-augmented segment-level recurrence, organizing the memory hierarchy by preserving tokens from early input segments, passing memory embeddings along the sequence, and recalling relevant information from history.

The Hierarchical Memory Transformer was evaluated on general language modeling and question-answering tasks, showing consistent improvement in the long-context processing ability of both context-constrained and long-context models. The authors suggest that with an additional 0.5%-2% of parameters, HMT can be easily incorporated into future LLMs to handle long context effectively. The code for this model has been open-sourced on GitHub.
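To make the mechanism concrete, the snippet below is a minimal, illustrative sketch of memory-augmented segment-level recurrence with memory recall. It is not the authors' open-sourced implementation: the generic transformer backbone, the MemoryRecall module, and parameters such as segment_len and d_model are assumptions chosen purely for illustration.

```python
import torch
import torch.nn as nn


class MemoryRecall(nn.Module):
    """Hypothetical recall block: attends over memory embeddings cached from
    earlier segments to retrieve relevant history for the current segment."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, query: torch.Tensor, memory_cache: torch.Tensor) -> torch.Tensor:
        # query: (batch, 1, d_model) summary of the current segment
        # memory_cache: (batch, n_past, d_model) memory embeddings from history
        recalled, _ = self.attn(query, memory_cache, memory_cache)
        return recalled


class HMTSketch(nn.Module):
    """Minimal sketch of memory-augmented segment-level recurrence.
    The backbone stands in for any context-constrained transformer."""

    def __init__(self, d_model: int = 256, segment_len: int = 128):
        super().__init__()
        self.segment_len = segment_len
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.recall = MemoryRecall(d_model)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, seq_len, d_model) token embeddings of a long input
        batch, _, d_model = embeddings.shape
        # Cache of memory embeddings, seeded with a zero vector.
        memory_cache = torch.zeros(batch, 1, d_model, device=embeddings.device)
        outputs = []
        for segment in embeddings.split(self.segment_len, dim=1):
            # Summarize the segment and recall relevant information from history.
            summary = segment.mean(dim=1, keepdim=True)        # (batch, 1, d_model)
            recalled = self.recall(summary, memory_cache)      # (batch, 1, d_model)
            # Prepend the recalled memory so the backbone can attend to it
            # alongside the segment's own tokens.
            hidden = self.backbone(torch.cat([recalled, segment], dim=1))
            outputs.append(hidden[:, 1:])                      # drop the memory slot
            # Pass a new memory embedding along the sequence for later recall.
            memory_cache = torch.cat([memory_cache, hidden[:, -1:, :]], dim=1)
        return torch.cat(outputs, dim=1)


if __name__ == "__main__":
    model = HMTSketch()
    x = torch.randn(2, 512, 256)   # a "long" input of four 128-token segments
    print(model(x).shape)          # torch.Size([2, 512, 256])
```

In this reading, the backbone only ever processes one fixed-length segment plus a single recalled memory slot, while the memory cache grows by one embedding per segment, which is how a context-constrained model can be extended to much longer inputs at modest extra cost.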

Key takeaways:

  • The paper proposes a new framework called Hierarchical Memory Transformer (HMT) that improves models' long-context processing ability by imitating human memorization behavior.
  • HMT uses memory-augmented segment-level recurrence and organizes the memory hierarchy by preserving tokens from early input token segments, passing memory embeddings along the sequence, and recalling relevant information from history.
  • When evaluated on general language modeling and question-answering tasks, HMT showed improvement in the long-context processing ability of context-constrained and long-context models.
  • With an additional 0.5% - 2% of parameters, HMT can be easily integrated into future large language models to handle long context effectively.