MEGALODON builds on the team's previous model, MEGA, adding new components such as a complex exponential moving average (CEMA). The team trained a seven-billion-parameter model, MEGALODON-7B, using the same dataset and training hyperparameters as Llama2-7B, and found it to be more computationally efficient than Llama2-7B. MEGALODON outperformed all baseline models on the NarrativeQA subtask and achieved results competitive with Llama 2 on all tasks. The MEGALODON code is available on GitHub.
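CEMA extends MEGA's damped exponential moving average into the complex domain, so each hidden dimension both decays and rotates over time. The numpy sketch below illustrates that general idea under simplifying assumptions: a single real-valued input sequence, illustrative parameter names (`alpha`, `delta`, `theta`), and none of the surrounding gating machinery. It is not the authors' implementation.

```python
import numpy as np

def complex_ema(x, alpha, delta, theta):
    """Illustrative sketch of a complex-valued damped EMA (CEMA-style).

    x:     (seq_len, dim) real-valued inputs
    alpha: (dim,) input weight in (0, 1)
    delta: (dim,) damping factor in (0, 1)
    theta: (dim,) rotation angle giving the decay a complex phase

    The recurrence is a damped EMA whose decay is complex-valued, so each
    hidden dimension both decays and rotates over time; the real part of
    the hidden state is returned as the output.
    """
    phase = np.exp(1j * theta)               # per-dimension complex rotation
    decay = (1.0 - alpha * delta) * phase    # complex decay factor
    h = np.zeros(x.shape[1], dtype=np.complex128)
    out = np.empty_like(x, dtype=np.float64)
    for t in range(x.shape[0]):
        h = alpha * phase * x[t] + decay * h  # update hidden state
        out[t] = h.real                       # project back to the reals
    return out

# Toy usage: smooth a random 16-step sequence of 8-dimensional inputs.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))
y = complex_ema(x,
                alpha=np.full(8, 0.5),
                delta=np.full(8, 0.9),
                theta=np.linspace(0.1, 1.0, 8))
print(y.shape)  # (16, 8)
```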
Key takeaways:
- Researchers from Meta, University of Southern California, Carnegie Mellon University, and University of California San Diego have open-sourced MEGALODON, a large language model (LLM) with unlimited context length and linear computational complexity.
- MEGALODON outperforms a similarly-sized Llama 2 model on a range of benchmarks and is designed to address several shortcomings of the Transformer neural architecture underlying most LLMs.
- MEGALODON uses chunk-wise attention and sequence-based parallelism during training, improving scalability for long-context training (see the sketch after this list). It also builds on the research team's previous model, MEGA, with several new features, including a complex exponential moving average (CEMA).
- The MEGALODON code is available on GitHub, and the researchers see its robust improvements as pointing to large-scale multi-modality pretraining with MEGALODON as a direction for future work.
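To illustrate why chunk-wise attention keeps the cost linear in sequence length, here is a minimal numpy sketch that computes softmax attention only within fixed-size chunks. It assumes a single head and a sequence length divisible by the chunk size, and it omits MEGALODON's gating and EMA components, so it is an illustration of the idea rather than the model's actual attention layer.

```python
import numpy as np

def chunked_self_attention(q, k, v, chunk_size):
    """Illustrative chunk-wise self-attention.

    q, k, v: (seq_len, dim) arrays; seq_len is assumed to be a multiple
    of chunk_size. Attention is computed only within each chunk, so the
    cost grows linearly with sequence length instead of quadratically.
    """
    seq_len, dim = q.shape
    out = np.empty_like(v)
    for start in range(0, seq_len, chunk_size):
        end = start + chunk_size
        scores = q[start:end] @ k[start:end].T / np.sqrt(dim)  # (chunk, chunk)
        scores -= scores.max(axis=-1, keepdims=True)           # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)         # softmax per query
        out[start:end] = weights @ v[start:end]
    return out

# Toy usage: a 1,024-token sequence attended in chunks of 128.
rng = np.random.default_rng(0)
q = rng.standard_normal((1024, 64))
k = rng.standard_normal((1024, 64))
v = rng.standard_normal((1024, 64))
print(chunked_self_attention(q, k, v, chunk_size=128).shape)  # (1024, 64)
```

Because each chunk attends only to itself, the total work is roughly (seq_len / chunk_size) quadratic blocks of size chunk_size, which grows linearly in seq_len for a fixed chunk size.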