Jamba: A Hybrid Transformer-Mamba Language Model

Apr 01, 2024 - news.bensbites.co
The article introduces Jamba, a new base large language model built on a hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. The model interleaves blocks of Transformer and Mamba layers, drawing on the strengths of both model families, and incorporates MoE in some layers to increase capacity while keeping active parameter usage manageable. This architectural flexibility allows resource- and objective-specific configurations, yielding a powerful model that fits on a single 80GB GPU. Compared to standard Transformers, Jamba offers high throughput and a small memory footprint, and it delivers state-of-the-art performance on standard language model benchmarks and long-context evaluations.
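The description above maps naturally onto a simple layer stack. Below is a minimal, hypothetical PyTorch sketch of the interleaving idea: attention and Mamba-style blocks alternate at a fixed ratio, and a routed mixture-of-experts feed-forward replaces the dense MLP on some layers. The MambaBlock here is only a placeholder (a real selective-state-space layer would come from a dedicated library), and the layer ratio, expert count, and MoE placement are illustrative choices, not the configuration reported for Jamba.

```python
# Structural sketch of a hybrid attention/Mamba stack with MoE layers.
# Everything below is illustrative: MambaBlock is a stand-in module, and the
# attn_every / moe_every ratios are example values, not Jamba's actual config.
import torch
import torch.nn as nn


class MambaBlock(nn.Module):
    """Placeholder for a Mamba (selective state-space) layer."""
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)  # stand-in computation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.proj(x)  # residual connection, like the other blocks


class MoEFeedForward(nn.Module):
    """Top-1 routed mixture-of-experts MLP: only one expert is active per
    token, so total capacity grows without growing active compute."""
    def __init__(self, d_model: int, n_experts: int = 4, d_ff: int = 256):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.router(x).softmax(dim=-1)   # (batch, seq, n_experts)
        top = scores.argmax(dim=-1)               # hard top-1 routing
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = (top == i).unsqueeze(-1).float()
            out = out + expert(x) * mask          # only routed tokens contribute
        return x + out


class HybridStack(nn.Module):
    """Interleaves attention and Mamba layers; MoE follows some layers."""
    def __init__(self, d_model: int = 64, n_layers: int = 8,
                 attn_every: int = 4, moe_every: int = 2):
        super().__init__()
        layers = []
        for i in range(n_layers):
            if i % attn_every == 0:
                layers.append(nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True))
            else:
                layers.append(MambaBlock(d_model))
            if i % moe_every == 1:
                layers.append(MoEFeedForward(d_model))
        self.layers = nn.ModuleList(layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


if __name__ == "__main__":
    model = HybridStack()
    tokens = torch.randn(2, 16, 64)   # (batch, sequence, d_model)
    print(model(tokens).shape)        # torch.Size([2, 16, 64])
```

The point of the sketch is the composition pattern rather than the internals: attention layers contribute their usual modeling strengths, Mamba-style layers keep the memory footprint small for long sequences, and the routed MoE layers add parameters without adding per-token compute.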

The authors also explore various architectural decisions, such as how to combine Transformer and Mamba layers and how to mix experts, demonstrating their importance in large-scale modeling. They highlight several interesting properties of these architectures revealed during Jamba's training and evaluation. The authors plan to release checkpoints from various ablation runs to encourage further exploration of this novel architecture, and have made the weights of their implementation of Jamba publicly available under a permissive license.

Key takeaways:

  • The authors introduce Jamba, a new base large language model based on a hybrid Transformer-Mamba mixture-of-experts (MoE) architecture.
  • Jamba interleaves blocks of Transformer and Mamba layers and adds MoE to some layers, increasing model capacity while keeping active parameter usage manageable.
  • The model provides high throughput and a small memory footprint compared to vanilla Transformers, and offers state-of-the-art performance on standard language model benchmarks and long-context evaluations.
  • The authors plan to release checkpoints from various ablation runs and make the weights of their implementation of Jamba publicly available under a permissive license.