Jamba uses a blocks-and-layers approach: each block contains either an attention or a Mamba layer, followed by a multi-layer perceptron (MLP), yielding an overall ratio of one Transformer layer for every eight total layers and maximizing quality and throughput on a single GPU. The model has shown impressive results on various benchmarks, matching or outperforming state-of-the-art models in its size class. Jamba is released with open weights under the Apache 2.0 license and is available on Hugging Face and in the NVIDIA API catalog. AI21 Labs plans to release a fine-tuned, safer version for commercial use in the coming weeks.
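The interleaving can be pictured as a simple layer schedule. The minimal Python sketch below illustrates the 1-in-8 attention ratio described above; it is not AI21's implementation, and the total depth (`NUM_LAYERS`), the MoE cadence (`MOE_EVERY`), and all names are hypothetical stand-ins.

```python
# Illustrative sketch of a hybrid layer schedule, NOT AI21's actual code.
# NUM_LAYERS and MOE_EVERY are assumed values chosen for illustration.
NUM_LAYERS = 32   # hypothetical total depth
ATTN_EVERY = 8    # one Transformer (attention) layer per eight total layers
MOE_EVERY = 2     # assume a mixture-of-experts MLP on every other layer

def build_layer_schedule(num_layers=NUM_LAYERS):
    """Return a list of (mixer, mlp) pairs, one per layer."""
    schedule = []
    for i in range(num_layers):
        # Every eighth layer uses attention; the rest use Mamba.
        mixer = "attention" if i % ATTN_EVERY == ATTN_EVERY - 1 else "mamba"
        # Alternate between a dense MLP and an MoE MLP.
        mlp = "moe" if i % MOE_EVERY == 1 else "dense_mlp"
        schedule.append((mixer, mlp))
    return schedule

for idx, (mixer, mlp) in enumerate(build_layer_schedule()):
    print(f"layer {idx:2d}: {mixer:9s} + {mlp}")
```

Running this prints a 32-layer schedule containing exactly four attention layers, which is what the one-in-eight ratio amounts to at this depth.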
Key takeaways:
- AI21 Labs has released Jamba, the world's first production-grade AI model built on the Mamba architecture, combining the strengths of the Mamba structured state-space model (SSM) with those of the traditional Transformer architecture.
- Jamba has an extensive context window of 256K tokens, fitting up to 140K tokens on a single 80GB GPU, and can handle significantly longer contexts than most comparably sized models (see the memory sketch after this list).
- Jamba delivers 3x the throughput on long contexts compared to Transformer-based models of similar size, thanks to its hybrid architecture composed of Transformer, Mamba, and mixture-of-experts (MoE) layers.
- Jamba is released with open weights under the Apache 2.0 license and is available on Hugging Face and in the NVIDIA API catalog, with a fine-tuned, safer version for commercial use planned in the coming weeks.
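To see why the hybrid layout helps long contexts fit on one GPU, consider the key/value (KV) cache that only the attention layers must keep for every token. The back-of-the-envelope sketch below uses the standard KV-cache size formula; the head count and head dimension are hypothetical stand-ins, not Jamba's published configuration.

```python
# Back-of-the-envelope KV-cache arithmetic with ASSUMED model dimensions
# (n_kv_heads, head_dim are illustrative, not Jamba's actual config).
def kv_cache_gib(seq_len, attn_layers, n_kv_heads=8, head_dim=128,
                 bytes_per_elem=2):  # 2 bytes per element for fp16/bf16
    # Each attention layer stores one key and one value vector per token.
    per_layer = 2 * n_kv_heads * head_dim * bytes_per_elem * seq_len
    return attn_layers * per_layer / 2**30

SEQ = 256_000  # the 256K-token context window
print(f"32 attention layers: {kv_cache_gib(SEQ, 32):.1f} GiB")  # pure Transformer
print(f" 4 attention layers: {kv_cache_gib(SEQ, 4):.1f} GiB")   # 1-in-8 hybrid
```

With these illustrative dimensions, moving from 32 attention layers to 4 cuts the 256K-token cache from roughly 31 GiB to under 4 GiB, leaving most of an 80GB GPU free for the model weights themselves.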