
ai21labs/Jamba-v0.1 · Hugging Face

Mar 28, 2024 - huggingface.co
The article introduces Jamba, a state-of-the-art, hybrid SSM-Transformer LLM developed by AI21. Jamba is a pretrained, mixture-of-experts (MoE) generative text model with 12B active parameters and a total of 52B parameters across all experts. It supports a 256K context length and can fit up to 140K tokens on a single 80GB GPU. The model is the first production-scale Mamba implementation, offering promising research and application opportunities.

Jamba requires the use of `transformers` version 4.39.0 or higher and needs to be run on a CUDA device. It can be loaded in half precision or 8-bit precision, and can be fine-tuned for custom solutions. The model has shown impressive results on common benchmarks such as HellaSwag, Arc Challenge, and WinoGrande. However, it is a base model and does not have safety moderation mechanisms, so guardrails should be added for responsible and safe use.
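As a rough illustration of that loading path, the sketch below loads the checkpoint in half precision and generates a short completion. The prompt and generation settings are placeholders, and the 8-bit variant shown in the trailing comment assumes `bitsandbytes` is installed; treat the whole block as a sketch rather than the model card's exact recipe.

```python
# Minimal sketch: loading Jamba in half precision on a CUDA device
# (assumes transformers >= 4.39.0 per the article, plus a GPU with enough
# memory for the 52B-total-parameter MoE checkpoint).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision
    device_map="auto",           # place layers on the available GPU(s)
)

# Placeholder prompt: Jamba-v0.1 is a base model, so it continues text
# rather than following instructions.
inputs = tokenizer("In the recent Super Bowl LVIII,", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# For 8-bit precision instead (assumption: bitsandbytes is installed), one option is:
# from transformers import BitsAndBytesConfig
# quant_config = BitsAndBytesConfig(load_in_8bit=True)
# model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quant_config)
```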

Key takeaways:

  • Jamba is a state-of-the-art, hybrid SSM-Transformer LLM developed by AI21, with 12B active parameters and a total of 52B parameters across all experts.
  • The model supports a 256K context length and can fit up to 140K tokens on a single 80GB GPU.
  • Jamba is a pretrained base model intended for use as a foundation layer for fine-tuning, training, and developing custom solutions (a minimal fine-tuning sketch follows this list). It does not have safety moderation mechanisms.
  • AI21, the developer of Jamba, builds reliable, practical, and scalable AI solutions for the enterprise.
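Since the takeaways position Jamba as a base model for fine-tuning, here is a minimal parameter-efficient sketch using the `peft` library's LoRA adapters. The `target_modules` list and hyperparameters are illustrative assumptions, not values taken from the model card.

```python
# Minimal sketch: attaching LoRA adapters to Jamba for parameter-efficient fine-tuning.
# The target_modules list and hyperparameters are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "ai21labs/Jamba-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

lora_config = LoraConfig(
    r=8,                # adapter rank (illustrative)
    lora_alpha=16,
    task_type="CAUSAL_LM",
    # Assumed projection/embedding module names; verify against the loaded model's layers.
    target_modules=["embed_tokens", "x_proj", "in_proj", "out_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter weights are trained

# From here the wrapped model can go into a standard training loop or a trainer,
# with guardrails added on top, since the base model ships without safety moderation.
```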