DBRX is a transformer-based, decoder-only LLM trained with next-token prediction. It has 132B total parameters, of which 36B are active on any given input, and it was pre-trained on 12T tokens of text and code data. DBRX is more efficient to train and use, with inference up to 2x faster than LLaMA2-70B. It is also being integrated into Databricks' GenAI-powered products, where it is already surpassing GPT-3.5 Turbo in applications like SQL.
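The gap between total and active parameters comes from the mixture-of-experts design: a router sends each token to only a few expert feed-forward networks, so only a fraction of the weights is used per token. The sketch below is a minimal, illustrative top-k routed MoE block in PyTorch; the layer sizes, expert count, and top-k value are placeholder assumptions, not DBRX's actual configuration.

```python
# Minimal sketch of a top-k routed mixture-of-experts (MoE) feed-forward block.
# Sizes and expert counts are illustrative placeholders, not DBRX's real config.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model=1024, d_hidden=4096, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x)                # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts only
        out = torch.zeros_like(x)
        # Only top_k experts run per token, so the "active" parameter count per
        # token is a fraction of the layer's total parameter count.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

layer = MoEFeedForward()
tokens = torch.randn(4, 1024)
print(layer(tokens).shape)  # torch.Size([4, 1024])
```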
Key takeaways:
- Databricks has introduced DBRX, an open, general-purpose large language model (LLM) that sets a new state-of-the-art for established open LLMs and surpasses GPT-3.5.
- DBRX uses a fine-grained mixture-of-experts (MoE) architecture, making it more efficient to train and faster at inference than dense models like LLaMA2-70B.
- The model was trained on 12T tokens of text and code data, and it is being integrated into Databricks' GenAI-powered products, where it surpasses GPT-3.5 Turbo in applications like SQL.
- Databricks customers can now use DBRX via APIs, and they can also pretrain their own DBRX-class models from scratch or continue training on top of one of Databricks' checkpoints.
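As a rough illustration of the API route, the sketch below queries a DBRX serving endpoint through the OpenAI-compatible Python client. The endpoint name (`databricks-dbrx-instruct`), workspace URL, and environment variables are assumptions for illustration; check them against your own Databricks workspace configuration.

```python
# Sketch: querying a DBRX endpoint via an OpenAI-compatible interface.
# Endpoint name, base URL, and env vars are illustrative assumptions.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],                        # personal access token (assumed env var)
    base_url=os.environ["DATABRICKS_HOST"] + "/serving-endpoints",  # e.g. https://<workspace-url>
)

response = client.chat.completions.create(
    model="databricks-dbrx-instruct",  # assumed endpoint name; verify in your workspace
    messages=[
        {"role": "system", "content": "You are a helpful SQL assistant."},
        {"role": "user", "content": "Write a SQL query that returns the ten most recent orders."},
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```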