Grok-1 is a Mixture-of-Experts model with 25% of its weights active on a given token. It was trained on a large volume of text data using a custom training stack built on top of JAX and Rust. Instructions for getting started with the model are available on xAI's GitHub page.
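The 25% figure is what top-k expert routing produces: if each token is sent to, say, 2 of 8 equally sized experts, only a quarter of the expert weights take part in that token's forward pass. The sketch below is a minimal JAX illustration of that routing pattern, not Grok-1's actual implementation; the expert count, top-k value, and layer sizes are assumptions chosen for readability.

```python
import jax
import jax.numpy as jnp

# Illustrative sizes only; these are not Grok-1's real dimensions.
NUM_EXPERTS, TOP_K = 8, 2          # 2 of 8 experts per token -> ~25% of expert weights active
D_MODEL, D_FF = 64, 256

def init_params(key):
    k_router, k_w1, k_w2 = jax.random.split(key, 3)
    return {
        "router": jax.random.normal(k_router, (D_MODEL, NUM_EXPERTS)) * 0.02,
        "w1": jax.random.normal(k_w1, (NUM_EXPERTS, D_MODEL, D_FF)) * 0.02,
        "w2": jax.random.normal(k_w2, (NUM_EXPERTS, D_FF, D_MODEL)) * 0.02,
    }

def moe_layer(params, x):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ params["router"]                      # [tokens, experts]
    top_vals, top_idx = jax.lax.top_k(logits, TOP_K)   # pick k experts per token
    gates = jax.nn.softmax(top_vals, axis=-1)          # renormalise over the chosen experts

    def one_token(tok, idx, gate):
        # Only the k selected experts' weights are gathered and used for this token.
        w1 = params["w1"][idx]                          # [k, d_model, d_ff]
        w2 = params["w2"][idx]                          # [k, d_ff, d_model]
        hidden = jax.nn.gelu(jnp.einsum("d,kdf->kf", tok, w1))
        out = jnp.einsum("kf,kfd->kd", hidden, w2)
        return jnp.einsum("k,kd->d", gate, out)         # gate-weighted sum of expert outputs

    return jax.vmap(one_token)(x, top_idx, gates)

key = jax.random.PRNGKey(0)
params_key, data_key = jax.random.split(key)
params = init_params(params_key)
tokens = jax.random.normal(data_key, (4, D_MODEL))     # a toy batch of 4 token embeddings
print(moe_layer(params, tokens).shape)                 # (4, 64)
```

The point of the pattern is that each token pays the compute cost of only `TOP_K` experts, even though the full parameter count includes all `NUM_EXPERTS` of them.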
Key takeaways:
- The weights and architecture of Grok-1, a 314-billion-parameter Mixture-of-Experts model, are being released.
- Grok-1 is a large language model trained from scratch by xAI and is not fine-tuned for any specific application.
- The released weights and architecture are covered by the Apache 2.0 license.
- The model was trained in October 2023 using a custom training stack built on top of JAX and Rust.
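Taken together, these figures mean that roughly 78 billion of Grok-1's 314 billion parameters (about 25%) are used in any single token's forward pass.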