The system leverages gaming GPUs, which offer performance comparable to data center GPUs at a fraction of the cost, to train large language models. The challenge was overcoming the limited memory of gaming GPUs. The solution uses QLoRA's 4-bit quantization to shrink a model to roughly a quarter of its original size, then FSDP to shard the quantized model across two or more 24GB consumer cards. The team also used gradient checkpointing, CPU offloading, and Flash Attention 2 to further reduce memory usage. The project marks a significant step toward making AI more accessible to everyone.
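To see why this combination fits, here is a back-of-the-envelope memory estimate. The figures below are illustrative arithmetic based on the numbers in this post (70 billion parameters, roughly 4× size reduction from 4-bit quantization, two 24GB cards), not measurements from the Answer.AI system, and they cover weights only — activations, optimizer state, and LoRA adapters add overhead that the other techniques mentioned help absorb.

```python
# Rough weight-memory estimate for a 70B model, 4-bit quantized and
# sharded evenly across two 24 GB GPUs. Illustrative only.
PARAMS = 70e9
BYTES_PER_PARAM_FP16 = 2.0   # 16-bit weights: 2 bytes each
BYTES_PER_PARAM_4BIT = 0.5   # 4-bit weights: a quarter of fp16
NUM_GPUS = 2
GPU_MEMORY_GB = 24

def gb(n_bytes: float) -> float:
    return n_bytes / 1e9

fp16_total = gb(PARAMS * BYTES_PER_PARAM_FP16)   # far beyond any gaming GPU
quant_total = gb(PARAMS * BYTES_PER_PARAM_4BIT)  # after 4-bit quantization
per_gpu = quant_total / NUM_GPUS                 # each card's FSDP shard

print(f"fp16 weights:  {fp16_total:.1f} GB")
print(f"4-bit weights: {quant_total:.1f} GB")
print(f"per-GPU shard: {per_gpu:.1f} GB (budget: {GPU_MEMORY_GB} GB)")
```

Even after quantization the 35GB of weights exceeds a single 24GB card, which is why sharding across two GPUs with FSDP is essential rather than optional.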
Key takeaways:
- Answer.AI has released its first project, an open-source system that can train a 70 billion parameter language model on a regular desktop computer with standard gaming GPUs.
- The system combines FSDP and QLoRA, and is the result of a collaboration between Answer.AI, Tim Dettmers (U Washington), and Hugging Face’s Titus von Koeller and Sourab Mangrulkar.
- The project aims to make AI more accessible by enabling everyone to create their own personalized models, rather than just using other people's models.
- The team faced several challenges in combining FSDP and QLoRA, but was eventually able to fine-tune a 70 billion parameter model on dual 3090 gaming GPUs for the first time.
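A second reason fine-tuning at this scale is feasible: with QLoRA the quantized base weights stay frozen, and gradients flow only through small low-rank LoRA adapters. The sketch below counts trainable parameters for one weight matrix; the hidden size and rank are hypothetical values chosen for illustration, not taken from the 70B model's actual architecture.

```python
# Illustrative count of trainable parameters under LoRA. A full d_in x d_out
# weight update is replaced by two low-rank factors: A (d_in x r) and
# B (r x d_out), so only (d_in + d_out) * r parameters are trained.
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    return d_in * rank + rank * d_out

d = 8192      # hypothetical hidden size
rank = 16     # a typical LoRA rank

full = d * d                      # params in one full weight matrix
lora = lora_params(d, d, rank)    # params in its LoRA adapter

print(f"full matrix: {full:,} params")
print(f"LoRA update: {lora:,} params ({lora / full:.2%} of full)")
```

At these (hypothetical) dimensions the adapter is under half a percent of the full matrix, which is why the optimizer state and gradients fit comfortably even when the base model barely does.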