
Training great LLMs entirely from ground zero in the wilderness as a startup — Yi Tay

Mar 06, 2024 - yitay.net
The author discusses the challenges of building infrastructure and training large language and multimodal models from scratch at Reka. They highlight the instability and variable quality of compute providers, calling it a "hardware lottery". They also cover the difficulties of multi-cluster setups, the lower quality of external codebases compared to Google's internal ones, and the need for a less systematic, more instinct-driven approach in a startup environment with limited resources.

The author concludes that despite these difficulties, including unreliable compute providers and compute scarcity, they were able to train strong models with few trials and limited resources. They attribute this success to the strong prior knowledge and intuition built up over their ML careers. The author acknowledges that this is only part of the story of starting a company and promises to share more about data pipelines, human evaluation, and other topics in the future.

Key takeaways:

  • The author discusses the challenges faced while building infrastructure and training large language & multimodal models from scratch at Reka, highlighting the instability of compute providers and the variance in the quality of clusters and accelerators.
  • They express frustration with the unpredictable quality of hardware from different providers, calling it a "hardware lottery" because the difficulty of training models varied widely from cluster to cluster.
  • The author also discusses the difficulties of multi-cluster setups, the lower quality of external codebases compared to those at Google, and the need to abandon systematic scaling of models in favor of more instinctive "YOLO" runs due to limited resources in a startup environment.
  • Despite these challenges, the author emphasizes that they were able to train strong models with few trials and resources, attributing this success to the strong prior knowledge and intuition built up in their ML careers.
