The article also emphasizes the importance of high-quality, targeted data in training LLMs. It cites Microsoft's creation of a targeted dataset for a model optimized to write Python functions from docstrings, which performed as well as models with ten times as many parameters. The article concludes by suggesting that organizations operating in specialized domains may need to train or fine-tune LLMs on specialized data so the models understand that domain. It also notes that smaller, targeted LLMs are not only cheaper to train but also more efficient to run inference on and to fine-tune.
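The Microsoft work described above appears to be the phi-1 line of small code models. Below is a minimal sketch of the docstring-to-function setup it targets, assuming the Hugging Face transformers library and the publicly released microsoft/phi-1 checkpoint; the prompt and generation settings are illustrative, not the paper's evaluation setup.

```python
# Minimal sketch: prompt a small code-focused model with an unfinished
# function whose docstring describes the task, and let it generate the body.
# Model choice, prompt, and decoding settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "microsoft/phi-1"  # small (~1.3B parameter) code model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float32)

prompt = '''def running_mean(values):
    """Return a list of the running means of a list of numbers."""
'''

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=False,                        # greedy decoding for a deterministic completion
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```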
Key takeaways:
- There is active research into building smaller language models that outperform larger ones on specific benchmarks, which could reduce the intimidating costs of training large language models.
- Researchers have developed TinyStories, a dataset of toddler-level stories that can be used to train models with fewer than ten million parameters that still produce comprehensible output.
- Microsoft researchers have had success building targeted datasets for models aimed at specific tasks, such as writing Python functions from docstrings, demonstrating the value of high-quality, targeted data.
- Smaller, targeted language models not only deliver more value per training dollar, but they are also cheaper to run inference on and to fine-tune, making them a potentially more cost-effective choice for many applications (see the fine-tuning sketch after this list).
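Part of that cost advantage is that fine-tuning a small model fits comfortably on a single GPU. The sketch below fine-tunes a small causal language model on the TinyStories dataset mentioned above, assuming the Hugging Face transformers and datasets libraries; the base model (EleutherAI/pythia-70m), the 1% data slice, and the hyperparameters are illustrative choices, not the setup used in the research.

```python
# Minimal sketch: fine-tune a small causal LM on a targeted dataset.
# Base model, data slice, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "EleutherAI/pythia-70m"  # small (~70M parameter) base model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# TinyStories: short, simple stories intended for training tiny models.
# Only a 1% slice is used here to keep the example quick to run.
dataset = load_dataset("roneneldan/TinyStories", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# Causal LM objective: the collator pads batches and copies inputs to labels.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="tinystories-finetune",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    learning_rate=5e-5,
    logging_steps=100,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```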