The author concludes that while fine-tuning GPT-3.5 is useful for initial validation or MVP work, Llama 2 is the better choice for teams that want to save money, maximize performance, retain flexibility over training and deployment infrastructure, and keep their data private. The author encourages fine-tuning open-source models like Llama 2, given their cost-effectiveness and performance comparable to GPT-3.5.
Key takeaways:
- GPT-3.5 performs slightly better than CodeLlama 34B on the SQL and functional-representation tasks, but costs roughly 4-6x more to train and deploy.
- Fine-tuning GPT-3.5 is suitable for initial validation or MVP work, while an open-source model like Llama 2 is more cost-effective for long-term use.
- The author fine-tuned on a subset of the Spider dataset and on the Viggo functional-representation dataset, doing minimal hyperparameter tuning for Llama and letting OpenAI choose the number of epochs (see the sketch after this list).
- Fine-tuning an open-source model like Llama 2 saves money, can maximize performance, offers flexibility in training and deployment infrastructure, and keeps data private.
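For context on the OpenAI side, "letting OpenAI choose the number of epochs" amounts to omitting explicit hyperparameters when creating the fine-tuning job. Below is a minimal sketch using the OpenAI Python SDK (v1.x); the dataset path is a placeholder, and this illustrates the general flow rather than the author's exact script:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload a JSONL training file in chat format.
# "spider_subset.jsonl" is a placeholder name, not from the article.
training_file = client.files.create(
    file=open("spider_subset.jsonl", "rb"),
    purpose="fine-tune",
)

# Omitting the hyperparameters argument lets OpenAI pick the number
# of epochs automatically.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)
```

Fine-tuning then runs asynchronously on OpenAI's side; you poll the job (or watch the dashboard) until it completes and returns a fine-tuned model name you can call like any other model.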