The authors also outlined their plans for future research: testing on other datasets and databases, adding more training data, and experimenting with additional foundation models. They are developing a Python package, called Vanna, that can generate SQL for a specific database, generate Plotly code for charts, and suggest follow-up questions. The package can be trained on schema definitions, documentation, and SQL examples.
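To make the idea of training on schema, documentation, and SQL examples more concrete, here is a minimal sketch of how those three kinds of context might be assembled into a single prompt for an LLM. This is not Vanna's API: the `ContextStore` class, its fields, the `build_prompt` method, and the sample `orders` table are all hypothetical names chosen for illustration, and a real system would likely retrieve only the most relevant pieces of context rather than include all of them.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContextStore:
    """Hypothetical container for the three kinds of context (not Vanna's API)."""

    ddl: list[str] = field(default_factory=list)            # schema definitions
    documentation: list[str] = field(default_factory=list)  # free-text notes
    sql_examples: list[tuple[str, str]] = field(default_factory=list)  # (question, SQL)

    def build_prompt(self, question: str) -> str:
        """Join every non-empty context section, then append the user question."""
        parts = []
        if self.ddl:
            parts.append("Schema:\n" + "\n".join(self.ddl))
        if self.documentation:
            parts.append("Documentation:\n" + "\n".join(self.documentation))
        if self.sql_examples:
            examples = "\n".join(f"Q: {q}\nSQL: {sql}" for q, sql in self.sql_examples)
            parts.append("Examples:\n" + examples)
        parts.append(f"Question: {question}\nSQL:")
        return "\n\n".join(parts)


# Illustrative data only; the table and column names are assumptions.
store = ContextStore()
store.ddl.append("CREATE TABLE orders (id INT, customer_id INT, total DECIMAL(10, 2));")
store.documentation.append("The total column is the order value in USD.")
store.sql_examples.append(("How many orders are there?", "SELECT COUNT(*) FROM orders;"))

prompt = store.build_prompt("What is the total revenue?")
print(prompt)
```

The resulting string would be sent to whichever LLM is being evaluated. With an empty store, the prompt contains only the question, which corresponds to the no-context baseline the study compares against.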
Key takeaways:
- The study shows that context is crucial to the accuracy of AI-generated SQL: supplying the right context raised accuracy from around 3% to around 80%.
- The research compared several large language models (LLMs), including Google Bison, GPT 3.5, GPT 4, and Llama 2, with GPT 4 emerging as the best overall LLM for generating SQL.
- The study also demonstrated how to apply these methods to generate SQL for a specific database; a Python package for this purpose is in development.
- Future steps to improve accuracy include testing on other datasets, adding more training data, trying more databases, and experimenting with additional foundation models.