
How to Make LLMs Shut Up

Dec 20, 2024 - greptile.com
Daksh Gupta, co-founder of Greptile, discusses the challenge of reducing the number of comments generated by the company's AI code review bot, which initially overwhelmed developers with nitpicky remarks. The team tried several strategies to improve comment quality, including prompt engineering and using LLMs to judge comment severity, but neither approach meaningfully cut the volume of unnecessary comments. They concluded that nits are subjective and vary between teams, making a universal standard for comment severity impractical.

Ultimately, Greptile found success with a clustering approach. They generated vector embeddings of past comments and stored them in a vector database, allowing the bot to filter out new comments similar to ones developers had previously downvoted. This raised the address rate of comments from 19% to over 55%, indicating a substantial reduction in noise. Because the filter learns from each team's own feedback, the bot adapts to team-specific standards, which proved to be the most effective way to manage the LLM's output and improve the overall user experience.
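The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not Greptile's actual implementation: the similarity threshold, function names, and use of cosine similarity over raw NumPy vectors (rather than a real vector database) are all assumptions made for clarity.

```python
import numpy as np

# Assumed cutoff for "too similar to a downvoted comment";
# the article does not specify a value.
SIMILARITY_THRESHOLD = 0.85


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def should_suppress(comment_emb: np.ndarray,
                    downvoted_embs: list[np.ndarray],
                    threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Drop a generated comment if it resembles any previously downvoted one."""
    return any(cosine_similarity(comment_emb, d) >= threshold
               for d in downvoted_embs)


# Toy 2-D vectors standing in for real embedding-model output.
downvoted = [np.array([1.0, 0.0]), np.array([0.6, 0.8])]

nitpick = np.array([0.99, 0.05])     # close to a downvoted comment -> filtered
substantive = np.array([-0.7, 0.7])  # dissimilar to all downvotes -> kept

print(should_suppress(nitpick, downvoted))      # True
print(should_suppress(substantive, downvoted))  # False
```

In a production setting the downvoted-comment embeddings would live in a vector database and the nearest-neighbor lookup would replace the linear scan, but the decision rule is the same: new comments that cluster near past downvotes are suppressed before developers ever see them.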

Key takeaways:

  • Greptile initially faced issues with its AI code review bot generating too many comments, leading to user complaints and ignored feedback.
  • Attempts to reduce nit comments through prompt engineering and using LLMs as evaluators were unsuccessful.
  • The realization that nits are subjective led to the development of a clustering approach using vector embeddings to filter comments based on team-specific feedback.
  • The clustering method significantly improved the address rate of comments from 19% to over 55%, demonstrating its effectiveness in reducing noise.