Most of the score dimensions are deterministic, but some newer ones use a language model for scoring. This introduces a meta-problem: how do you score the scoring prompt itself? For now, outputs are manually scanned as a sanity check. The article notes that no fine-tuning is being done yet, since satisfactory results are achieved with prompting alone.
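The mix of deterministic and LLM-judged dimensions might be sketched as follows. This is a hypothetical illustration, not the article's implementation: `score_output`, the dimension names, and the `llm_judge` callable (a stand-in for a real model call) are all assumptions.

```python
# Hypothetical sketch: a scorecard mixing deterministic checks with one
# LLM-judged dimension. `llm_judge` stands in for a real model call that
# returns a 0-1 quality score.

def score_output(output: str, llm_judge=None) -> dict:
    scores = {
        # Deterministic dimensions: cheap, repeatable checks.
        "non_empty": 1.0 if output.strip() else 0.0,
        "within_length": 1.0 if len(output) <= 500 else 0.0,
    }
    if llm_judge is not None:
        # LLM-judged dimension: delegated to the judge callable.
        scores["llm_quality"] = llm_judge(output)
    return scores

# Stub judge for demonstration; a real one would prompt a model.
print(score_output("A concise answer.", llm_judge=lambda text: 0.9))
```

Keeping the judge as a pluggable callable makes the deterministic dimensions testable without any model access, which matters when the judge itself is the thing under evaluation.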
Key takeaways:
- A script and a shared input library are used for head-to-head comparison of a new candidate prompt against the existing production prompt.
- Each run is driven by a configuration that includes the prompt, the LLM to use, the temperature, and so on.
- Most score dimensions are deterministic, but newer ones integrate an LLM for scoring.
- Outputs are manually scanned as a sanity check, and fine-tuning is not yet being done, since good results are achieved with prompting alone.
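The harness described in the takeaways above can be sketched like this. Everything here is an assumption beyond the article's outline: the config shape (prompt, model, temperature) follows the bullets, while `compare_prompts`, `run_prompt`, and `score_fn` are hypothetical names standing in for the real model call and scorer.

```python
# Hypothetical sketch of a head-to-head prompt comparison over a shared
# input library. `run_prompt` and `score_fn` are stubs for the real
# model call and scoring function.
import statistics

def compare_prompts(candidate_cfg, production_cfg, input_library,
                    run_prompt, score_fn):
    """Score both configs over the same input library; higher mean wins."""
    means = {}
    for name, cfg in (("candidate", candidate_cfg),
                      ("production", production_cfg)):
        outputs = (run_prompt(cfg, item) for item in input_library)
        means[name] = statistics.mean(score_fn(o) for o in outputs)
    means["winner"] = max(("candidate", "production"), key=means.get)
    return means

# Demo with stubs: a toy length-based score and a templating "model".
cfg_a = {"prompt": "Summarize: {input}", "model": "some-llm", "temperature": 0.2}
cfg_b = {"prompt": "{input}", "model": "some-llm", "temperature": 0.2}
fake_run = lambda cfg, item: cfg["prompt"].replace("{input}", item)
fake_score = lambda out: float(len(out))
print(compare_prompts(cfg_a, cfg_b, ["alpha", "beta"], fake_run, fake_score))
```

Running both configs over the same input library keeps the comparison apples-to-apples: any score difference comes from the prompt and settings, not from differing inputs.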