OPRO begins with a meta-prompt as input, which includes a description of the task, examples of problems, placeholders for prompt instructions, and previously generated solutions. The LLM produces candidate solutions based on the task description and the solutions seen so far. These candidates are evaluated and assigned a quality score, and the best ones are added back to the meta-prompt for the next round of generation. The process continues until the model stops proposing better solutions. The researchers found that OPRO can optimize LLM prompts to maximize the accuracy the models achieve on a task.
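To make the loop concrete, here is a minimal Python sketch of the iteration described above. The helper names (`call_llm`, `score_prompt`), the meta-prompt template, and the step and history limits are illustrative assumptions, not the paper's exact implementation:

```python
# Sketch of the OPRO loop: build a meta-prompt from the task description,
# example problems, and the best previously scored instructions; ask the
# optimizer LLM for a new candidate; score it; repeat.
# call_llm and score_prompt are assumed placeholders, not a real API.

def call_llm(prompt: str) -> str:
    """Placeholder for a call to the optimizer LLM (e.g., via an API)."""
    raise NotImplementedError

def score_prompt(candidate: str) -> float:
    """Placeholder: evaluate the candidate instruction on a held-out
    set of task examples and return its accuracy."""
    raise NotImplementedError

def opro(task_description: str, exemplars: str, steps: int = 50, top_k: int = 20):
    scored = []  # (score, instruction) pairs kept in the meta-prompt
    for _ in range(steps):
        # Best previous instructions and their scores, highest first.
        history = "\n".join(
            f"instruction: {p} (score: {s:.2f})"
            for s, p in sorted(scored, reverse=True)[:top_k]
        )
        meta_prompt = (
            f"{task_description}\n\n"
            f"Example problems:\n{exemplars}\n\n"
            f"Previous instructions and their scores:\n{history}\n\n"
            "Write a new instruction that scores higher than all of the above."
        )
        candidate = call_llm(meta_prompt)
        scored.append((score_prompt(candidate), candidate))
    return max(scored)  # best (score, instruction) found
```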
Key takeaways:
- DeepMind researchers have proposed a new method called Optimization by PROmpting (OPRO), which uses large language models (LLMs) as optimizers. The optimization task is defined in natural language rather than through formal mathematical definitions.
- OPRO begins with a meta-prompt as input, which includes a natural language description of the task, examples of problems, placeholders for prompt instructions, and corresponding solutions. The LLM generates candidate solutions, which are evaluated and assigned a quality score. The process continues until the model stops proposing better solutions.
- OPRO can optimize prompts for LLMs such as OpenAI’s ChatGPT and Google’s PaLM, searching for the prompt that maximizes task accuracy (a scoring sketch follows this list). The researchers found that every LLM in their evaluation was able to serve as an optimizer, consistently improving the performance of the generated prompts.
- The researchers caution against anthropomorphizing LLMs, as semantically similar prompts can yield vastly different results. However, OPRO provides a systematic way to explore the vast space of possible LLM prompts and find the one that works best for a specific type of problem.
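For the scoring step referenced above, one natural reading of "task accuracy" is the fraction of labeled examples a candidate instruction answers correctly. The sketch below assumes a hypothetical `solve_with_prompt` helper wrapping the scorer LLM; it is an illustration under that assumption, not the paper's code:

```python
# Assumed quality-score step: judge a candidate instruction by its
# accuracy on a small set of (question, answer) pairs.

def solve_with_prompt(instruction: str, question: str) -> str:
    """Placeholder: ask the scorer LLM to answer `question` with
    `instruction` prepended, returning its answer as text."""
    raise NotImplementedError

def score_prompt(instruction: str, examples: list[tuple[str, str]]) -> float:
    """Fraction of examples the instruction leads the model to get right."""
    correct = sum(
        solve_with_prompt(instruction, question).strip() == answer.strip()
        for question, answer in examples
    )
    return correct / len(examples)
```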