The researchers also use reinforcement learning with a carefully designed reward function to calibrate the confidence estimates: the reward encourages LLMs to produce accurate, high-confidence predictions and penalizes overconfidence in incorrect outputs. Experimental results show that SaySelf effectively reduces confidence calibration error while maintaining task performance, and the generated self-reflective rationales prove reasonable and further improve calibration.
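The paper's exact reward formula is not reproduced here; as a hedged illustration, the sketch below implements one plausible calibration-oriented reward consistent with the description above. It assumes the model states a confidence `c` in [0, 1] alongside its answer and that correctness is known at training time; the function name and the linear scaling are hypothetical choices for this example.

```python
def calibration_reward(confidence: float, is_correct: bool) -> float:
    """Hypothetical reward in the spirit of SaySelf's objective:
    reward confident correct answers, penalize confident errors.

    confidence : model-stated confidence in [0, 1]
    is_correct : whether the sampled answer matched the reference
    """
    if is_correct:
        # Higher stated confidence on a correct answer earns more reward.
        return confidence
    # Overconfidence on a wrong answer is penalized proportionally.
    return -confidence


# Example: a wrong answer stated with 0.9 confidence is penalized
# harder (-0.9) than one stated with 0.2 confidence (-0.2).
print(calibration_reward(0.9, False))  # -0.9
print(calibration_reward(0.8, True))   #  0.8
```

A reward shaped this way makes the expected-reward-maximizing strategy one of matching stated confidence to the actual probability of being correct, which is the calibration behavior the framework targets.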
Key takeaways:
- The paper introduces SaySelf, a training framework that teaches Large Language Models (LLMs) to express more accurate, fine-grained confidence estimates.
- SaySelf also directs LLMs to produce self-reflective rationales that identify gaps in their parametric knowledge and explain their uncertainty.
- The framework uses reinforcement learning with a carefully designed reward function to calibrate the confidence estimates, encouraging LLMs to provide accurate, high-confidence predictions and penalizing overconfidence in incorrect outputs.
- Experimental results show that SaySelf reduces confidence calibration error while maintaining task performance, with the generated self-reflective rationales contributing further to calibration (see the calibration-error sketch after this list).
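Calibration error in this setting is commonly measured with Expected Calibration Error (ECE). The paper's exact metric is not restated here, so the following is a minimal NumPy sketch of the standard binned ECE; the bin count and the toy inputs are assumptions for illustration.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE: mean |accuracy - confidence| per bin,
    weighted by the fraction of samples falling in each bin.

    confidences : array of stated confidences in [0, 1]
    correct     : boolean array, True where the answer was right
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        bin_conf = confidences[mask].mean()  # average stated confidence
        bin_acc = correct[mask].mean()       # empirical accuracy in bin
        ece += mask.mean() * abs(bin_acc - bin_conf)
    return ece


# Example: overconfident wrong answers inflate the ECE.
print(expected_calibration_error([0.9, 0.9, 0.8, 0.3],
                                 [True, False, False, True]))
```

Lower ECE means the model's stated confidences track its actual accuracy more closely, which is the sense in which SaySelf "reduces calibration error" in the results above.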