The player's performance is evaluated using a Calibration Score, which measures how closely their confidence ratings match the LLM's actual performance. A player is perfectly calibrated when their confidence ratings match the actual accuracy of the LLM's responses across all prompts. The game includes 20 prompts, and a Calibration Score of 0 is the best possible score, indicating that the player's predictions aligned perfectly with the LLM's performance.
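The text above does not give the formula behind the Calibration Score. As a rough illustration only, the sketch below assumes a Brier-style score: the mean squared gap between each confidence rating and the outcome (1 if the LLM answered correctly, 0 if not). The function name `brier_score` and the example ratings are hypothetical, and the game may compute its score differently. What the sketch does share with the description is that 0 is the best possible value.

```python
from typing import Sequence


def brier_score(confidences: Sequence[float], outcomes: Sequence[bool]) -> float:
    """Mean squared gap between stated confidence and actual outcome.

    ASSUMPTION: the game's real Calibration Score formula is not given in
    the source; this Brier-style score is only one plausible choice.
    """
    if len(confidences) != len(outcomes):
        raise ValueError("need exactly one outcome per confidence rating")
    if not confidences:
        raise ValueError("need at least one prompt")
    for c in confidences:
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"confidence {c} is outside [0, 1]")
    # Each outcome counts as 1.0 (LLM was correct) or 0.0 (LLM was incorrect).
    total = sum((c - float(o)) ** 2 for c, o in zip(confidences, outcomes))
    return total / len(confidences)


# Hypothetical 20-prompt game: the LLM answers 12 prompts correctly, 8 incorrectly.
outcomes = [True] * 12 + [False] * 8

# A player who is certain and right on every prompt reaches the best score.
perfect_player = [1.0] * 12 + [0.0] * 8
print(brier_score(perfect_player, outcomes))  # 0.0

# A player who answers 0.5 on every prompt commits to nothing.
unsure_player = [0.5] * 20
print(brier_score(unsure_player, outcomes))  # 0.25
```

Under this assumed formula, a confident wrong rating is penalized more heavily than a hedged one, so a player is rewarded for reporting the confidence they actually hold.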
Key takeaways:
- The Calibration game is a research tool designed to improve the identification of hallucinations in Large Language Models (LLMs).
- Players rate their confidence in the LLM's ability to provide a correct response to prompts, with 0 indicating certainty of an incorrect response and 1 indicating certainty of a correct response.
- Players' performance is evaluated with a Calibration Score, which measures how closely their confidence ratings match the LLM's actual performance.
- A Calibration Score of 0 is the best possible score, indicating that the player's predictions perfectly aligned with the LLM's performance.