The article also provides a guide to reproducing the experiment with various Python scripts. The experiment involves downloading puzzles and games from Lichess, converting them to FEN (Forsyth-Edwards Notation), generating proof games, and comparing the model's performance across the two move histories. The author suggests potential next steps, including logging the model's rate of illegal moves, trying other models, and testing other sources of spurious features. The project has minimal dependencies and requires an OpenAI API key. It is licensed under GPL v3.
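To make the FEN step concrete, here is a minimal sketch of what a FEN string encodes. This parser is an illustrative helper written for this summary, not one of the project's actual scripts; it splits a FEN into its six standard fields and sanity-checks that each rank of the piece placement accounts for eight squares.

```python
# Illustrative only: a tiny FEN (Forsyth-Edwards Notation) parser.
# FEN packs a chess position into six space-separated fields:
# piece placement, side to move, castling rights, en-passant square,
# halfmove clock, and fullmove number.

def parse_fen(fen):
    placement, side, castling, ep, halfmove, fullmove = fen.split()
    for rank in placement.split("/"):
        # Digits stand for runs of empty squares; letters are pieces.
        squares = sum(int(c) if c.isdigit() else 1 for c in rank)
        assert squares == 8, f"malformed rank: {rank}"
    return {
        "placement": placement,
        "side_to_move": side,
        "castling": castling,
        "en_passant": ep,
        "halfmove_clock": int(halfmove),
        "fullmove_number": int(fullmove),
    }

# The standard starting position.
start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
info = parse_fen(start)
print(info["side_to_move"], info["fullmove_number"])  # -> w 1
```

Note that a FEN records only the current position plus a few counters, not the moves that produced it, which is exactly why two different game histories can map to the same FEN.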
Key takeaways:
- The ChessLLM project tests how sensitive chess-playing language models, such as GPT-3.5-turbo-instruct, are to irrelevant factors beyond the position on the board.
- The model's choice of move in a given position can vary with irrelevant context, such as the sequence of moves that led to that position.
- A tool called proofgame constructs pairs of move sequences that reach the same position; the model's puzzle performance is then measured and reported twice, once with the original game history and once with the constructed one.
- The project is a work in progress and future steps include logging the model's rate of illegal moves, trying other models, testing other sources of spurious features, and figuring out the implications of these experiments.
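The core idea behind the position pairs can be demonstrated without any chess library. The sketch below (illustrative only, not the project's proofgame tool) represents the board as a dict of square → piece and applies moves as (from, to) pairs: two different move orders that transpose into the same position produce identical board states, and it is exactly such pairs that the experiment feeds to the model.

```python
# Illustrative sketch: show that two different move orders can
# transpose into the identical position. Board = dict square -> piece;
# a move just relocates whatever sits on the source square
# (no legality checking -- this is a demonstration, not an engine).

def start_board():
    board = {}
    back_rank = "RNBQKBNR"
    for i, piece in enumerate(back_rank):
        file = "abcdefgh"[i]
        board[file + "1"] = "w" + piece   # white back rank
        board[file + "2"] = "wP"          # white pawns
        board[file + "8"] = "b" + piece   # black back rank
        board[file + "7"] = "bP"          # black pawns
    return board

def apply_moves(moves):
    board = start_board()
    for src, dst in moves:
        board[dst] = board.pop(src)
    return board

# 1.e4 Nc6 2.Nf3 e5  vs  1.Nf3 Nc6 2.e4 e5 -- different histories,
# same final position.
line_a = [("e2", "e4"), ("b8", "c6"), ("g1", "f3"), ("e7", "e5")]
line_b = [("g1", "f3"), ("b8", "c6"), ("e2", "e4"), ("e7", "e5")]

assert apply_moves(line_a) == apply_moves(line_b)
print("transposition confirmed: identical positions")
```

A model that only cared about the position should play identically after either history; any systematic difference is evidence of sensitivity to the spurious move-order feature.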