The research raises concerns about the potential for unauditable, hidden computation in large language models via filler tokens. It concludes that additional tokens can provide a computational advantage regardless of which tokens are used, and that these tokens need not carry any information about the intermediate steps of a multi-token computation.
Key takeaways:
- Language models using chain-of-thought responses show improved performance across most benchmarks.
- Transformers can use meaningless filler tokens in place of a chain of thought to solve complex algorithmic tasks.
- Learning to use filler tokens is challenging and requires specific, dense supervision to converge.
- The use of filler tokens raises concerns about large language models engaging in unauditable, hidden computations that are increasingly detached from the observed chain-of-thought tokens.
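To make the filler-token setup concrete, here is a hypothetical sketch (not the paper's actual code) contrasting a standard chain-of-thought prompt with a filler-token prompt. In the filler variant, every intermediate token is a meaningless symbol such as `.`, so any useful computation must happen in the model's hidden states rather than in the visible tokens.

```python
# Hypothetical illustration of filler-token prompting vs. chain of thought.
# Function names and prompt templates are assumptions for this sketch.

def chain_of_thought_prompt(question: str, reasoning_steps: list[str]) -> str:
    """Standard chain of thought: intermediate steps are visible tokens."""
    steps = " ".join(reasoning_steps)
    return f"{question} Let's think step by step: {steps} Answer:"

def filler_token_prompt(question: str, n_fillers: int, filler: str = ".") -> str:
    """Filler-token variant: a comparable token budget, but every
    intermediate token is a meaningless filler symbol."""
    fillers = " ".join([filler] * n_fillers)
    return f"{question} {fillers} Answer:"

cot = chain_of_thought_prompt("Is 3+4 even?", ["3+4=7", "7 is odd"])
filler = filler_token_prompt("Is 3+4 even?", n_fillers=6)
```

Under this framing, the paper's concern is that the filler prompt reveals nothing about the computation, while the chain-of-thought prompt at least exposes tokens an auditor could inspect.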