The research raises concerns about the potential for unauditable, hidden computation in large language models via filler tokens. It concludes that additional tokens can provide a computational advantage regardless of which tokens are used, and that these tokens need not carry any information about the intermediate steps of a multi-token computation.
Key takeaways:
- Language models using chain-of-thought responses show improved performance across most benchmarks.
- Transformers can use meaningless filler tokens in place of a chain of thought to solve complex algorithmic tasks.
- Learning to use filler tokens is challenging and requires specific, dense supervision to converge.
- The use of filler tokens raises concerns about large language models engaging in unauditable, hidden computations that are increasingly detached from the observed chain-of-thought tokens.
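To make the filler-token setup concrete, here is a hypothetical sketch (not the paper's actual code) contrasting a standard chain-of-thought prompt with a filler-token prompt. In the filler variant, every intermediate token is a meaningless symbol such as `.`, so any useful computation must happen in the model's hidden states rather than in the visible tokens.

```python
# Hypothetical illustration of filler-token prompting vs. chain of thought.
# Function names and prompt templates are assumptions for this sketch.

def chain_of_thought_prompt(question: str, reasoning_steps: list[str]) -> str:
    """Standard chain of thought: intermediate steps are visible tokens."""
    steps = " ".join(reasoning_steps)
    return f"{question} Let's think step by step: {steps} Answer:"

def filler_token_prompt(question: str, n_fillers: int, filler: str = ".") -> str:
    """Filler-token variant: a comparable token budget, but every
    intermediate token is a meaningless filler symbol."""
    fillers = " ".join([filler] * n_fillers)
    return f"{question} {fillers} Answer:"

cot = chain_of_thought_prompt("Is 3+4 even?", ["3+4=7", "7 is odd"])
filler = filler_token_prompt("Is 3+4 even?", n_fillers=6)
```

Under this framing, the paper's concern is that the filler prompt reveals nothing about the computation, while the chain-of-thought prompt at least exposes tokens an auditor could inspect.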