The author criticizes the idea that transformers are Turing-complete and capable of computing any function. The argument is that a transformer, being of fixed size, cannot become a category C program (a program that never halts, but whose non-termination cannot be proved). The author concludes that while transformers can be useful tools, they are not the path to general intelligence, and instead advocates program synthesis and other approaches for solving intelligence benchmarks.
Key takeaways:
- The author argues that intelligence requires the ability to explore potentially never-ending "trains of thought", and that problem-solving can require arbitrary amounts of time. A computer program that is bound by its architecture to finish quickly therefore cannot be capable of general problem-solving.
- There are three types of computer programs: (A) programs that eventually finish executing (or "halt"), (B) programs that provably never halt, and (C) programs that never halt but for which no proof of this exists. The author argues that a generally intelligent program must be able to run for an unbounded amount of time, and that for some problems it will never halt.
- According to the author, transformers, a feed-forward neural network architecture, are not capable of general problem-solving. This is because a transformer produces each token within a fixed amount of time, and its state can be described by a string of fixed length. If the number of reachable states of a program is limited, independent of the input, then that program cannot become a category C program.
- The author suggests that the way forward might be program synthesis, and that although transformer-based models are failing to solve intelligence benchmarks, they should still be used as tools.
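
One way to read the finite-state argument is sketched below in Python. It is a toy model, not a transformer and not the author's own code: the names `HALT`, `halts`, `counts_to_five`, and `spins` are hypothetical, and the program is assumed to be deterministic (no sampling). If a program can only ever be in finitely many states, simulating it while remembering every state seen so far must end in one of two ways: the program halts, or it revisits a state and is thereby proved to loop forever. Either way its behavior is decided, so it cannot belong to the "never halts, but unprovably so" category.

```python
from typing import Callable, Hashable

# Sentinel returned by a step function to signal that the program has finished.
HALT = object()


def halts(step: Callable[[Hashable], object], start: Hashable) -> bool:
    """Decide halting for a deterministic program with finitely many states.

    `step` maps the current state to the next state, or to HALT.
    Assumption: only finitely many states are reachable from `start`,
    so this loop is guaranteed to terminate.
    """
    seen = set()
    state = start
    while state is not HALT:
        if state in seen:
            # A deterministic program that revisits a state repeats the same
            # cycle forever, which is a proof that it never halts.
            return False
        seen.add(state)
        state = step(state)
    return True


def counts_to_five(state: int) -> object:
    """Toy program over the states 0..7 that stops when it reaches 5."""
    return HALT if state == 5 else (state + 1) % 8


def spins(state: int) -> int:
    """Toy program over the states 0..7 that cycles 0, 2, 4, 6, 0, ..."""
    return (state + 2) % 8


print(halts(counts_to_five, 0))  # True: reaches 5 and stops
print(halts(spins, 0))           # False: state 0 is revisited
```

The set `seen` can never grow beyond the number of reachable states, which is why the check always finishes. A program with unbounded memory offers no such guarantee, and that is where undecidable cases can arise.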