
What can LLMs never do?

Apr 27, 2024 - strangeloopcanon.com
The article discusses the limitations of Large Language Models (LLMs) in solving certain problems. Despite handling complex questions well, they struggle with seemingly simple tasks such as creating word grids or playing sudoku. The author identifies two main failure modes: LLMs cannot solve problems whose answers are absent from their training data, and they struggle with problems that run against their construction. The author also describes a "goal drift", where LLMs cannot generalise beyond the context in the prompt and do not know where to focus their attention.

The author also describes his attempts to train transformers to predict cellular automata, a task that proved harder than expected. The models learned some rules but failed to generalise, struggling with tasks that require memory and computation. He concludes that while LLMs can mimic computation and learn implicit associations within data, they falter at iterative reasoning and at maintaining a consistent goal, which could be partially addressed through methods such as chain-of-thought prompting or using other LLMs to review and correct output.
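To make the cellular-automata experiment concrete, here is a minimal sketch of how such training data could be generated. The article does not include the author's code, so the function names, the choice of Rule 110 (Wolfram numbering), and the string serialisation are illustrative assumptions, not his actual setup.

```python
import random


def step(cells, rule=110):
    """Apply one update of an elementary cellular automaton.

    `cells` is a list of 0/1 values with wrap-around boundaries.
    The rule number encodes the next state for each of the eight
    possible 3-cell neighbourhoods (Wolfram numbering).
    """
    n = len(cells)
    out = []
    for i in range(n):
        left, centre, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        idx = (left << 2) | (centre << 1) | right  # neighbourhood as 0..7
        out.append((rule >> idx) & 1)              # look up bit idx of the rule
    return out


def make_example(width=16, steps=4, rule=110, seed=0):
    """Build one (prompt, target) training pair: the prompt is a random
    initial row, the target is the row after `steps` updates, both
    serialised as strings of 0s and 1s for a sequence model."""
    rng = random.Random(seed)
    row = [rng.randint(0, 1) for _ in range(width)]
    prompt = "".join(map(str, row))
    for _ in range(steps):
        row = step(row, rule)
    return prompt, "".join(map(str, row))
```

A transformer trained on many such pairs must effectively run the update rule `steps` times internally; the article's point is that this kind of iterated computation is exactly where the models failed to generalise.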

Key takeaways:

  • Large Language Models (LLMs) have shown impressive capabilities but still struggle with seemingly simple tasks, leading to an exploration of their failure modes.
  • Despite their ability to answer complex questions, LLMs fail at tasks such as creating word grids or playing sudoku, and suffer from a "Reversal Curse" where they struggle to generalize information in reverse.
  • The author suggests that LLMs have a "goal drift" where they lose focus and struggle to generalize beyond the context within the prompt, and they cannot reset their own context dynamically.
  • While LLMs can be improved with clever prompting and iteration, they still struggle with tasks that require memory and computation, indicating that they demonstrate more intuition than intelligence.
