
New Theory Suggests Chatbots Can Understand Text | Quanta Magazine

Jan 22, 2024 - quantamagazine.org
The article discusses the debate over whether large language models (LLMs), the technology behind modern chatbots, truly understand the text they generate or are simply "stochastic parrots" that combine pre-existing information without reference to meaning. Researchers Sanjeev Arora and Anirudh Goyal propose a theory suggesting that as these models grow and are trained on more data, they improve on individual language-related abilities and develop new ones, hinting at understanding. They use mathematical objects called random graphs to model the behavior of LLMs, focusing on "bipartite" graphs that connect pieces of text to the skills needed to understand them.
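
The bipartite setup can be pictured with a small sketch. This is an illustrative toy model, not the authors' actual construction: the skill names, text labels, and edge probabilities below are assumptions chosen only to show the graph structure, where one side holds skill nodes, the other holds text nodes, and an edge means a text requires that skill.

```python
import random

# Toy bipartite graph (illustrative only, not Arora and Goyal's model):
# skill nodes on one side, text nodes on the other; an edge from a text
# to a skill means that skill is needed to understand that text.
random.seed(0)

skills = ["anaphora", "irony", "arithmetic", "metaphor", "negation"]
texts = [f"text_{i}" for i in range(8)]

# Randomly connect each text node to between 1 and 3 skill nodes.
graph = {t: set(random.sample(skills, k=random.randint(1, 3))) for t in texts}

for text, needed in sorted(graph.items()):
    print(text, "requires", sorted(needed))
```

In this picture, a model "improving a skill" corresponds to handling more of the texts whose nodes are linked to that skill node.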

Arora and Goyal's theory suggests that as an LLM's size increases and its test loss decreases, it gets better at combining multiple skills when generating a single piece of text. This leads to a combinatorial explosion of abilities, which they argue is evidence that the largest LLMs don't just rely on combinations of skills they saw in their training data. They tested their theory using a method called "skill-mix" to evaluate an LLM's ability to use multiple skills to generate text, and found that the models behaved almost exactly as expected.
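
The combinatorial-explosion point is simple counting. The numbers below are illustrative assumptions (the paper's actual skill counts may differ): with n distinct skills, the number of ways to pick k of them at once is the binomial coefficient C(n, k), which quickly exceeds what any training corpus could contain verbatim.

```python
from math import comb

# Illustrative arithmetic: assume a model has 1,000 distinct skills
# (an assumed figure, not taken from the paper).
n_skills = 1000

# The number of k-skill combinations grows combinatorially with k.
for k in (2, 3, 4):
    print(f"{k}-skill combinations: {comb(n_skills, k):,}")

# comb(1000, 4) is roughly 4.1e10 -- far more combinations than a model
# could plausibly have seen as examples, which is the crux of the
# "not just parroting" argument.
```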

Key takeaways:

  • Artificial intelligence researchers are debating whether large language models (LLMs), which power modern chatbots, truly understand what they're saying or are just "stochastic parrots" that combine information they've already seen without reference to meaning.
  • Sanjeev Arora of Princeton University and Anirudh Goyal, a research scientist at Google DeepMind, have developed a theory suggesting that as LLMs get bigger and are trained on more data, they improve on individual language-related abilities and develop new ones by combining skills in a way that suggests understanding.
  • The researchers used mathematical objects called random graphs to model the behavior of LLMs, and found that as a model gets bigger, its loss on test data decreases in a specific manner, suggesting improved skills and abilities.
  • When Arora and Goyal tested their theory, they found that large LLMs behaved almost exactly as expected, leading them to conclude that the largest LLMs are not just parroting what they've seen before, but are capable of generalizing and combining skills in ways not present in their training data.
