Arora and Goyal's theory suggests that as an LLM grows and its test loss falls, it becomes able to apply several skills at once when generating text. Because the number of possible multi-skill combinations grows combinatorially with the number of skills, they argue this is evidence that the largest LLMs cannot be relying only on skill combinations they saw in their training data. They tested the theory with an evaluation called "Skill-Mix," which asks an LLM to generate text exhibiting several randomly chosen skills at once, and found that the models behaved almost exactly as predicted.
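To make the combinatorial argument concrete, the sketch below (a hedged illustration, not the authors' actual evaluation harness; the skill list, topic, and prompt wording are hypothetical) counts how quickly k-skill combinations multiply and builds a Skill-Mix-style probe by sampling k skills at random:

```python
import math
import random

# Hypothetical list of atomic language skills; Skill-Mix draws on a much
# larger set taken from rhetoric, logic, and linguistics.
SKILLS = ["metaphor", "irony", "red herring", "modus ponens",
          "anaphora", "understatement", "spatial reasoning"]

# The number of distinct 4-skill combinations explodes with the skill count:
# with 1000 skills there are billions of 4-combinations, far more than any
# training corpus could exhibit one by one.
for n in (10, 100, 1000):
    print(n, "skills ->", math.comb(n, 4), "4-skill combinations")

def skill_mix_prompt(k, topic="dueling", rng=random):
    """Build a Skill-Mix-style probe: pick k skills at random and ask
    for a short text that demonstrates all of them at once."""
    chosen = rng.sample(SKILLS, k)
    return (f"Write a short paragraph about {topic} that illustrates "
            f"all of these skills: {', '.join(chosen)}.")

print(skill_mix_prompt(3))
```

Because the skills and topic are drawn at random, most probes land on combinations unlikely to appear verbatim anywhere in the training data, which is what makes success on them informative.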
Key takeaways:
- Artificial intelligence researchers are debating whether large language models (LLMs), which power modern chatbots, truly understand what they're saying or are just 'stochastic parrots' that combine information they've already seen without reference to meaning.
- Sanjeev Arora of Princeton University and Anirudh Goyal, a research scientist at Google DeepMind, have developed a theory suggesting that as LLMs get bigger and are trained on more data, they improve at individual language-related skills and also develop new abilities by combining skills, in a manner that suggests understanding.
- The researchers modeled LLM behavior with mathematical objects called random bipartite graphs, whose edges connect skill nodes to text nodes. As a model gets bigger, its loss on test data falls in a predictable way, and in the graph model that falling loss translates into mastery of more skills and more skill combinations (see the sketch after this list).
- When Arora and Goyal put the theory to the test, they found that large LLMs behave almost exactly as predicted, leading them to conclude that the largest LLMs are not just parroting what they have seen before but can generalize, combining skills in ways that do not appear in their training data.
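The bipartite-graph intuition referenced above can be simulated in a few lines. The toy below is only a sketch under assumed sizes and uniform-random wiring, not Arora and Goyal's actual construction: it links random text pieces to the skills they require, treats rising per-skill mastery as a stand-in for falling test loss, and counts how many distinct skill combinations the model successfully demonstrates.

```python
import random

random.seed(0)

N_SKILLS = 100       # assumed number of atomic skill nodes
N_TEXTS = 50_000     # assumed number of text-piece nodes
SKILLS_PER_TEXT = 4  # each text piece is wired to 4 random skills

# Random bipartite graph: each text node connects to the skills it requires.
texts = [frozenset(random.sample(range(N_SKILLS), SKILLS_PER_TEXT))
         for _ in range(N_TEXTS)]

def combos_demonstrated(mastery_prob):
    """Master each skill independently with probability mastery_prob,
    then count the distinct skill combinations among text pieces whose
    required skills are ALL mastered."""
    mastered = {s for s in range(N_SKILLS) if random.random() < mastery_prob}
    return len({t for t in texts if t <= mastered})

# As mastery rises (i.e., loss falls), demonstrated combinations explode:
# roughly mastery_prob**4 of the possible 4-skill combos become reachable.
for p in (0.3, 0.6, 0.9):
    print(f"skill mastery {p:.1f}: {combos_demonstrated(p):,} combos demonstrated")
```

The superlinear jump between the three settings mirrors the paper's central claim: modest, smooth gains in per-skill competence produce a combinatorial explosion in the number of skill combinations a model can deploy.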