How Quickly Do Large Language Models Learn Unexpected Skills? | Quanta Magazine

Mar 25, 2024 - news.bensbites.co
The Beyond the Imitation Game benchmark (BIG-bench) project, which compiled a list of tasks to test large language models (LLMs), found that the performance of these models improved as they scaled up. However, some tasks saw a sudden jump in performance, described as "breakthrough" behavior or "emergent" abilities. A new paper by Stanford University researchers argues that these sudden abilities are a result of the way performance is measured, not an inherent feature of the models. They suggest that the transition is more predictable than previously thought, and that the perception of emergence has much to do with the chosen measurement method.

The Stanford researchers used the example of three-digit addition, where LLMs were previously judged only on accuracy, leading to a perceived sudden ability to add at a certain threshold. They retested the task using a metric that awards partial credit, showing that the ability to add is not emergent, but gradual and predictable. They argue that the improvement in LLMs as they scale up is due to the added complexity of larger models, not sudden, unpredictable jumps in ability.
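The effect of the metric choice can be illustrated with a toy model (an assumption for illustration, not the paper's exact methodology): suppose a model predicts each digit of a multi-digit answer independently with per-digit accuracy p, which rises smoothly with scale. An all-or-nothing exact-match metric then looks like a sudden jump, while a partial-credit metric tracks the smooth underlying improvement.

```python
# Toy sketch: how metric choice can make gradual improvement look abrupt.
# Assumption: each of the n_digits digits is predicted independently with
# probability p of being correct; p is a stand-in for model scale.

def exact_match_accuracy(p: float, n_digits: int = 4) -> float:
    """All-or-nothing metric: the answer counts only if every digit is right."""
    return p ** n_digits

def partial_credit(p: float, n_digits: int = 4) -> float:
    """Partial-credit metric: expected fraction of digits predicted correctly.
    With independent digits, this is simply p."""
    return p

for p in [0.2, 0.4, 0.6, 0.8, 0.95, 0.99]:
    print(f"per-digit acc {p:.2f} -> "
          f"exact match {exact_match_accuracy(p):.4f}, "
          f"partial credit {partial_credit(p):.2f}")
```

Partial credit grows linearly with p, but exact match stays near zero (0.2^4 = 0.0016) until p is already high (0.95^4 ≈ 0.815), producing the "breakthrough" shape even though the underlying skill improved steadily.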

Key takeaways:

  • The Beyond the Imitation Game benchmark project compiled a list of tasks to test the capabilities of large language models (LLMs), and found that performance improved as the models scaled up, but with some tasks, the improvement wasn't smooth.
  • Researchers have described the sudden improvement in performance as 'emergent' behavior, likening it to a phase transition in physics, but a new paper by Stanford University researchers argues that this is a consequence of how performance is measured.
  • The Stanford researchers argue that the abilities of LLMs are neither unpredictable nor sudden, and that the perceived 'emergence' of abilities has more to do with the choice of measurement than with the model's inner workings.
  • The Stanford team tested the LLMs using a metric that awards partial credit, showing that as parameters increased, the LLMs predicted an increasingly correct sequence of digits in addition problems, suggesting that the ability to add isn't emergent but gradual and predictable.
