The findings suggest that ICL doesn't merely memorize the specific demonstrations but extracts a meaningful semantic representation of the overall task. This discovery could lead to more efficient adaptation of LLMs to new tasks with limited data. However, the study focused on relatively simple tasks, and more complex ICL settings likely involve more intricate representations. How the model internally constructs and applies these task vectors also remains unclear.
Key takeaways:
- In-context learning (ICL) in large language models works by compressing the provided examples into a 'task vector' that captures the essence of the task they demonstrate, according to new research from scientists at Tel Aviv University and Google DeepMind.
- The ICL process divides into two parts: a 'learning' part of the model that maps the demonstrations to a task vector, and an 'application' part that combines this vector with a new query to generate the output (see the sketch after this list).
- Experiments across 18 diverse tasks and several major public models supported this view of how ICL works, with task vectors encoding meaningful information about each task and guiding the model's behavior.
- These findings suggest that ICL extracts a meaningful semantic representation of the overall task, opening new possibilities for efficiently adapting large language models to new tasks with limited data.
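To make the two-part picture above concrete, here is a minimal sketch of how a task vector could be extracted from the demonstrations and then patched into a fresh forward pass on a new query, assuming a HuggingFace causal LM. The model name (`gpt2`), the layer index, and the hook-based patching are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of the two-phase view of ICL described above.
# Assumptions: a HuggingFace causal LM, a hand-picked intermediate layer,
# and hidden-state patching via a forward hook.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in; the study used larger public models
LAYER = 6            # hypothetical intermediate layer to read from / patch into

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def get_task_vector(demos_prompt: str) -> torch.Tensor:
    """'Learning' phase: run the demonstrations and read the hidden state
    at the final token of an intermediate layer as the task vector."""
    ids = tok(demos_prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    # hidden_states[LAYER] has shape (batch, seq_len, hidden_dim)
    return out.hidden_states[LAYER][0, -1, :].clone()

def apply_task_vector(query_prompt: str, task_vec: torch.Tensor) -> str:
    """'Application' phase: run the bare query (no demonstrations) while
    patching the task vector into the same layer at the last token."""
    ids = tok(query_prompt, return_tensors="pt")

    def patch(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden[:, -1, :] = task_vec  # overwrite the last-token state
        return output

    # gpt2 exposes its blocks as model.transformer.h; other architectures differ.
    # Block LAYER-1's output corresponds to hidden_states[LAYER] above.
    handle = model.transformer.h[LAYER - 1].register_forward_hook(patch)
    try:
        with torch.no_grad():
            logits = model(**ids).logits
    finally:
        handle.remove()
    next_id = logits[0, -1].argmax().item()
    return tok.decode(next_id)

# Example: the demonstrations define an English->French task; the query is new.
demos = "apple -> pomme\ncat -> chat\ndog ->"
vec = get_task_vector(demos)
print(apply_task_vector("house ->", vec))
```

In this framing, the demonstrations are needed only once to produce the vector; the same vector can then be reused to answer new queries presented in a zero-shot format, which is what makes the compressed task representation appealing for efficient adaptation.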