The findings suggest that ICL doesn't merely memorize the specific demonstrations but extracts a meaningful semantic representation of the overall task. This discovery could lead to more efficient adaptation of LLMs to new tasks with limited data. However, the study focused on relatively simple tasks, and more complex ICL settings likely involve more intricate representations. How the model internally constructs and applies these task vectors also remains unclear.
Key takeaways:
- In-context learning (ICL) in large language models works by compressing the provided examples into a 'task vector' that captures the essence of the task they demonstrate, according to new research from scientists at Tel Aviv University and Google DeepMind.
- The ICL process divides into two parts: a 'learning' part of the model that maps the demonstrations to a task vector, and an 'application' part that combines this vector with a new query to generate the output (see the sketch after this list).
- Experiments across 18 diverse tasks and several major public models supported this view of how ICL works, with task vectors encoding meaningful information about each task and guiding the model's behavior.
- These findings suggest that ICL extracts a meaningful semantic representation of the overall task, opening new possibilities for efficiently adapting large language models to new tasks with limited data.
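To make the two-part picture above concrete, here is a minimal sketch of how a task vector could be extracted from the demonstrations and then patched into a fresh forward pass on a new query, assuming a HuggingFace causal LM. The model name (`gpt2`), the layer index, and the hook-based patching are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of the two-phase view of ICL described above.
# Assumptions: a HuggingFace causal LM, a hand-picked intermediate layer,
# and hidden-state patching via a forward hook.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in; the study used larger public models
LAYER = 6            # hypothetical intermediate layer to read from / patch into

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def get_task_vector(demos_prompt: str) -> torch.Tensor:
    """'Learning' phase: run the demonstrations and read the hidden state
    at the final token of an intermediate layer as the task vector."""
    ids = tok(demos_prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    # hidden_states[LAYER] has shape (batch, seq_len, hidden_dim)
    return out.hidden_states[LAYER][0, -1, :].clone()

def apply_task_vector(query_prompt: str, task_vec: torch.Tensor) -> str:
    """'Application' phase: run the bare query (no demonstrations) while
    patching the task vector into the same layer at the last token."""
    ids = tok(query_prompt, return_tensors="pt")

    def patch(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden[:, -1, :] = task_vec  # overwrite the last-token state
        return output

    # gpt2 exposes its blocks as model.transformer.h; other architectures differ.
    # Block LAYER-1's output corresponds to hidden_states[LAYER] above.
    handle = model.transformer.h[LAYER - 1].register_forward_hook(patch)
    try:
        with torch.no_grad():
            logits = model(**ids).logits
    finally:
        handle.remove()
    next_id = logits[0, -1].argmax().item()
    return tok.decode(next_id)

# Example: the demonstrations define an English->French task; the query is new.
demos = "apple -> pomme\ncat -> chat\ndog ->"
vec = get_task_vector(demos)
print(apply_task_vector("house ->", vec))
```

In this framing, the demonstrations are needed only once to produce the vector; the same vector can then be reused to answer new queries presented in a zero-shot format, which is what makes the compressed task representation appealing for efficient adaptation.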