The researchers conducted a series of experiments and developed a method to estimate these simple linear functions. They computed functions for 47 different relations, such as “capital city of a country” and “lead singer of a band,” and found that the functions retrieved the correct information more than 60% of the time. They also used this technique to create an “attribute lens,” a tool that visualizes where specific information about a particular relation is stored within the model’s layers; a rough sketch of the underlying idea appears below. This could help scientists and engineers correct stored knowledge and prevent AI chatbots from providing false information.
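To make the mechanism concrete, here is a minimal sketch of what a linear relation function could look like in code. Everything specific in it is an assumption for illustration: the model choice (GPT-2), the layer index, the prompt template, the single-token-subject shortcut, and the plain least-squares fit, which stands in for the researchers' actual estimation method. The point is only the shape of the technique: an affine map W·h + b carries a subject's mid-layer hidden state to a representation that the model's own output head decodes into the attribute.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# All names below (model choice, LAYER, TEMPLATE, the helpers) are
# illustrative assumptions, not the paper's implementation.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

LAYER = 8  # assumed intermediate layer to read the subject state from
TEMPLATE = "The capital city of {} is"  # one relation: capital city

def states(subject: str):
    """Return (subject hidden state at LAYER, final-layer state at the
    last position, i.e. the vector the LM head reads to predict the object)."""
    ids = tok(TEMPLATE.format(subject), return_tensors="pt")
    with torch.no_grad():
        hs = model(**ids).hidden_states
    subj_idx = ids.input_ids.shape[1] - 2  # subject token sits before " is"
    return hs[LAYER][0, subj_idx], hs[-1][0, -1]

# Fit an affine map h_subject -> h_object from a few example subjects.
# (A crude least-squares fit; the researchers' estimator is more careful.)
train = ["France", "Japan", "Italy", "Egypt", "Canada"]
S, T = map(torch.stack, zip(*(states(s) for s in train)))
S1 = torch.cat([S, torch.ones(len(S), 1)], dim=1)  # append a bias column
sol = torch.linalg.lstsq(S1, T).solution           # shape (d+1, d)
W, b = sol[:-1].T, sol[-1]

def decode(subject: str) -> str:
    """Apply the linear relation function, then read out the top token."""
    h_subj, _ = states(subject)
    logits = model.lm_head(h_subj @ W.T + b)
    return tok.decode(int(logits.argmax()))

print(decode("Spain"))  # ideally " Madrid", if the crude fit generalizes
```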
Key takeaways:
- Researchers at MIT and other institutions have found that large language models (LLMs) often use a simple linear function to recover and decode stored facts.
- The researchers developed a method to estimate these simple functions, and then computed functions for 47 different relations, such as “capital city of a country” and “lead singer of a band.”
- They used this probing technique to produce what they call an “attribute lens,” a grid that visualizes where specific information about a particular relation is stored within the transformer’s many layers (a minimal version is sketched after this list).
- In the future, this approach could be used to find and correct falsehoods inside the model, which could reduce a model’s tendency to give incorrect or nonsensical answers.
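An attribute-lens-style readout can be sketched on top of the same toy setup: apply the relation function to the subject's hidden state at every layer and note the depth at which the attribute becomes decodable. This reuses the hypothetical tok, model, TEMPLATE, W, and b from the earlier sketch, and, like it, only illustrates the idea rather than reproducing the researchers' tool, which produces a full grid.

```python
def attribute_lens(subject: str) -> None:
    """One column of the lens: the relation function applied to the
    subject's state at every layer (reuses tok, model, W, b from above)."""
    ids = tok(TEMPLATE.format(subject), return_tensors="pt")
    with torch.no_grad():
        hs = model(**ids).hidden_states
    subj_idx = ids.input_ids.shape[1] - 2  # same single-token shortcut
    for layer, h in enumerate(hs):
        logits = model.lm_head(h[0, subj_idx] @ W.T + b)
        print(f"layer {layer:2d}: {tok.decode(int(logits.argmax()))!r}")

attribute_lens("Spain")  # early layers tend to print noise; if the fit
                         # generalizes, " Madrid" appears from mid-stack on
```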