The researchers conducted a series of experiments and developed a method to estimate these simple linear functions. They computed functions for 47 different relations, such as “capital city of a country” and “lead singer of a band,” and found that the functions retrieved the correct information more than 60% of the time. They also used this technique to create an “attribute lens,” a tool that visualizes where specific information about a particular relation is stored within the model’s layers; a rough sketch of the underlying idea appears below. This could help scientists and engineers correct stored knowledge and prevent AI chatbots from providing false information.
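To make the mechanism concrete, here is a minimal sketch of what a linear relation function could look like in code. Everything specific in it is an assumption for illustration: the model choice (GPT-2), the layer index, the prompt template, the single-token-subject shortcut, and the plain least-squares fit, which stands in for the researchers' actual estimation method. The point is only the shape of the technique: an affine map W·h + b carries a subject's mid-layer hidden state to a representation that the model's own output head decodes into the attribute.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# All names below (model choice, LAYER, TEMPLATE, the helpers) are
# illustrative assumptions, not the paper's implementation.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

LAYER = 8  # assumed intermediate layer to read the subject state from
TEMPLATE = "The capital city of {} is"  # one relation: capital city

def states(subject: str):
    """Return (subject hidden state at LAYER, final-layer state at the
    last position, i.e. the vector the LM head reads to predict the object)."""
    ids = tok(TEMPLATE.format(subject), return_tensors="pt")
    with torch.no_grad():
        hs = model(**ids).hidden_states
    subj_idx = ids.input_ids.shape[1] - 2  # subject token sits before " is"
    return hs[LAYER][0, subj_idx], hs[-1][0, -1]

# Fit an affine map h_subject -> h_object from a few example subjects.
# (A crude least-squares fit; the researchers' estimator is more careful.)
train = ["France", "Japan", "Italy", "Egypt", "Canada"]
S, T = map(torch.stack, zip(*(states(s) for s in train)))
S1 = torch.cat([S, torch.ones(len(S), 1)], dim=1)  # append a bias column
sol = torch.linalg.lstsq(S1, T).solution           # shape (d+1, d)
W, b = sol[:-1].T, sol[-1]

def decode(subject: str) -> str:
    """Apply the linear relation function, then read out the top token."""
    h_subj, _ = states(subject)
    logits = model.lm_head(h_subj @ W.T + b)
    return tok.decode(int(logits.argmax()))

print(decode("Spain"))  # ideally " Madrid", if the crude fit generalizes
```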
Key takeaways:
- Researchers at MIT and other institutions have found that large language models (LLMs) often use a simple linear function to recover and decode stored facts.
- The researchers developed a method to estimate these simple functions, and then computed functions for 47 different relations, such as “capital city of a country” and “lead singer of a band.”
- They used this probing technique to produce what they call an “attribute lens,” a grid that visualizes where specific information about a particular relation is stored within the transformer’s many layers (a minimal version is sketched after this list).
- In the future, this approach could be used to find and correct falsehoods inside the model, which could reduce a model’s tendency to give incorrect or nonsensical answers.
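An attribute-lens-style readout can be sketched on top of the same toy setup: apply the relation function to the subject's hidden state at every layer and note the depth at which the attribute becomes decodable. This reuses the hypothetical tok, model, TEMPLATE, W, and b from the earlier sketch, and, like it, only illustrates the idea rather than reproducing the researchers' tool, which produces a full grid.

```python
def attribute_lens(subject: str) -> None:
    """One column of the lens: the relation function applied to the
    subject's state at every layer (reuses tok, model, W, b from above)."""
    ids = tok(TEMPLATE.format(subject), return_tensors="pt")
    with torch.no_grad():
        hs = model(**ids).hidden_states
    subj_idx = ids.input_ids.shape[1] - 2  # same single-token shortcut
    for layer, h in enumerate(hs):
        logits = model.lm_head(h[0, subj_idx] @ W.T + b)
        print(f"layer {layer:2d}: {tok.decode(int(logits.argmax()))!r}")

attribute_lens("Spain")  # early layers tend to print noise; if the fit
                         # generalizes, " Madrid" appears from mid-stack on
```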