The researchers also looked at more diverse datasets, including uncurated datasets adapted from other sources. Because of their diversity, these datasets did not separate into true/false clusters under PCA directly; separation did appear, however, when the data were projected onto a PCA basis identified from one of the cleaner datasets. The representations also encode more than truth/falsehood: for a statement like "`x` is larger than `y`," the non-truth axis of variation seemed to track the absolute value of the difference `x - y`. They also found a separate cluster for `cities_cities_conj`, which seemed to contain statements where the countries in the two halves of the conjunction are the same.
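The "project a messy dataset onto a basis fit on a clean one" idea can be sketched in a few lines of numpy. This is not the authors' code or data: the activations below are synthetic stand-in vectors (with a planted truth direction and some extra axes of variation), and the dimension is kept smaller than LLaMA-13B's 5120 just to keep the sketch light.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512  # stand-in hidden size (LLaMA-13B's is 5120)

# Synthetic stand-ins for real activations: a curated dataset whose
# true/false statements separate along one planted direction, and a more
# "diverse" dataset with extra, truth-irrelevant axes of variation.
truth_dir = rng.normal(size=d)
truth_dir /= np.linalg.norm(truth_dir)

labels_curated = rng.integers(0, 2, size=200)
curated = rng.normal(size=(200, d)) + 10.0 * labels_curated[:, None] * truth_dir

labels_diverse = rng.integers(0, 2, size=200)
noise_dirs = rng.normal(size=(5, d))  # extra, unrelated variation
diverse = (rng.normal(size=(200, d))
           + 10.0 * labels_diverse[:, None] * truth_dir
           + (rng.normal(size=(200, 5)) @ noise_dirs) * 2.0)

def pca_basis(x, k=2):
    """Top-k principal directions of x (rows = samples), via SVD."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]

# Fit the basis on the *curated* data, then project the diverse data onto it.
basis = pca_basis(curated)
proj = (diverse - curated.mean(axis=0)) @ basis.T  # shape (200, 2)

# True and false statements in the diverse set now separate along axis 0.
means = [proj[labels_diverse == c, 0].mean() for c in (0, 1)]
```

On its own, PCA of `diverse` would spend its top components on the extra axes of variation; borrowing the curated basis recovers the true/false split, which is the effect described above.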
Key takeaways:
- The Geometry of Truth: Dataexplorer is a page of interactive charts for exploring how large language models represent truth. It accompanies a paper by Samuel Marks and Max Tegmark.
- The visualizations are produced by extracting LLaMA-13B representations of factual statements, which live in a 5120-dimensional space. PCA is then used to select the two directions of greatest variation, giving 2-dimensional pictures of the data.
- The page explores different datasets, including basic datasets, negations, conjunctions and disjunctions, emergence over layers, and more diverse datasets. Each dataset provides different insights into how truth is represented in the model.
- Key findings include evidence bearing on the Misalignment from Correlational Inconsistency (MCI) hypothesis, which is offered to explain why truth directions found on different datasets (e.g., statements and their negations) can appear misaligned; the gradual emergence of features distinguishing true statements from false ones over the layers of LLaMA-13B; and the impact of dataset diversity on the separation into true/false clusters.