The release includes four components:

- **Neuron viewer**: a React app that hosts TDB and provides information about individual model components.
- **Activation server**: a backend server that performs inference on a model to provide data for TDB.
- **Models**: a simple inference library for GPT-2 models.
- **Collated activation datasets**: top-activating dataset examples for MLP neurons, attention heads, and autoencoder latents.

Setup involves installing the repo, setting up a virtual environment, and following the instructions to set up the activation server backend and the neuron viewer frontend.
Key takeaways:
- Transformer Debugger (TDB) is a tool developed by OpenAI's Superalignment team to investigate specific behaviors of small language models, combining automated interpretability techniques with sparse autoencoders.
- TDB can be used to answer questions about model outputs and attention, identifying specific components that contribute to the behavior and tracing connections between components.
- The release comprises a Neuron viewer, an Activation server, Models, and Collated activation datasets, together providing a comprehensive toolset for investigating and understanding model behavior.
- Setup instructions cover installing the repo, creating and activating a new virtual environment, and installing neuron_explainer and neuron_viewer.
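
The setup steps above can be sketched as a short shell session. This is a minimal sketch, not the authoritative procedure: the repository URL, the editable pip install, and the `neuron_viewer` directory layout are assumptions based on a typical Python-plus-React project, so consult the repo's own README for the exact commands.

```shell
# Hypothetical setup sketch for Transformer Debugger.
# Repo URL, install style, and directory names are assumptions; see the README.
git clone https://github.com/openai/transformer-debugger.git
cd transformer-debugger

# Create and activate a fresh virtual environment.
python -m venv .venv
source .venv/bin/activate

# Install the Python package (neuron_explainer and the activation server code).
pip install -e .

# The neuron viewer is a React app: install its JS dependencies and run it.
cd neuron_viewer
npm install
npm start
```

With both processes running, the activation server supplies inference data while the neuron viewer frontend serves the TDB interface in the browser.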