The release includes four components:

- **Neuron viewer**: a React app that hosts TDB and provides information about individual model components.
- **Activation server**: a backend server that performs inference on a model to provide data for TDB.
- **Models**: a simple inference library for GPT-2 models.
- **Collated activation datasets**: top-activating dataset examples for MLP neurons, attention heads, and autoencoder latents.

Setup involves installing the repo, setting up a virtual environment, and following the instructions to set up the activation server backend and the neuron viewer frontend.
Key takeaways:
- Transformer Debugger (TDB) is a tool developed by OpenAI's Superalignment team to investigate specific behaviors of small language models, combining automated interpretability techniques with sparse autoencoders.
- TDB can be used to answer questions about model outputs and attention, identifying specific components that contribute to the behavior and tracing connections between components.
- The release comprises a Neuron viewer, an Activation server, Models, and Collated activation datasets, together providing a comprehensive toolset for investigating and understanding model behavior.
- Setup instructions cover installing the repo, creating and activating a new virtual environment, and installing neuron_explainer and neuron_viewer.
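
The setup steps above can be sketched as a short shell session. This is a minimal sketch, not the authoritative procedure: the repository URL, the editable pip install, and the `neuron_viewer` directory layout are assumptions based on a typical Python-plus-React project, so consult the repo's own README for the exact commands.

```shell
# Hypothetical setup sketch for Transformer Debugger.
# Repo URL, install style, and directory names are assumptions; see the README.
git clone https://github.com/openai/transformer-debugger.git
cd transformer-debugger

# Create and activate a fresh virtual environment.
python -m venv .venv
source .venv/bin/activate

# Install the Python package (neuron_explainer and the activation server code).
pip install -e .

# The neuron viewer is a React app: install its JS dependencies and run it.
cd neuron_viewer
npm install
npm start
```

With both processes running, the activation server supplies inference data while the neuron viewer frontend serves the TDB interface in the browser.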