Model Resonance Imaging — visualize LLM internals like a brain MRI
Updated Apr 10, 2026 - Python
We study whether categorical refusal tokens enable controllable and interpretable safety behavior in language models.
Mechanistic interpretability tool visualizing GPT-2's layer-by-layer predictions using the logit lens technique
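The logit lens idea behind tools like this one is simple: project each layer's intermediate residual stream through the model's unembedding matrix to read off what the model "would predict" at that depth. A minimal NumPy sketch with toy random weights standing in for a real model (all names and shapes here are illustrative placeholders, not GPT-2's actual parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab, n_layers = 16, 50, 4

# Toy stand-ins: one residual-stream vector per layer, plus an unembedding matrix.
residuals = [rng.standard_normal(d_model) for _ in range(n_layers)]
W_U = rng.standard_normal((d_model, vocab))

def logit_lens(resid, W_U):
    """Apply a final-LayerNorm-style normalization, then unembed to logits."""
    normed = (resid - resid.mean()) / (resid.std() + 1e-5)  # LN without learned params
    return normed @ W_U  # logits over the vocabulary

# The "lens": the top predicted token id at every layer.
top_tokens = [int(np.argmax(logit_lens(r, W_U))) for r in residuals]
print(top_tokens)
```

With a real model the interesting signal is how these per-layer predictions converge toward the final output as depth increases.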
MSc Thesis: Bridging mechanistic interpretability circuits to faithful natural language explanations using ERASER evaluation metrics
Measuring attention similarity bias in GPT-2 variants. Replicates Figure 1 of arXiv:2603.09078 (Exclusive Self Attention, Apple 2026) using TransformerLens.
Replication of 'From Reasoning to Answer' (EMNLP 2025) — Reasoning-Focus Heads + Activation Patching on DeepSeek-R1-Distill-Qwen-7B
Mechanistic interpretability tool to detect induction heads in GPT-2 using TransformerLens
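The standard induction-head test used by detectors like this runs the model on a sequence whose second half repeats the first, then checks whether a head attends from each position back to the token *after* the previous occurrence of the current token. A hypothetical scoring function (the synthetic attention patterns below are illustrative, not the repo's code):

```python
import numpy as np

def induction_score(attn, period):
    """Mean attention from each query position i to key position i - period + 1,
    i.e. the token following the previous occurrence of the current token,
    for a sequence that repeats with the given period.
    attn: (seq, seq) causal (lower-triangular) attention pattern."""
    seq = attn.shape[0]
    idx = np.arange(period, seq)  # query positions in the repeated half
    return attn[idx, idx - period + 1].mean()

# Synthetic "perfect induction" pattern on a length-2*period sequence.
period, seq = 5, 10
perfect = np.zeros((seq, seq))
for i in range(period, seq):
    perfect[i, i - period + 1] = 1.0  # all attention mass on the induction target

# Baseline: uniform attention over all earlier positions.
uniform = np.tril(np.ones((seq, seq)))
uniform /= uniform.sum(axis=1, keepdims=True)

print(induction_score(perfect, period))  # 1.0
print(induction_score(uniform, period))  # well below 1.0
```

Heads whose score stays near 1.0 across random repeated sequences are flagged as induction heads; a uniform-attention head scores far lower.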
Mechanistic interpretability study comparing modular addition and subtraction circuits in 1-layer attention-only transformers via activation patching, logit lens, SVD circuit analysis, Fourier feature analysis, and causal scrubbing across three training stages.
🧩 Simplify causal intervention in transformer models with this modular library for accurate circuit analysis and behavior identification.
Causal intervention framework for mechanistic interpretability research. Implements activation patching methodology for identifying causally important components in transformer language models.
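Activation patching, as implemented in frameworks like this, can be sketched without any transformer at all: run a "clean" and a "corrupted" input, cache the clean activations, splice one component's clean activation into the corrupted run, and measure how much of the output gap it restores. A toy sketch, where the two-component "model" and all names are hypothetical stand-ins:

```python
import numpy as np

# Toy "model": two components whose scalar activations sum into one logit.
W = {"comp_a": np.array([2.0, 0.0]), "comp_b": np.array([0.0, 1.0])}

def run(x, cache=None, patch=None):
    """Forward pass; optionally cache activations or patch one in."""
    acts = {name: float(w @ x) for name, w in W.items()}
    if patch is not None:
        name, value = patch
        acts[name] = value              # the causal intervention
    if cache is not None:
        cache.update(acts)
    return sum(acts.values())           # scalar "logit"

clean_x, corrupt_x = np.array([1.0, 1.0]), np.array([0.0, 0.0])
clean_cache = {}
clean_out = run(clean_x, cache=clean_cache)   # 3.0
corrupt_out = run(corrupt_x)                  # 0.0

# Patch each component's clean activation into the corrupted run and
# measure the fraction of the clean-vs-corrupt gap it recovers.
for name in W:
    patched = run(corrupt_x, patch=(name, clean_cache[name]))
    recovery = (patched - corrupt_out) / (clean_out - corrupt_out)
    print(name, recovery)
```

Components with high recovery fractions are the causally important ones; in a real transformer the same loop runs over heads, MLPs, or residual-stream positions.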
"Arithmetic Without Algorithms": Mechanistic analysis of arithmetic failure ("5+5=6") in GPT-2 Small using Induction Heads and Sparse Autoencoders (SAEs).
Automated Forensic Discovery of Reasoning Circuits in Transformers
Forensic suite for mechanistic interpretability in Transformers. Implements 0.0054 Basal Accountability Gradients for auditing model logic using TransformerLens and SAELens
Code used for reverse-engineering a “Query-Gated Courier” circuit in Gemma-2-2B for role-gated retrieval.
A research tool for studying how deception emerges in multi-agent LLM systems and detecting it through activation analysis.