Model Resonance Imaging — visualize LLM internals like a brain MRI
Updated Apr 10, 2026 - Python
We study whether categorical refusal tokens enable controllable and interpretable safety behavior in language models.
Mechanistic interpretability tool visualizing GPT-2's layer-by-layer predictions using the logit lens technique
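The logit lens idea behind tools like this one is simple: project each layer's intermediate residual stream through the model's unembedding matrix to read off what the model "would predict" at that depth. A minimal NumPy sketch with toy random weights standing in for a real model (all names and shapes here are illustrative placeholders, not GPT-2's actual parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab, n_layers = 16, 50, 4

# Toy stand-ins: one residual-stream vector per layer, plus an unembedding matrix.
residuals = [rng.standard_normal(d_model) for _ in range(n_layers)]
W_U = rng.standard_normal((d_model, vocab))

def logit_lens(resid, W_U):
    """Apply a final-LayerNorm-style normalization, then unembed to logits."""
    normed = (resid - resid.mean()) / (resid.std() + 1e-5)  # LN without learned params
    return normed @ W_U  # logits over the vocabulary

# The "lens": the top predicted token id at every layer.
top_tokens = [int(np.argmax(logit_lens(r, W_U))) for r in residuals]
print(top_tokens)
```

With a real model the interesting signal is how these per-layer predictions converge toward the final output as depth increases.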
MSc Thesis: Bridging mechanistic interpretability circuits to faithful natural language explanations using ERASER evaluation metrics
Measuring attention similarity bias in GPT-2 variants. Replicates Figure 1 of arXiv:2603.09078 (Exclusive Self Attention, Apple 2026) using TransformerLens.
Replication of 'From Reasoning to Answer' (EMNLP 2025) — Reasoning-Focus Heads + Activation Patching on DeepSeek-R1-Distill-Qwen-7B
Mechanistic interpretability tool to detect induction heads in GPT-2 using TransformerLens
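The standard induction-head test used by detectors like this runs the model on a sequence whose second half repeats the first, then checks whether a head attends from each position back to the token *after* the previous occurrence of the current token. A hypothetical scoring function (the synthetic attention patterns below are illustrative, not the repo's code):

```python
import numpy as np

def induction_score(attn, period):
    """Mean attention from each query position i to key position i - period + 1,
    i.e. the token following the previous occurrence of the current token,
    for a sequence that repeats with the given period.
    attn: (seq, seq) causal (lower-triangular) attention pattern."""
    seq = attn.shape[0]
    idx = np.arange(period, seq)  # query positions in the repeated half
    return attn[idx, idx - period + 1].mean()

# Synthetic "perfect induction" pattern on a length-2*period sequence.
period, seq = 5, 10
perfect = np.zeros((seq, seq))
for i in range(period, seq):
    perfect[i, i - period + 1] = 1.0  # all attention mass on the induction target

# Baseline: uniform attention over all earlier positions.
uniform = np.tril(np.ones((seq, seq)))
uniform /= uniform.sum(axis=1, keepdims=True)

print(induction_score(perfect, period))  # 1.0
print(induction_score(uniform, period))  # well below 1.0
```

Heads whose score stays near 1.0 across random repeated sequences are flagged as induction heads; a uniform-attention head scores far lower.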
Mechanistic interpretability study comparing modular addition and subtraction circuits in 1-layer attention-only transformers via activation patching, logit lens, SVD circuit analysis, Fourier feature analysis, and causal scrubbing across three training stages.
🧩 Simplify causal intervention in transformer models with this modular library for accurate circuit analysis and behavior identification.
Causal intervention framework for mechanistic interpretability research. Implements activation patching methodology for identifying causally important components in transformer language models.
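Activation patching, as implemented in frameworks like this, can be sketched without any transformer at all: run a "clean" and a "corrupted" input, cache the clean activations, splice one component's clean activation into the corrupted run, and measure how much of the output gap it restores. A toy sketch, where the two-component "model" and all names are hypothetical stand-ins:

```python
import numpy as np

# Toy "model": two components whose scalar activations sum into one logit.
W = {"comp_a": np.array([2.0, 0.0]), "comp_b": np.array([0.0, 1.0])}

def run(x, cache=None, patch=None):
    """Forward pass; optionally cache activations or patch one in."""
    acts = {name: float(w @ x) for name, w in W.items()}
    if patch is not None:
        name, value = patch
        acts[name] = value              # the causal intervention
    if cache is not None:
        cache.update(acts)
    return sum(acts.values())           # scalar "logit"

clean_x, corrupt_x = np.array([1.0, 1.0]), np.array([0.0, 0.0])
clean_cache = {}
clean_out = run(clean_x, cache=clean_cache)   # 3.0
corrupt_out = run(corrupt_x)                  # 0.0

# Patch each component's clean activation into the corrupted run and
# measure the fraction of the clean-vs-corrupt gap it recovers.
for name in W:
    patched = run(corrupt_x, patch=(name, clean_cache[name]))
    recovery = (patched - corrupt_out) / (clean_out - corrupt_out)
    print(name, recovery)
```

Components with high recovery fractions are the causally important ones; in a real transformer the same loop runs over heads, MLPs, or residual-stream positions.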
"Arithmetic Without Algorithms": Mechanistic analysis of arithmetic failure ("5+5=6") in GPT-2 Small using Induction Heads and Sparse Autoencoders (SAEs).
Automated Forensic Discovery of Reasoning Circuits in Transformers
Forensic suite for mechanistic interpretability in Transformers. Implements 0.0054 Basal Accountability Gradients for auditing model logic using TransformerLens and SAELens
Code used for reverse-engineering a “Query-Gated Courier” circuit in Gemma-2-2B for role-gated retrieval.
A research tool for studying how deception emerges in multi-agent LLM systems and detecting it through activation analysis.