Interpretability for LLMs and Agents Bootcamp

This repository contains reference implementations created by the Vector AI Engineering team for the Interpretability for LLMs and Agents Bootcamp — a hands-on program exploring interpretability, fairness, alignment, and agentic evaluation of large language and vision-language models.

About This Bootcamp

The bootcamp covers six core topics spanning the modern AI interpretability and evaluation landscape. Each implementation is a self-contained reference that demonstrates techniques from recent research, with fully reproducible notebooks and evaluation pipelines.

Repository Structure

docs/: Additional documentation and setup guides.
implementations/: One directory per topic, each containing notebooks and a README.
pyproject.toml: Centralizes project settings, build requirements, and dependencies.
scripts/: Utility scripts for environment setup and data preparation.

Implementations

| # | Topic | Description |
|---|-------|-------------|
| 1 | XAI Refresher | Foundations of explainable AI: feature attribution, saliency maps, and model-agnostic explanation methods |
| 2 | Bias & Fairness Analysis | Detecting and mitigating bias in ML models across demographic groups |
| 3 | Preference Alignment | Aligning LLMs with human preferences using the DPO framework |
| 4 | Multimedia RAG + VLM | Cross-modal retrieval-augmented generation with ImageBind (audio, video, text) |
| 5 | Agentic ChartQA Evaluation | Multi-agent evaluation harness for chart-based VQA using CrewAI and ChartQAPro |
| 6 | Mechanistic Interpretability | Sparse autoencoders for LLM feature discovery, plus logit lens and activation patching for VLM modality fusion |

Getting Started

Clone this repository:

```
git clone <repo-url>
cd interpretability_agent_bootcamp
```

Install uv if you haven't already:

```
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Install dependencies for the topic you want to work with. All dependency groups are defined in the root pyproject.toml — install only the group(s) you need:

| Topic | Group name | Install command |
|-------|------------|-----------------|
| XAI Refresher | `xai-refresher` | `uv sync --group xai-refresher` |
| Bias & Fairness Analysis | None | `uv sync` |
| Mechanistic Interpretability | `mechanistic-interp` | `uv sync --group mechanistic-interp` |
| Preference Alignment (DPO) | `preference-alignment` | `uv sync --group preference-alignment` |
| Multimedia RAG | `multimedia-rag` | `uv sync --group multimedia-rag` |
| Agentic ChartQA Eval | `agentic-xai-eval` | `uv sync --group agentic-xai-eval` |

Conflict note: The `mechanistic-interp` and `xai-refresher` groups cannot be installed together because they require conflicting versions of the `datasets` package. Install only one of them at a time.
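
If you script your environment setup, the table and conflict note above can be encoded in a small guard. The following is only a sketch: `build_sync_command` is a hypothetical helper, not part of this repository, and it assumes the group names listed in the table are the complete set defined in `pyproject.toml`.

```python
from typing import Iterable, List

# Group names copied from the install table above (assumed to be complete).
KNOWN_GROUPS = {
    "xai-refresher",
    "mechanistic-interp",
    "preference-alignment",
    "multimedia-rag",
    "agentic-xai-eval",
}

# The single conflict described in the conflict note.
CONFLICTING_PAIRS = [frozenset({"mechanistic-interp", "xai-refresher"})]


def build_sync_command(groups: Iterable[str]) -> List[str]:
    """Return the argv for `uv sync` with the requested groups (hypothetical helper)."""
    requested = list(dict.fromkeys(groups))  # de-duplicate while keeping order

    unknown = [g for g in requested if g not in KNOWN_GROUPS]
    if unknown:
        raise ValueError(f"unknown dependency group(s): {', '.join(unknown)}")

    for pair in CONFLICTING_PAIRS:
        if pair <= set(requested):
            raise ValueError(
                f"groups cannot be installed together: {', '.join(sorted(pair))}"
            )

    # An empty request falls back to plain `uv sync`, as for Bias & Fairness Analysis.
    command = ["uv", "sync"]
    for group in requested:
        command += ["--group", group]
    return command
```

For example, `" ".join(build_sync_command(["preference-alignment"]))` yields `uv sync --group preference-alignment`, while requesting both `xai-refresher` and `mechanistic-interp` raises a `ValueError` before anything is installed.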

CUDA note (ref4 — Preference Alignment): The group uses torch==2.6.0 from PyPI (which includes CUDA support on Linux). If you specifically need the CUDA 12.4 build, run:

```
uv sync --group preference-alignment \
  --index-url https://download.pytorch.org/whl/cu124
```
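
To confirm which PyTorch build ended up in your environment after syncing, a quick check along these lines may help. This is a sketch rather than part of the repository: `describe_torch_build` is a hypothetical name, and the function returns a message instead of failing when `torch` is not installed.

```python
import importlib.util


def describe_torch_build() -> str:
    """Summarize the installed PyTorch build (hypothetical helper)."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"

    import torch

    # torch.version.cuda is None for CPU-only builds.
    return (
        f"torch {torch.__version__}, "
        f"CUDA build: {torch.version.cuda}, "
        f"GPU available: {torch.cuda.is_available()}"
    )


if __name__ == "__main__":
    print(describe_torch_build())
```

Saving this to a file and running it with `uv run python <file>` reports the version from the synced environment rather than from a system-wide interpreter.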

Launch JupyterLab and open the notebooks in the relevant implementations/<topic>/ directory:
```
uv run jupyter lab
```

Run integration tests to validate that your API keys are set up correctly:

```
uv run --env-file .env pytest -sv tests/test_integration.py
```

Note: If your .env file is incomplete or needs to be updated, you can re-run onboarding manually from inside your Coder workspace (from the repo root):
```
onboard --bootcamp-name "llm-interpretability-bootcamp" \
  --output-dir "." \
  --test-script "./aieng-llm-interp/tests/test_integration.py" \
  --env-example "./.env.example" \
  --test-marker "integration_test" \
  --force
```
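
Before re-running onboarding, it can be useful to see which variables are actually missing. The sketch below compares the variable names in `.env.example` with those in `.env`. It assumes both files use plain `NAME=value` lines, `env_keys` and `missing_keys` are hypothetical helpers, and a variable that is present with an empty value still counts as present.

```python
from pathlib import Path
from typing import List, Set


def env_keys(text: str) -> Set[str]:
    """Collect variable names from dotenv-style text, skipping blanks and comments."""
    keys = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        # Tolerate shell-style `export NAME=value` lines.
        if name.startswith("export "):
            name = name[len("export "):].strip()
        if name:
            keys.add(name)
    return keys


def missing_keys(example_text: str, env_text: str) -> List[str]:
    """Names defined in the example file but absent from the real one."""
    return sorted(env_keys(example_text) - env_keys(env_text))


if __name__ == "__main__":
    example = Path(".env.example")
    env = Path(".env")
    if example.exists():
        env_text = env.read_text() if env.exists() else ""
        for name in missing_keys(example.read_text(), env_text):
            print(f"missing from .env: {name}")
```

Run it from the repo root. If it prints nothing, every variable named in `.env.example` has an entry in `.env`, and the integration tests remain the real check that the values work.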

License

This project is licensed under the terms of the LICENSE.md file in the root directory.

Contributing

Please read CONTRIBUTING.md before submitting pull requests.

Contact

For questions or help navigating this repository, contact Aravind Narayanan at aravind.narayanan@vectorinstitute.ai
