This repository contains reference implementations created by the Vector AI Engineering team for the Interpretability for LLMs and Agents Bootcamp — a hands-on program exploring interpretability, fairness, alignment, and agentic evaluation of large language and vision-language models.
The bootcamp covers six core topics spanning the modern AI interpretability and evaluation landscape. Each implementation is a self-contained reference that demonstrates techniques from recent research, with fully reproducible notebooks and evaluation pipelines.
- docs/: Additional documentation and setup guides.
- implementations/: One directory per topic, each containing notebooks and a README.
- pyproject.toml: Centralizes project settings, build requirements, and dependencies.
- scripts/: Utility scripts for environment setup and data preparation.
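For orientation, the dependency groups referenced throughout this README live in the root `pyproject.toml`. A rough sketch of what such a layout can look like is below; the package pins and project name are illustrative, not the repository's actual contents:

```toml
# Illustrative sketch only -- the real groups and pins live in this repo's pyproject.toml
[project]
name = "interpretability-agent-bootcamp"   # hypothetical
requires-python = ">=3.10"

# PEP 735 dependency groups, installable with `uv sync --group <name>`
[dependency-groups]
preference-alignment = ["torch==2.6.0", "trl"]
xai-refresher = ["shap", "datasets<2.0"]        # hypothetical conflicting pin
mechanistic-interp = ["sae-lens", "datasets>=2.0"]

# uv can declare groups mutually exclusive when their pins conflict,
# which makes `uv sync` fail fast instead of producing a broken env
[tool.uv]
conflicts = [
    [{ group = "mechanistic-interp" }, { group = "xai-refresher" }],
]
```

Declaring the conflict in `[tool.uv]` is what lets uv refuse incompatible group combinations at sync time rather than at import time.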
| # | Topic | Description |
|---|---|---|
| 1 | XAI Refresher | Foundations of explainable AI — feature attribution, saliency maps, and model-agnostic explanation methods |
| 2 | Bias & Fairness Analysis | Detecting and mitigating bias in ML models across demographic groups |
| 3 | Preference Alignment | LLM alignment with human preferences using DPO framework |
| 4 | Multimedia RAG + VLM | Cross-modal retrieval-augmented generation with ImageBind (audio, video, text) |
| 5 | Agentic ChartQA Evaluation | Multi-agent evaluation harness for chart-based VQA using CrewAI and ChartQAPro |
| 6 | Mechanistic Interpretability | Sparse Autoencoders for LLM feature discovery, and logit-lens + activation patching for VLM modality fusion |
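To give a flavor of topic 6, the logit-lens step can be sketched in a few lines of NumPy. Everything here is illustrative: random arrays stand in for a real model's unembedding matrix and per-layer residual-stream states, and the shapes are toy-sized.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 8, 16

# Stand-ins for a trained model's unembedding matrix and the
# residual-stream state after each of 4 layers (all random here).
W_U = rng.normal(size=(d_model, vocab))
hidden_states = [rng.normal(size=d_model) for _ in range(4)]

def logit_lens(h, W_U):
    """Project an intermediate hidden state straight to vocab logits
    and return the index of the top token at that layer."""
    logits = h @ W_U
    return int(np.argmax(logits))

# Reading off the top token layer by layer shows how the model's
# "current best guess" evolves through the network.
per_layer_top_tokens = [logit_lens(h, W_U) for h in hidden_states]
```

The same idea scales to a real transformer by swapping in the model's actual unembedding matrix and cached activations.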
- Clone this repository:

  ```shell
  git clone <repo-url>
  cd interpretability_agent_bootcamp
  ```
- Install uv if you haven't already:

  ```shell
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```
- Install dependencies for the topic you want to work with. All dependency groups are defined in the root `pyproject.toml`; install only the group(s) you need:

  | Topic | Group name | Install command |
  |---|---|---|
  | XAI Refresher | `xai-refresher` | `uv sync --group xai-refresher` |
  | Bias & Fairness Analysis | None | `uv sync` |
  | Preference Alignment (DPO) | `preference-alignment` | `uv sync --group preference-alignment` |
  | Multimedia RAG | `multimedia-rag` | `uv sync --group multimedia-rag` |
  | Agentic ChartQA Eval | `agentic-xai-eval` | `uv sync --group agentic-xai-eval` |
  | Mechanistic Interpretability | `mechanistic-interp` | `uv sync --group mechanistic-interp` |

  Conflict note: the `mechanistic-interp` and `xai-refresher` groups cannot be installed together because they have conflicting `datasets` package requirements. Install only one at a time.

  CUDA note (ref4, Preference Alignment): the group uses `torch==2.6.0` from PyPI (which includes CUDA support on Linux). If you specifically need the CUDA 12.4 build, run:

  ```shell
  uv sync --group preference-alignment \
    --index-url https://download.pytorch.org/whl/cu124
  ```
- Launch JupyterLab and open the notebooks in the relevant `implementations/<topic>/` directory:

  ```shell
  uv run jupyter lab
  ```
- Run the integration tests to validate that your API keys are set up correctly:

  ```shell
  uv run --env-file .env pytest -sv tests/test_integration.py
  ```

  Note: if your `.env` file is incomplete or needs to be updated, you can re-run onboarding manually from inside your Coder workspace (from the repo root):

  ```shell
  onboard --bootcamp-name "llm-interpretability-bootcamp" \
    --output-dir "." \
    --test-script "./aieng-llm-interp/tests/test_integration.py" \
    --env-example "./.env.example" \
    --test-marker "integration_test" \
    --force
  ```
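The integration tests gate on environment variables loaded from `.env`. As a minimal sketch of the kind of check such a test might perform, the helper below reports which required keys are unset; the key names are hypothetical, not the bootcamp's actual requirements (see `.env.example` for those):

```python
import os

# Hypothetical key list -- the real one comes from .env.example
REQUIRED_KEYS = ["OPENAI_API_KEY", "COHERE_API_KEY"]

def missing_keys(env=None):
    """Return the required keys that are unset or empty in `env`
    (defaults to the process environment)."""
    env = os.environ if env is None else env
    return [k for k in REQUIRED_KEYS if not env.get(k)]

# In a real test file this check would live inside a test function
# marked with @pytest.mark.integration_test, so `--test-marker
# integration_test` can select it.
```

Failing fast on missing keys here gives a clearer error than a downstream API call timing out mid-notebook.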
This project is licensed under the terms of the LICENSE file in the root directory.
Please read CONTRIBUTING.md before submitting pull requests.
For questions or help navigating this repository, contact Aravind Narayanan at aravind.narayanan@vectorinstitute.ai