
tdhook 🤖🪝


Interpretability with tensordict and torch hooks.

Getting Started

Most methods should work with minimal configuration. Here's a basic example of running Integrated Gradients on a VGG16 model (full example available here):

import torch
from tensordict import TensorDict
from tdhook.attribution import IntegratedGradients

# Assumes `model` (e.g. a torchvision VGG16 in eval mode) and `image_tensor`
# (a preprocessed image of shape [3, 224, 224]) are already defined.

# Define attribution target (e.g., zebra class = 340)
def init_attr_targets(targets, _):
    zebra_logit = targets["output"][..., 340]
    return TensorDict(out=zebra_logit, batch_size=targets.batch_size)

# Compute attribution
with IntegratedGradients(init_attr_targets=init_attr_targets).prepare(model) as hooked_model:
    td = TensorDict({
        "input": image_tensor,
        ("baseline", "input"): torch.zeros_like(image_tensor),  # required for integrated gradients
    }).unsqueeze(0)
    td = hooked_model(td)  # Access attribution with td.get(("attr", "input"))
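For intuition about what the attribution step computes, here is a from-scratch sketch of Integrated Gradients in plain PyTorch. It is not tdhook's implementation. It assumes a callable f that maps a batch of inputs to one scalar per input, approximates the path integral with a midpoint Riemann sum over a hypothetical steps parameter, and uses a toy model in place of VGG16 with a hypothetical target index. The final lines check the completeness property: the attributions should sum to approximately f(x) minus f(baseline).

```python
import torch


def integrated_gradients(f, x, baseline, steps=64):
    """Midpoint Riemann-sum approximation of Integrated Gradients.

    f: callable mapping a batch of inputs [n, *x.shape] to scalar outputs [n].
    """
    # Midpoint interpolation coefficients alpha_k = (k + 0.5) / steps, k = 0..steps-1,
    # reshaped so they broadcast over every input dimension
    alphas = ((torch.arange(steps, dtype=x.dtype) + 0.5) / steps).view(-1, *([1] * x.dim()))
    # Points on the straight-line path from baseline to x, shape [steps, *x.shape]
    path = (baseline + alphas * (x - baseline)).detach().requires_grad_(True)
    # Summing the outputs lets a single backward pass return the gradient at every path point
    (grads,) = torch.autograd.grad(f(path).sum(), path)
    # Average gradient along the path, scaled elementwise by the input difference
    return (x - baseline) * grads.mean(dim=0)


torch.manual_seed(0)
# Toy stand-in for VGG16: 4 input features, 3 output logits
model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.Tanh(), torch.nn.Linear(8, 3))
x = torch.randn(4)
baseline = torch.zeros_like(x)  # all-zeros baseline, as in the tdhook example
target = 2  # hypothetical class index, playing the role of the zebra logit


def f(batch):
    # Select the target logit for each input in the batch
    return model(batch)[..., target]


attr = integrated_gradients(f, x, baseline, steps=256)
with torch.no_grad():
    gap = f(x) - f(baseline)
print(attr, attr.sum().item(), gap.item())
```

Increasing steps tightens the approximation at the cost of more forward and backward passes; the zero baseline mirrors the choice made in the Getting Started example.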

To dig deeper, see the documentation.

Skills

An agent skill is available for tdhook. It provides AI guidance for attribution, activation analysis, probing, steering, and weight-level interventions—including when to use each method and how to wire TensorDict keys.

Features

Config

This project uses uv to manage Python dependencies and run scripts, and the just command runner to run project commands.

Citation

If you're using tdhook in your research, please cite it using the following BibTeX entry:

@misc{poupart2025tdhooklightweightframeworkinterpretability,
      title={TDHook: A Lightweight Framework for Interpretability},
      author={Yoann Poupart},
      year={2025},
      eprint={2509.25475},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2509.25475},
}

License

tdhook is licensed under the MIT License. See LICENSE for details.