`tdhook` 🤖🪝

Interpretability with tensordict and torch hooks.

Getting Started

Most methods should work with minimal configuration. Here's a basic example of running Integrated Gradients on a VGG16 model (full example available here):

import torch
from tensordict import TensorDict
from tdhook.attribution import IntegratedGradients

# `model` is a VGG16 classifier and `image_tensor` a preprocessed image of shape (3, H, W); see the full example

# Define the attribution target: the ImageNet zebra logit (class index 340)
def init_attr_targets(targets, _):
    zebra_logit = targets["output"][..., 340]
    return TensorDict(out=zebra_logit, batch_size=targets.batch_size)

# Compute attribution
with IntegratedGradients(init_attr_targets=init_attr_targets).prepare(model) as hooked_model:
    td = TensorDict({
        "input": image_tensor,
        ("baseline", "input"): torch.zeros_like(image_tensor),  # required for integrated gradients
    }).unsqueeze(0)  # add a batch dimension
    td = hooked_model(td)
    # The attribution map is stored under the ("attr", "input") key
    attr = td.get(("attr", "input"))
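For intuition about why a baseline is required, Integrated Gradients attributes to each input feature i the quantity (x_i - x'_i) times the integral over alpha in [0, 1] of the gradient of the target with respect to x_i, evaluated at x' + alpha(x - x'), where x' is the baseline. The sketch below is a library-independent illustration of that formula, not tdhook's implementation: it assumes a scalar function of a plain list of floats, uses a midpoint Riemann sum for the integral and central finite differences instead of autograd, and the names `integrated_gradients` and `toy_logit` are hypothetical.

```python
def integrated_gradients(f, x, baseline, steps=64, eps=1e-6):
    """Hypothetical sketch: approximate Integrated Gradients of scalar f at x w.r.t. baseline."""
    n = len(x)
    grad_sums = [0.0] * n
    for k in range(steps):
        # Midpoint of the k-th sub-interval of [0, 1]
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            # Central finite-difference estimate of df/dx_i at the interpolated point
            up = list(point)
            up[i] += eps
            down = list(point)
            down[i] -= eps
            grad_sums[i] += (f(up) - f(down)) / (2 * eps)
    # Average gradient along the path, scaled by the input difference (x - baseline)
    return [(xi - b) * g / steps for xi, b, g in zip(x, baseline, grad_sums)]


def toy_logit(v):
    # Hypothetical stand-in for a model's target logit
    return v[0] ** 2 + 3 * v[0] * v[1]


x = [1.0, 2.0]
baseline = [0.0, 0.0]  # zero baseline, as in the example above
attr = integrated_gradients(toy_logit, x, baseline)
print(attr)
print(sum(attr), toy_logit(x) - toy_logit(baseline))
```

The last line checks the completeness property of Integrated Gradients: the attributions sum (up to approximation error) to the difference between the target at the input and at the baseline, which is why the choice of baseline matters.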

To dig deeper, see the documentation.

Skills

An agent skill is available for tdhook. It gives AI agents guidance on attribution, activation analysis, probing, steering, and weight-level interventions, including when to use each method and how to wire TensorDict keys.

Features

Config

This project uses `uv` to manage Python dependencies and run scripts, and `just` to run commands.

Citation

If you're using tdhook in your research, please cite it using the following BibTeX entry:

@misc{poupart2025tdhooklightweightframeworkinterpretability,
      title={TDHook: A Lightweight Framework for Interpretability},
      author={Yoann Poupart},
      year={2025},
      eprint={2509.25475},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2509.25475},
}

License

tdhook is licensed under the MIT License. See LICENSE for details.
