
Beyond the Final Layer: Attentive Multi-Layer Fusion for Vision Transformers

[Figure: Overview of Attentive Multi-Layer Fusion for Vision Transformers]

This repository contains the code for the preprint "Beyond the Final Layer: Attentive Multi-Layer Fusion for Vision Transformers" (arXiv:2601.09322).


Repository Structure

├── src/                   # Core source code
│   ├── data/              # Dataset handling and loading
│   ├── models/            # ThingsVision wrappers for feature extraction
│   ├── tasks/             # Experiment tasks (linear and attentive probes, hyperparameter tuning)
│   ├── eval/              # Evaluation metrics
│   └── utils/             # Utility functions
├── scripts/               # Experiment scripts and configurations
│   ├── configs/           # Configuration files for models and datasets
│   └── download_ds/       # Dataset download utilities
├── notebooks/             # Analysis and visualization notebooks
│   ├── main_section/      # Main paper figures
│   └── appendix_section/  # Appendix materials
└── data/                  # Static data files

Note: All experimental data (datasets, features, model checkpoints, results, etc.) is stored in the project root directory as described in 2. Project Structure.

Environment and Project Setup

1. Environment Setup

Note

Experiments were conducted on a SLURM cluster with Apptainer. Instructions below are tailored for this setup.

We provide pre-built containers with all dependencies (a pull example follows the list):

  • Docker: docker://ghcr.io/lciernik/attentive-layer-fusion:latest
  • Apptainer: oras://ghcr.io/lciernik/attentive-layer-fusion:latest-sif
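
On a cluster with Apptainer, you would typically pull the image once and reuse the resulting .sif file (a minimal sketch; the output filename is a placeholder of our choosing):

# Pull the pre-built Apptainer image and store it as a local .sif file.
apptainer pull attentive-layer-fusion.sif oras://ghcr.io/lciernik/attentive-layer-fusion:latest-sif

# Or, when using Docker directly:
docker pull ghcr.io/lciernik/attentive-layer-fusion:latest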

2. Project Structure

Configure your project root directory in scripts/project_location.py:

[PROJECT_ROOT]/
├── datasets/              # Downloaded datasets
├── features/              # Extracted features
├── models/                # Trained probe models
├── model_similarities/    # Model similarity matrices
└── results/               # Experimental results

Build the directory structure: python scripts/project_location.py
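
The script creates the layout shown above. If you prefer to set it up by hand, the equivalent shell command would be (a sketch; replace [PROJECT_ROOT] with your configured path):

# Create the expected directory layout under the project root.
mkdir -p [PROJECT_ROOT]/{datasets,features,models,model_similarities,results}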

Downloading the Datasets from Hugging Face

All datasets used in the experiments come from the CLIP Benchmark repository on Hugging Face; the script scripts/download_ds/download_datasets.sh downloads them.

Make sure to use [BASE_PATH_PROJECT]/datasets as the target directory, and follow the instructions in the README to download the datasets.
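
A typical invocation from the repository root might look as follows (a sketch; how the target directory is specified depends on the script, so verify that it resolves to [BASE_PATH_PROJECT]/datasets before launching the full download):

# Download all benchmark datasets used in the experiments.
bash scripts/download_ds/download_datasets.sh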

Running Experiments and Reproducing the Plots

Tip

If you only want to reproduce the visualizations of the experiments, use the pre-aggregated results in data/results/aggregated/ and run the notebooks in notebooks/main_section.

Before You Start (on a SLURM Cluster with Apptainer)

  • Connect to a compute node via srun: srun --partition=cpu-5h --mem=64G --pty bash
  • Run an interactive shell in the container: apptainer run --nv --writable-tmpfs -B [BASE_PATH_PROJECT] [CONTAINER_PATH] /bin/bash
  • Navigate to the repository root directory: cd $PATH_TO_REPO
  • Run any task (below); a consolidated session sketch follows this list
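
Putting these steps together, a full interactive session might look like this (a sketch; the partition, memory, and bracketed paths are placeholders to adapt to your setup):

# 1) Allocate a compute node.
srun --partition=cpu-5h --mem=64G --pty bash

# 2) Start an interactive shell in the container, mounting the project root.
apptainer run --nv --writable-tmpfs -B [BASE_PATH_PROJECT] [CONTAINER_PATH] /bin/bash

# 3) Inside the container, switch to the repository and launch a task.
cd $PATH_TO_REPO
python scripts/feature_extraction.py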

Running Different Tasks

To run any task, you must be inside the container and in the repository root directory, i.e., at $PATH_TO_REPO. A non-interactive alternative is sketched after the task list below.

  • Feature extraction (all datasets, each in full): python scripts/feature_extraction.py

  • Model evaluation:

    • Single layer evaluation
      • All datasets on last layer: python scripts/single_model_evaluation.py
      • On appendix datasets and all layers: python scripts/single_model_evaluation_all_intermediate_layers.py
    • Linear probe (multi-layer)
      • on pre-extracted features: python scripts/combined_models_evaluation_linear_probe_large_experiments.py
    • Attentive probe (multi-layer or all tokens last layer)
      • on pre-extracted features: python scripts/combined_models_evaluation_attn_probe_large_experiments.py
      • end-to-end: python scripts/end_2_end_eval_attentive_probe_frozen_backbone.py
    • Fine-tuning (classic with linear classification head on top)
      • end-to-end: python scripts/end_2_end_eval_linear_probe_finetuning.py
    • Representational similarity computation: python scripts/distance_matrix_computation_all_layers.py
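
If you prefer not to work interactively, a task can also be handed to the container in a single srun call (a sketch using apptainer exec and the same placeholder paths as above; adjust the partition and resources to the task):

# Run feature extraction non-interactively on a compute node.
srun --partition=cpu-5h --mem=64G \
  apptainer exec --nv -B [BASE_PATH_PROJECT] [CONTAINER_PATH] \
  bash -c "cd $PATH_TO_REPO && python scripts/feature_extraction.py"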

Tip

To run experiments on a single machine, use the commands defined in the job_cmd variable within each script.

Note

The scripts listed above reproduce our main experiments. They can also serve as templates: for example, modify the respective script to run end-to-end multi-layer attentive probe training on a single dataset for a specific model.

Visualizations

Before you can reproduce the visualizations, you need the experiment results and must aggregate them.

  • Aggregating the results:
    • Run the notebook notebooks/aggregate_results.ipynb to aggregate the results (a headless variant is sketched below).
    • Or use the pre-aggregated results in data/results/aggregated/.
  • Visualizing the results:
    • Run the notebooks in notebooks/main_section to visualize the results.
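
If you want to execute the aggregation notebook without opening a browser, Jupyter's nbconvert can run it headlessly (a sketch; assumes jupyter is available in your environment or the container):

# Execute the aggregation notebook in place, without a browser session.
jupyter nbconvert --to notebook --execute --inplace notebooks/aggregate_results.ipynb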

Acknowledgments

This project builds upon or heavily uses the following repositories and works:

Citation

If you find this work interesting or useful in your research, please cite our preprint:

@misc{2026finallayerattentivemultilayer,
      title={Beyond the Final Layer: Attentive Multi-Layer Fusion for Vision Transformers},
      author={Laure Ciernik and Marco Morik and Lukas Thede and Luca Eyring and Shinichi Nakajima and Zeynep Akata and Lukas Muttenthaler},
      year={2026},
      eprint={2601.09322},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.09322}, 
}

Thank you

If you have any feedback, questions, or ideas, please feel free to raise an issue in this repository. Alternatively, you can reach out to us directly via email for more in-depth discussions or suggestions.

📧 Contact us: ciernik[at]tu-berlin.de or m.morik[at]tu-berlin.de

Thank you for your interest and support!
