This repository contains the code for the preprint "Beyond the Final Layer: Attentive Multi-Layer Fusion for Vision Transformers" (arXiv:2601.09322).
- Repository Structure
- Environment and Project Setup
- Downloading the Datasets from Hugging Face
- Running Experiments and Reproducing the Plots
- Acknowledgments
- Citation
├── src/ # Core source code
│ ├── data/ # Dataset handling and loading
│ ├── models/ # ThingsVision wrappers for feature extraction
│ ├── tasks/ # Experiment tasks (linear and attentive probes, hyperparameter tuning)
│ ├── eval/ # Evaluation metrics
│ └── utils/ # Utility functions
├── scripts/ # Experiment scripts and configurations
│ ├── configs/ # Configuration files for models and datasets
│ └── download_ds/ # Dataset download utilities
├── notebooks/ # Analysis and visualization notebooks
│ ├── main_section/ # Main paper figures
│ └── appendix_section/ # Appendix materials
└── data/ # Static data files
Note: All experimental data (datasets, features, model checkpoints, results, etc.) is stored in the project root directory, following the [PROJECT_ROOT] structure shown below.
Note
Experiments were conducted on a SLURM cluster with Apptainer. Instructions below are tailored for this setup.
We provide pre-built containers with all dependencies:
- Docker: docker://ghcr.io/lciernik/attentive-layer-fusion:latest
- Apptainer: oras://ghcr.io/lciernik/attentive-layer-fusion:latest-sif
Configure your project root directory in scripts/project_location.py:
[PROJECT_ROOT]/
├── datasets/ # Downloaded datasets
├── features/ # Extracted features
├── models/ # Trained probe models
├── model_similarities/ # Model similarity matrices
└── results/ # Experimental results
Build the directory structure: python scripts/project_location.py
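Conceptually, the directory-building step boils down to creating the subdirectories listed above under the configured root. The following is a minimal sketch of that behavior, not the actual contents of scripts/project_location.py (which may do more, e.g., store the configured path); the root path and function name here are illustrative assumptions.

```python
# Sketch only: create the expected experiment directories under a project root.
from pathlib import Path

# Placeholder root for illustration; the real path is configured in
# scripts/project_location.py.
PROJECT_ROOT = Path("/tmp/attentive-fusion-project")

SUBDIRS = ["datasets", "features", "models", "model_similarities", "results"]

def build_project_structure(root: Path = PROJECT_ROOT) -> None:
    """Create the project root and its expected subdirectories (idempotent)."""
    for sub in SUBDIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)

if __name__ == "__main__":
    build_project_structure()
```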
All datasets used in the experiments are downloaded from the CLIP Benchmark repository on Hugging Face via the script scripts/download_ds/download_datasets.sh. Make sure to use [BASE_PATH_PROJECT]/datasets as the target directory, and follow the instructions in the README to download the datasets.
Tip
If you only want to reproduce the visualizations of the experiments, use the pre-aggregated results in data/results/aggregated/ and run the notebooks in notebooks/main_section.
- Connect to a compute node via srun:
  srun --partition=cpu-5h --mem=64G --pty bash
- Run an interactive shell in the container:
  apptainer run --nv --writable-tmpfs -B [BASE_PATH_PROJECT] [CONTAINER_PATH] /bin/bash
- Navigate to the repository root directory:
  cd $PATH_TO_REPO
- Run any task (see below)
To run any task, you must be inside the container and in the repository root directory, i.e., at $PATH_TO_REPO.
- Feature extraction (all datasets, entire dataset):
  python scripts/feature_extraction.py
- Model evaluation:
  - Single-layer evaluation
    - All datasets on the last layer:
      python scripts/single_model_evaluation.py
    - Appendix datasets on all layers:
      python scripts/single_model_evaluation_all_intermediate_layers.py
  - Linear probe (multi-layer)
    - On pre-extracted features:
      python scripts/combined_models_evaluation_linear_probe_large_experiments.py
  - Attentive probe (multi-layer, or all tokens of the last layer)
    - On pre-extracted features:
      python scripts/combined_models_evaluation_attn_probe_large_experiments.py
    - End-to-end:
      python scripts/end_2_end_eval_attentive_probe_frozen_backbone.py
  - Fine-tuning (classic, with a linear classification head on top)
    - End-to-end:
      python scripts/end_2_end_eval_linear_probe_finetuning.py
- Representational similarity computation:
  python scripts/distance_matrix_computation_all_layers.py
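To give a rough intuition for the fusion technique named in the paper's title, a multi-layer attentive probe can be pictured as attention over per-layer features followed by a linear classifier. The sketch below is a minimal NumPy illustration with random weights, not the paper's implementation; the function names, the single learned query, and the dimensions are all illustrative assumptions.

```python
# Illustrative sketch (NOT the paper's implementation): attentively fuse
# per-layer features of a ViT via a learned query, then classify linearly.
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attentive_fusion(layer_feats, query):
    """layer_feats: (L, D) per-layer features for one image; query: (D,).
    Returns a (D,) fused feature as an attention-weighted sum over layers."""
    scores = layer_feats @ query / np.sqrt(layer_feats.shape[1])  # (L,)
    weights = softmax(scores)                                     # (L,)
    return weights @ layer_feats                                  # (D,)

L, D, C = 12, 768, 10                # layers, feature dim, classes
feats = rng.standard_normal((L, D))  # stand-in for extracted ViT features
query = rng.standard_normal(D)       # learned query (random here)
W = rng.standard_normal((D, C))      # linear classifier weights (random here)

fused = attentive_fusion(feats, query)
logits = fused @ W
assert fused.shape == (D,) and logits.shape == (C,)
```

In a real probe, the query and classifier would be trained jointly on the extracted features; here they are random placeholders to keep the sketch self-contained.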
Tip
To run experiments on a single machine, use the commands stored in the job_cmd variable within each script.
Note
The scripts above reproduce our main experiments. They can also serve as templates: by modifying the respective script, you can, e.g., run end-to-end multi-layer attentive probe training on a single dataset for a specific model.
Before you can reproduce the visualizations, you need the experimental results and must aggregate them.
- Aggregating the results:
  - Run the notebook notebooks/aggregate_results.ipynb to aggregate the results, or
  - use the pre-aggregated results in data/results/aggregated/.
- Visualizing the results:
  - Run the notebooks in notebooks/main_section to visualize the results.
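The aggregation step can be pictured roughly as follows. This is a hypothetical sketch only, assuming per-experiment results are stored as CSV files under the results/ directory; the actual logic lives in notebooks/aggregate_results.ipynb and may differ.

```python
# Hypothetical sketch of result aggregation: collect per-experiment CSV files
# and concatenate them into a single table (assumed layout, see lead-in).
from pathlib import Path
import pandas as pd

def aggregate_results(results_dir: Path) -> pd.DataFrame:
    """Concatenate all CSV result files found recursively under results_dir."""
    frames = []
    for csv_path in sorted(results_dir.rglob("*.csv")):
        df = pd.read_csv(csv_path)
        df["source_file"] = csv_path.name  # keep provenance of each row
        frames.append(df)
    # Raises if no CSV files were found, which surfaces a missing-results setup.
    return pd.concat(frames, ignore_index=True)
```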
This project builds upon or makes heavy use of the following repositories and works:
- ThingsVision for feature extraction and end-to-end model evaluation
- similarity_consistency repository
If you find this work interesting or useful in your research, please cite our preprint:
@misc{2026finallayerattentivemultilayer,
title={Beyond the final layer: Attentive multilayer fusion for vision transformers},
author={Laure Ciernik and Marco Morik and Lukas Thede and Luca Eyring and Shinichi Nakajima and Zeynep Akata and Lukas Muttenthaler},
year={2026},
eprint={2601.09322},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2601.09322},
}
If you have any feedback, questions, or ideas, please feel free to raise an issue in this repository. Alternatively, you can reach out to us directly via email for more in-depth discussions or suggestions.
📧 Contact us: ciernik[at]tu-berlin.de or m.morik[at]tu-berlin.de
Thank you for your interest and support!
