# One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models

[ICCV 2025] The official PyTorch implementation of *One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models*.
Hao Fang*, Jiawei Kong*, Wenbo Yu, Bin Chen#, Jiawei Li, Hao Wu, Shu-Tao Xia, Ke Xu
## Installation

We provide the environment configuration file exported by Anaconda, which helps you set up the environment conveniently.

```shell
conda env create -f environment.yml
conda activate CPGC
```
## Preparation

- Download the datasets Flickr30K and MSCOCO, and fill in the `image_root` field in the configuration files.
- Download the checkpoints of the fine-tuned VLP models: ALBEF, TCL, CLIP, BLIP, and X-VLM.
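As background for the steps below, a universal adversarial perturbation (UAP) is a single image-agnostic perturbation, bounded in L-infinity norm, that is added to every input image. A minimal NumPy sketch of applying one (hypothetical shapes and budget; not this repo's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
epsilon = 8 / 255                       # hypothetical L-inf budget
images = rng.random((4, 3, 224, 224))   # dummy batch of images in [0, 1]

# One fixed perturbation (the "UAP") is shared by every image.
uap = rng.normal(0.0, 0.05, size=(3, 224, 224))
uap = np.clip(uap, -epsilon, epsilon)   # project onto the L-inf ball

# Add the same delta to all images and keep pixel values valid.
adv_images = np.clip(images + uap, 0.0, 1.0)
```

The same `uap` tensor is reused across the whole dataset, which is what makes the perturbation "universal".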
## Training

Below we provide the command for training the contrastive-training perturbation generator, with Flickr30K as the training set and ALBEF as the surrogate model.

```shell
python train.py --config configs/Retrieval_flickr_train.yaml --source_model ALBEF --source_ckpt $CKPT
```

Alternatively, download the generators and UAPs.
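The generator above is trained contrastively against the surrogate's image and text embeddings. As a rough illustration only, a generic InfoNCE-style image-text contrastive loss in NumPy (made-up shapes and temperature; not necessarily the paper's exact objective or this repo's implementation):

```python
import numpy as np

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Generic InfoNCE-style image-text contrastive loss (illustrative only)."""
    # Cosine similarities between all image/text pairs.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature                     # (N, N) similarity matrix
    # Log-softmax over texts for each image; matched pairs sit on the diagonal.
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
img_emb = rng.normal(size=(4, 16))   # dummy image embeddings
txt_emb = rng.normal(size=(4, 16))   # dummy text embeddings
loss = contrastive_loss(img_emb, txt_emb)
```

In the attack setting, the generator is optimized so that perturbed-image embeddings no longer score well against their matched captions, i.e., the attacker pushes such a loss up rather than down.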
## Evaluation

Below we provide the command for evaluating our method on the Image-Text Retrieval (ITR) task:

```shell
python eval.py --config configs/Retrieval_flickr_test.yaml --source_model ALBEF --load_dir $UAP_PATH
```

## Citation

```bibtex
@article{fang2024one,
  title={One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models},
  author={Fang, Hao and Kong, Jiawei and Yu, Wenbo and Chen, Bin and Li, Jiawei and Xia, Shutao and Xu, Ke},
  journal={arXiv preprint arXiv:2406.05491},
  year={2024}
}
```
