Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks (ICCV 2025 official implementation)
Jiawei Wang*1, Yushen Zuo*2, Yuanjun Chai3, Zhendong Liu4, Yicheng Fu5, Yichun Feng†6, Kin-man Lam†2
1University of Science and Technology of China
2The Hong Kong Polytechnic University
3University of Washington
4Nanjing University
5Stanford University
6University of the Chinese Academy of Sciences
Contact Emails:
[email protected], [email protected], [email protected], [email protected],
[email protected], [email protected], [email protected]
* Equal contribution.
† Corresponding authors.
Welcome! This repository hosts the official implementation of our paper, "Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks."
We propose state-of-the-art solutions to enhance the robustness of Vision-Language Models (VLMs) against Gaussian noise and adversarial attacks. Key highlights include:
- Robust-VLGuard: a pioneering multimodal safety dataset covering both aligned and misaligned image-text pair scenarios.
- DiffPure-VLM: a novel defense framework that leverages diffusion models to neutralize adversarial noise by transforming it into Gaussian-like noise, significantly improving VLM resilience.
- Conducted a comprehensive vulnerability analysis revealing the sensitivity of mainstream VLMs to Gaussian noise.
- Developed Robust-VLGuard, a dataset designed to improve model robustness without compromising helpfulness or safety alignment.
- Introduced DiffPure-VLM, an effective pipeline for defending against complex optimization-based adversarial attacks.
- Demonstrated strong performance across multiple benchmarks, outperforming existing baseline methods.
Different models require different environments. We provide conda environment files in the `env_configs` directory. For example:

```bash
conda env create -f env_configs/environment-omi.yml
conda activate Omi-Environment
```
Link the pretrained checkpoints into `ckpts/`:

```bash
mkdir -p ckpts/
ln -s your_path/vicuna ckpts/vicuna
ln -s your_path/pretrained_minigpt4.pth ckpts/pretrained_minigpt4.pth
mkdir -p ckpts/diffpure_models/diffusion/Guide_Diffusion/
ln -s your_path/256x256_diffusion_uncond.pt ckpts/diffpure_models/diffusion/Guide_Diffusion/256x256_diffusion_uncond.pt
```
- MiniGPT-4: https://drive.google.com/file/d/1a4zLvaiDBr-36pasffmgpvH5P7CKmpze/view
- Vicuna: https://huggingface.co/Vision-CAIR/vicuna/tree/main
- Diffusion Model: https://openaipublic.blob.core.windows.net/diffusion/jul-2021/256x256_diffusion_uncond.pt
- RealToxicityPrompts is available in the `harmful_corpus/` directory.
- Download Robust-VLGuard from Hugging Face.
- Noisy MMVet benchmark: Google Drive.
Our Robust-VLGuard dataset is preprocessed and ready for fine-tuning. You can fine-tune with the official code of the respective VLMs, provided the Gaussian noise augmentation strategy is implemented; we have already incorporated this strategy into the official codebases. To see the implementation for LLaVA, refer to this commit; you can also fine-tune directly using this repo. For other codebases, follow the implementation approach used in LLaVA. A minimal sketch of the augmentation idea is shown below.
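The authoritative implementation is in the linked commits; purely as an illustration, here is a hedged sketch of per-sample Gaussian noise augmentation on image tensors in [0, 1]. The function name, noise probability, and sigma range are our own illustrative choices and not necessarily the exact values used in the paper.

```python
import torch

def gaussian_noise_augment(image: torch.Tensor, p: float = 0.5,
                           sigma_max: float = 30.0 / 255.0) -> torch.Tensor:
    """Randomly add zero-mean Gaussian noise to an image tensor in [0, 1].

    Illustrative sketch only: `p` and `sigma_max` are assumptions, not the
    exact settings of the Robust-VLGuard fine-tuning recipe.
    """
    if torch.rand(1).item() < p:
        sigma = torch.rand(1).item() * sigma_max          # sample a noise level per image
        image = (image + sigma * torch.randn_like(image)).clamp(0.0, 1.0)
    return image
```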
Fine-tuned checkpoints are available on Hugging Face:

| Model Name | Hugging Face Path |
|---|---|
| llava-v1.5-7b-RobustVLGuard | LLaVA |
| MiniGPT4-RobustVLGuard | MiniGPT-4 |
| InternVL2-8B-RobustVLGuard | InternVL2-8B |
To evaluate on RealToxicityPrompts with a clean or Gaussian-noised image prompt, run:

```bash
bash general_scripts/omi_eval_rtp.sh {OUTPUT_PATH} adversarial_images/clean.jpeg {MODEL_PATH}
```

or

```bash
bash general_scripts/omi_eval_rtp.sh {OUTPUT_PATH} adversarial_images_add_noise_G30/clean.jpeg {MODEL_PATH}
```
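We read the `adversarial_images_add_noise_G30/` naming as additive Gaussian noise with standard deviation around 30 on the 0-255 scale (please confirm against the released images). If you want to produce such a noisy variant of a clean image yourself, a minimal sketch (the output file name is illustrative):

```python
import numpy as np
from PIL import Image

# Add zero-mean Gaussian noise with sigma = 30 (0-255 scale) to a clean image.
clean = np.asarray(Image.open("adversarial_images/clean.jpeg")).astype(np.float32)
noisy = np.clip(clean + np.random.normal(0.0, 30.0, clean.shape), 0, 255).astype(np.uint8)
Image.fromarray(noisy).save("clean_noise_G30.png")  # illustrative output path
```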
To run MMVet inference with LLaVA (clean or noisy images), use:

```bash
python llava_inference_mmvet.py --model_path {MODEL_PATH} --clean --output_path {OUTPUT_PATH}
```

or

```bash
python llava_inference_mmvet.py --model_path {MODEL_PATH} --output_path {OUTPUT_PATH}
```
Use `minigpt_inference_mmvet.py` for MiniGPT-4. For InternVL2, refer to this commit.

IMPORTANT: Remember to replace `mmvet_path` and `image_path` in the Python scripts with the correct paths.
To run the optimization-based visual adversarial attack against LLaVA, use:

```bash
bash llava-attack.sh {GPU_ID} {OUTPUT_PATH} {MODEL_PATH} {EPSILON}
```

where `EPSILON` controls the perturbation strength (e.g., 16, 32, or 64). For MiniGPT-4, use `minigpt_visual_attack.py`. For InternVL2, refer to this commit. A PGD-style sketch of the attack is shown below.
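The attack scripts follow the Visual-Adversarial-Examples-Jailbreak setup this repo builds on, perturbing the image under an L-infinity budget to push the model toward harmful target text. As a rough, model-agnostic sketch of that update (the `loss_toward_targets` callable, step size, and step count are placeholders, not the scripts' exact settings):

```python
import torch

def pgd_attack(image: torch.Tensor, loss_toward_targets, epsilon: int = 32,
               alpha: float = 1.0 / 255.0, steps: int = 500) -> torch.Tensor:
    """L_inf-bounded PGD sketch; `image` is in [0, 1], `epsilon` is in 0-255 units (e.g. 16, 32, 64)."""
    eps = epsilon / 255.0
    x_orig = image.clone()
    x_adv = image.clone().requires_grad_(True)
    for _ in range(steps):
        loss = loss_toward_targets(x_adv)            # e.g. LM loss of harmful target strings
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv -= alpha * grad.sign()             # gradient descent: minimize the target loss
            x_adv.clamp_(x_orig - eps, x_orig + eps).clamp_(0.0, 1.0)
    return x_adv.detach()
```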
To evaluate with the DiffPure-VLM defense, run the following command:

```bash
bash general_scripts/omi_eval_rtp_diffpure.sh {output_path} {image_prompt_path} {model_path} {def_num_denoising_steps}
```

For MiniGPT-4 and Qwen-VL, use `minigpt_scripts/minigpt_eval_rtp_diffpure_single_gpu.sh` and `qwen25_vl_scripts/qwen25_vl_rtp_diffpure.sh`, respectively.
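DiffPure-style purification diffuses the (possibly adversarial) image forward with Gaussian noise and then denoises it with the pretrained diffusion model, converting residual adversarial structure into Gaussian-like noise that the fine-tuned VLM tolerates. A minimal sketch of the idea, assuming a standard DDPM schedule and a placeholder `denoise_fn` standing in for the guided-diffusion reverse process (the real pipeline uses the 256x256 checkpoint and the `def_num_denoising_steps` argument above):

```python
import torch

def purify(image: torch.Tensor, denoise_fn, t_star: int = 150,
           num_timesteps: int = 1000) -> torch.Tensor:
    """DiffPure-style purification sketch for an image tensor in [0, 1].

    `denoise_fn(x_t, t)` is a placeholder for the diffusion model's reverse
    process; only the forward noising step is shown explicitly here.
    """
    # Linear beta schedule as in standard DDPM (illustrative values).
    betas = torch.linspace(1e-4, 0.02, num_timesteps)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t_star]
    x = 2.0 * image - 1.0                                   # map [0, 1] -> [-1, 1]
    x_t = alpha_bar.sqrt() * x + (1.0 - alpha_bar).sqrt() * torch.randn_like(x)
    x_0 = denoise_fn(x_t, t_star)                           # denoise back toward a clean image
    return ((x_0 + 1.0) / 2.0).clamp(0.0, 1.0)              # back to [0, 1] before feeding the VLM
```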
We provide a script for the JailGuard defense for comparison. Use the following command:

```bash
bash general_scripts/omi_eval_rtp_jailguard.sh {output_path} {image_prompt_path} {model_path}
```
Detailed results and analysis are included in the paper and supplementary materials. See the `results/` directory for specific outcomes.
If you find this work helpful, please consider citing our paper.
This project is licensed under the MIT License. See the `LICENSE` file for more details.
For questions or collaboration opportunities, feel free to reach out at [[email protected]]. We welcome your feedback!
Our repo is built upon https://github.com/Unispac/Visual-Adversarial-Examples-Jailbreak-Large-Language-Models. We thank the authors of the original models and datasets, including LLaVA, MiniGPT-4, InternVL2, MMVet, and others, and we acknowledge the support of our institutions and collaborators. We are grateful for the resources and tools provided by the community that made this research possible, and we look forward to future collaborations in multimodal learning.