Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks (ICCV 2025 official implementation)
Jiawei Wang*1, Yushen Zuo*2, Yuanjun Chai3, Zhendong Liu4, Yicheng Fu5, Yichun Feng†6, Kin-man Lam†2
1University of Science and Technology of China
2The Hong Kong Polytechnic University
3University of Washington
4Nanjing University
5Stanford University
6University of the Chinese Academy of Sciences
Contact Emails:
[email protected], [email protected], [email protected], [email protected],
[email protected], [email protected], [email protected]
* Equal contribution.
† Corresponding authors.
Welcome! This repository hosts the official implementation of our paper, "Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks."
We propose state-of-the-art solutions to enhance the robustness of Vision-Language Models (VLMs) against Gaussian noise and adversarial attacks. Key highlights include:
- Robust-VLGuard: a pioneering multimodal safety dataset covering both aligned and misaligned image-text pair scenarios.
- DiffPure-VLM: a novel defense framework that leverages diffusion models to neutralize adversarial noise by transforming it into Gaussian-like noise, significantly improving VLM resilience.
- Conducted a comprehensive vulnerability analysis revealing the sensitivity of mainstream VLMs to Gaussian noise.
- Developed Robust-VLGuard, a dataset designed to improve model robustness without compromising helpfulness or safety alignment.
- Introduced DiffPure-VLM, an effective pipeline for defending against complex optimization-based adversarial attacks.
- Demonstrated strong performance across multiple benchmarks, outperforming existing baseline methods.
Different models require different environments. We provide conda environment files in the `env_configs` directory. For example:

```bash
conda env create -f env_configs/environment-omi.yml
conda activate Omi-Environment
```
Link the pretrained checkpoints into `ckpts/`:

```bash
mkdir -p ckpts/
ln -s your_path/vicuna ckpts/vicuna
ln -s your_path/pretrained_minigpt4.pth ckpts/pretrained_minigpt4.pth
mkdir -p ckpts/diffpure_models/diffusion/Guide_Diffusion/
ln -s your_path/256x256_diffusion_uncond.pt ckpts/diffpure_models/diffusion/Guide_Diffusion/256x256_diffusion_uncond.pt
```
- MiniGPT-4: https://drive.google.com/file/d/1a4zLvaiDBr-36pasffmgpvH5P7CKmpze/view
- Vicuna: https://huggingface.co/Vision-CAIR/vicuna/tree/main
- Diffusion Model: https://openaipublic.blob.core.windows.net/diffusion/jul-2021/256x256_diffusion_uncond.pt
- RealToxicityPrompts is available in the `harmful_corpus/` directory.
- Download Robust-VLGuard from Hugging Face.
- Noisy MMVet benchmark: Google Drive.
Our Robust-VLGuard dataset is preprocessed and ready for fine-tuning. You can fine-tune with the official code of the respective VLMs, provided the Gaussian noise augmentation strategy is implemented; we have already incorporated this strategy into the official codebases. To see the implementation for LLaVA, refer to this commit; you can also fine-tune directly using this repo. For other codebases, follow the implementation approach used in LLaVA. A minimal sketch of the augmentation idea is shown below.
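The authoritative implementation is in the linked commits; purely as an illustration, here is a hedged sketch of per-sample Gaussian noise augmentation on image tensors in [0, 1]. The function name, noise probability, and sigma range are our own illustrative choices and not necessarily the exact values used in the paper.

```python
import torch

def gaussian_noise_augment(image: torch.Tensor, p: float = 0.5,
                           sigma_max: float = 30.0 / 255.0) -> torch.Tensor:
    """Randomly add zero-mean Gaussian noise to an image tensor in [0, 1].

    Illustrative sketch only: `p` and `sigma_max` are assumptions, not the
    exact settings of the Robust-VLGuard fine-tuning recipe.
    """
    if torch.rand(1).item() < p:
        sigma = torch.rand(1).item() * sigma_max          # sample a noise level per image
        image = (image + sigma * torch.randn_like(image)).clamp(0.0, 1.0)
    return image
```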
Fine-tuned checkpoints are available on Hugging Face:

| Model Name | Hugging Face Path |
|---|---|
| llava-v1.5-7b-RobustVLGuard | LLaVA |
| MiniGPT4-RobustVLGuard | MiniGPT-4 |
| InternVL2-8B-RobustVLGuard | InternVL2-8B |
To evaluate on RealToxicityPrompts with a clean or Gaussian-noised image prompt, run:

```bash
bash general_scripts/omi_eval_rtp.sh {OUTPUT_PATH} adversarial_images/clean.jpeg {MODEL_PATH}
```

or

```bash
bash general_scripts/omi_eval_rtp.sh {OUTPUT_PATH} adversarial_images_add_noise_G30/clean.jpeg {MODEL_PATH}
```
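We read the `adversarial_images_add_noise_G30/` naming as additive Gaussian noise with standard deviation around 30 on the 0-255 scale (please confirm against the released images). If you want to produce such a noisy variant of a clean image yourself, a minimal sketch (the output file name is illustrative):

```python
import numpy as np
from PIL import Image

# Add zero-mean Gaussian noise with sigma = 30 (0-255 scale) to a clean image.
clean = np.asarray(Image.open("adversarial_images/clean.jpeg")).astype(np.float32)
noisy = np.clip(clean + np.random.normal(0.0, 30.0, clean.shape), 0, 255).astype(np.uint8)
Image.fromarray(noisy).save("clean_noise_G30.png")  # illustrative output path
```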
To run MMVet inference with LLaVA (clean or noisy images), use:

```bash
python llava_inference_mmvet.py --model_path {MODEL_PATH} --clean --output_path {OUTPUT_PATH}
```

or

```bash
python llava_inference_mmvet.py --model_path {MODEL_PATH} --output_path {OUTPUT_PATH}
```
Use `minigpt_inference_mmvet.py` for MiniGPT-4. For InternVL2, refer to this commit.

IMPORTANT: Remember to replace `mmvet_path` and `image_path` in the Python scripts with the correct paths.
To run the optimization-based visual adversarial attack against LLaVA, use:

```bash
bash llava-attack.sh {GPU_ID} {OUTPUT_PATH} {MODEL_PATH} {EPSILON}
```

where `EPSILON` controls the perturbation strength (e.g., 16, 32, or 64). For MiniGPT-4, use `minigpt_visual_attack.py`. For InternVL2, refer to this commit. A PGD-style sketch of the attack is shown below.
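The attack scripts follow the Visual-Adversarial-Examples-Jailbreak setup this repo builds on, perturbing the image under an L-infinity budget to push the model toward harmful target text. As a rough, model-agnostic sketch of that update (the `loss_toward_targets` callable, step size, and step count are placeholders, not the scripts' exact settings):

```python
import torch

def pgd_attack(image: torch.Tensor, loss_toward_targets, epsilon: int = 32,
               alpha: float = 1.0 / 255.0, steps: int = 500) -> torch.Tensor:
    """L_inf-bounded PGD sketch; `image` is in [0, 1], `epsilon` is in 0-255 units (e.g. 16, 32, 64)."""
    eps = epsilon / 255.0
    x_orig = image.clone()
    x_adv = image.clone().requires_grad_(True)
    for _ in range(steps):
        loss = loss_toward_targets(x_adv)            # e.g. LM loss of harmful target strings
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv -= alpha * grad.sign()             # gradient descent: minimize the target loss
            x_adv.clamp_(x_orig - eps, x_orig + eps).clamp_(0.0, 1.0)
    return x_adv.detach()
```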
To evaluate with the DiffPure-VLM defense, run the following command:

```bash
bash general_scripts/omi_eval_rtp_diffpure.sh {output_path} {image_prompt_path} {model_path} {def_num_denoising_steps}
```

For MiniGPT-4 and Qwen-VL, use `minigpt_scripts/minigpt_eval_rtp_diffpure_single_gpu.sh` and `qwen25_vl_scripts/qwen25_vl_rtp_diffpure.sh`, respectively.
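DiffPure-style purification diffuses the (possibly adversarial) image forward with Gaussian noise and then denoises it with the pretrained diffusion model, converting residual adversarial structure into Gaussian-like noise that the fine-tuned VLM tolerates. A minimal sketch of the idea, assuming a standard DDPM schedule and a placeholder `denoise_fn` standing in for the guided-diffusion reverse process (the real pipeline uses the 256x256 checkpoint and the `def_num_denoising_steps` argument above):

```python
import torch

def purify(image: torch.Tensor, denoise_fn, t_star: int = 150,
           num_timesteps: int = 1000) -> torch.Tensor:
    """DiffPure-style purification sketch for an image tensor in [0, 1].

    `denoise_fn(x_t, t)` is a placeholder for the diffusion model's reverse
    process; only the forward noising step is shown explicitly here.
    """
    # Linear beta schedule as in standard DDPM (illustrative values).
    betas = torch.linspace(1e-4, 0.02, num_timesteps)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t_star]
    x = 2.0 * image - 1.0                                   # map [0, 1] -> [-1, 1]
    x_t = alpha_bar.sqrt() * x + (1.0 - alpha_bar).sqrt() * torch.randn_like(x)
    x_0 = denoise_fn(x_t, t_star)                           # denoise back toward a clean image
    return ((x_0 + 1.0) / 2.0).clamp(0.0, 1.0)              # back to [0, 1] before feeding the VLM
```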
We provide a script for the JailGuard defense for comparison. Use the following command:

```bash
bash general_scripts/omi_eval_rtp_jailguard.sh {output_path} {image_prompt_path} {model_path}
```
Detailed results and analysis are included in the paper and supplementary materials. See the `results/` directory for specific outcomes.
If you find this work helpful, please consider citing our paper.
This project is licensed under the MIT License. See the `LICENSE` file for more details.
For questions or collaboration opportunities, feel free to reach out at [[email protected]]. We welcome your feedback!
Our repo is built upon https://github.com/Unispac/Visual-Adversarial-Examples-Jailbreak-Large-Language-Models. We thank the authors of the original models and datasets, including LLaVA, MiniGPT-4, InternVL2, MMVet, and others, and we acknowledge the support of our institutions and collaborators. We are grateful for the resources and tools provided by the community that made this research possible, and we look forward to future collaborations in multimodal learning.