
🚀 Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks (ICCV 2025 official implementation)

👨‍💻 Authors

Jiawei Wang*¹, Yushen Zuo*², Yuanjun Chai³, Zhendong Liu⁴, Yicheng Fu⁵, Yichun Feng†⁶, Kin-man Lam†²

¹ University of Science and Technology of China
² The Hong Kong Polytechnic University
³ University of Washington
⁴ Nanjing University
⁵ Stanford University
⁶ University of Chinese Academy of Sciences

📧 Contact Emails:
[email protected], [email protected], [email protected], [email protected],
[email protected], [email protected], [email protected]

* Equal contribution.
† Corresponding authors.



Welcome! This repository hosts the official implementation of our paper, "Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks."


🌟 What’s New?

We propose state-of-the-art solutions to enhance the robustness of Vision-Language Models (VLMs) against Gaussian noise and adversarial attacks. Key highlights include:

  • 🎯 Robust-VLGuard: A pioneering multimodal safety dataset covering both aligned and misaligned image-text pair scenarios.

    RobustVLGuard

  • 🛡️ DiffPure-VLM: A novel defense framework that leverages diffusion models to neutralize adversarial noise by transforming it into Gaussian-like noise, significantly improving VLM resilience (see the minimal sketch after this list).

    DiffPure-VLM
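
To make the idea concrete, below is a minimal, illustrative sketch of diffusion-based purification: the (possibly adversarial) image is diffused with Gaussian noise up to a small timestep, then denoised back with a pretrained diffusion model before being passed to the VLM. The function and attribute names (purify, alphas_cumprod, p_sample) are hypothetical placeholders, not this repository's actual API.

import torch

def purify(image, diffusion, t_star=100):
    # image: tensor in [0, 1], shape (B, 3, H, W); diffusion: pretrained unconditional diffusion model
    x = image * 2.0 - 1.0                            # map to [-1, 1] as most diffusion models expect
    alpha_bar = diffusion.alphas_cumprod[t_star]     # cumulative noise schedule at t_star (hypothetical attribute)
    noise = torch.randn_like(x)
    # Forward diffusion: inject Gaussian noise, drowning out structured adversarial perturbations
    x_t = torch.sqrt(alpha_bar) * x + torch.sqrt(1.0 - alpha_bar) * noise
    # Reverse diffusion: denoise step by step back to a clean-looking image
    for t in reversed(range(t_star)):
        x_t = diffusion.p_sample(x_t, t)             # hypothetical single reverse step
    return ((x_t + 1.0) / 2.0).clamp(0.0, 1.0)       # purified image, then fed to the VLM

In DiffPure-style defenses, a larger t_star (more denoising steps) generally removes stronger perturbations at the cost of some image fidelity.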


✨ Key Contributions

  • 🔍 Conducted a comprehensive vulnerability analysis revealing the sensitivity of mainstream VLMs to Gaussian noise.
  • 📚 Developed Robust-VLGuard, a dataset designed to improve model robustness without compromising helpfulness or safety alignment.
  • ⚙️ Introduced DiffPure-VLM, an effective pipeline for defending against complex optimization-based adversarial attacks.
  • 📈 Demonstrated strong performance across multiple benchmarks, outperforming existing baseline methods.

💡 Quickstart

🛠️ Installation

Different models require different environments. We provide conda environment files in the env_configs directory. For example:

conda env create -f env_configs/environment-omi.yml
conda activate Omi-Environment

📁 Pretrained Models Setup

mkdir -p ckpts/
ln -s your_path/vicuna ckpts/vicuna
ln -s your_path/pretrained_minigpt4.pth ckpts/pretrained_minigpt4.pth
mkdir -p ckpts/diffpure_models/diffusion/Guide_Diffusion/
ln -s your_path/256x256_diffusion_uncond.pt ckpts/diffpure_models/diffusion/Guide_Diffusion/256x256_diffusion_uncond.pt

📅 Dataset Setup

  • RealToxicityPrompts is available in the harmful_corpus/ directory.
  • Robust-VLGuard: download from Hugging Face.
  • Noisy MMVet benchmark: download from Google Drive.

🎓 Fine-tuning VLMs

Our Robust-VLGuard dataset is preprocessed and ready for fine-tuning. You can use each VLM's official fine-tuning code, but the Gaussian noise augmentation strategy must also be applied. We have already incorporated this strategy into the official codebases: for LLaVA, see this commit, or fine-tune directly with this repo; for other codebases, follow the implementation approach used in LLaVA.
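
For reference, here is a minimal sketch of the kind of Gaussian noise augmentation applied to training images. The noise level sigma, the application probability p, and where the hook sits in each codebase's image preprocessing are illustrative assumptions, not the exact settings used in the commits above.

import torch

def gaussian_noise_augment(image, sigma=0.05, p=0.5):
    # image: normalized tensor in [0, 1]; sigma and p are illustrative values
    if torch.rand(1).item() < p:
        image = image + sigma * torch.randn_like(image)
        image = image.clamp(0.0, 1.0)   # keep pixel values in a valid range
    return image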

📂 Fine-tuned Models

Model Name | Hugging Face Path
llava-v1.5-7b-RobustVLGuard | LLaVA
MiniGPT4-RobustVLGuard | MiniGPT-4
InternVL2-8B-RobustVLGuard | InternVL2-8B

🌐 Evaluation

On RealToxicityPrompts

bash general_scripts/omi_eval_rtp.sh {OUTPUT_PATH} adversarial_images/clean.jpeg {MODEL_PATH}

or

bash general_scripts/omi_eval_rtp.sh {OUTPUT_PATH} adversarial_images_add_noise_G30/clean.jpeg {MODEL_PATH}

On MMVet Benchmark (LLaVA)

python llava_inference_mmvet.py --model_path {MODEL_PATH} --clean --output_path {OUTPUT_PATH}

or

python llava_inference_mmvet.py --model_path {MODEL_PATH} --output_path {OUTPUT_PATH}

Use minigpt_inference_mmvet.py for MiniGPT-4. For InternVL2, refer to this commit.

IMPORTANT: Remember to replace mmvet_path and image_path in the Python script with the correct paths.
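
For illustration, the variables would be set to something like the following (the paths below are placeholders for your local copy of the benchmark):

mmvet_path = "/path/to/mm-vet"           # root of the MM-Vet benchmark files
image_path = "/path/to/mm-vet/images"    # directory containing the benchmark images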

🔧 Optimization-based Adversarial Attack

bash llava-attack.sh {GPU_ID} {OUTPUT_PATH} {MODEL_PATH} {EPSILON}

Where EPSILON controls the perturbation strength (e.g., 16, 32, or 64). For MiniGPT-4, use minigpt_visual_attack.py. For InternVL2, refer to this commit.
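
For intuition, a projected-gradient attack bounded by EPSILON looks roughly like the sketch below. The loss, step size, iteration count, and the assumption that EPSILON is specified in 8-bit pixel units (so 32 means 32/255) are illustrative, not the exact procedure implemented in llava-attack.sh.

import torch

def pgd_attack(loss_fn, image, epsilon=32, alpha=1, steps=500):
    # loss_fn: maps an image tensor to a scalar loss to minimize
    # (e.g., cross-entropy of the VLM's output toward an attacker-chosen target string)
    eps, step = epsilon / 255.0, alpha / 255.0        # assume EPSILON is given in 8-bit pixel units
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = loss_fn(adv)
        grad = torch.autograd.grad(loss, adv)[0]
        adv = adv.detach() - step * grad.sign()       # signed gradient step
        adv = image + (adv - image).clamp(-eps, eps)  # project back into the L-infinity ball
        adv = adv.clamp(0.0, 1.0)                     # keep a valid image
    return adv.detach()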

🔒 Deploying DiffPure-VLM Defense

Run the following command:

bash general_scripts/omi_eval_rtp_diffpure.sh {output_path} {image_prompt_path} {model_path} {def_num_denoising_steps}

For MiniGPT-4 and Qwen2.5-VL, use:

  • minigpt_scripts/minigpt_eval_rtp_diffpure_single_gpu.sh
  • qwen25_vl_scripts/qwen25_vl_rtp_diffpure.sh

🧪 Deploying JailGuard Defense

For comparison, we also provide a script for the JailGuard defense. Use the following command:

bash general_scripts/omi_eval_rtp_jailguard.sh {output_path} {image_prompt_path} {model_path}

📊 Experimental Results

Detailed results and analysis are included in the paper and supplementary materials. See the results/ directory for specific outcomes.

Table 1 Table 2 Table 3


📜 Citation

If you find this work helpful, please consider citing our paper.


📃 License

This project is licensed under the MIT License. See the LICENSE file for more details.


📢 Contact

For questions or collaboration opportunities, feel free to reach out at [[email protected]]. We welcome your feedback!

📝 Acknowledgments

Our repository is built upon https://github.com/Unispac/Visual-Adversarial-Examples-Jailbreak-Large-Language-Models. We thank the authors of the original models and datasets, including LLaVA, MiniGPT-4, InternVL2, and MMVet, and acknowledge the support of our institutions and collaborators, as well as the community resources and tools that made this research possible. We are committed to advancing multimodal learning and look forward to future collaborations.
