NUS-TRAIL/NoisyRollout

NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation

Paper Hugging Face Collection

⚡ Updates

  • 20/05/2025: 🔥 We updated the checkpoints of our trained models (larger model sizes, more training data)!
  • 18/04/2025: 🎉 We released our paper, models, and codebase.

🚀 TL;DR

NoisyRollout is a simple and effective data augmentation strategy for the RL training of VLMs that improves visual reasoning through better policy exploration. It introduces targeted rollout diversity by mixing rollouts from both clean and moderately distorted images, encouraging the model to learn more robust behaviors. Moreover, a noise annealing schedule is implemented to ensure early-stage exploration and late-stage training stability.

🎯 Key Benefits:

  • No additional cost — only the rollout strategy is modified
  • Easy to adopt — no changes to the model architecture or RL objective required
  • Superior generalization — achieves state-of-the-art results on 5 out-of-domain benchmarks (e.g., MathVerse: 53.2%, HallusionBench: 72.1%) with just 2.1K RL samples

🫱 No complicated changes — just smarter rollouts and better training!
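
The idea in the TL;DR can be illustrated with a small, dependency-free sketch. It is not the repository's implementation: the function names, the sigmoid-shaped decay, the Gaussian pixel noise, and all default parameters are assumptions chosen only to show how clean and distorted rollouts might be mixed while the noise level is annealed.

```python
import math
import random


def annealed_noise_level(step, total_steps, initial_level=1.0,
                         midpoint=0.5, steepness=10.0):
    """Hypothetical schedule: sigmoid-shaped decay from about
    initial_level (early steps) toward 0 (late steps)."""
    progress = step / max(total_steps, 1)
    return initial_level / (1.0 + math.exp(steepness * (progress - midpoint)))


def distort(pixels, noise_level, rng):
    """Add zero-mean Gaussian noise to pixel values in [0, 1], then clip.
    Gaussian noise is an assumed stand-in for 'moderate distortion'."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, noise_level))) for p in pixels]


def collect_rollouts(policy, pixels, prompt, n_clean, n_noisy, noise_level, rng):
    """Sample rollouts from the clean image and from a distorted copy,
    then merge them into a single group."""
    noisy_pixels = distort(pixels, noise_level, rng)
    clean = [policy(pixels, prompt) for _ in range(n_clean)]
    noisy = [policy(noisy_pixels, prompt) for _ in range(n_noisy)]
    return clean + noisy


# Toy usage with a stand-in "policy" that just describes its input.
rng = random.Random(0)


def dummy_policy(pixels, prompt):
    return f"{prompt}: mean pixel {sum(pixels) / len(pixels):.2f}"


for step in (0, 20, 40):
    level = annealed_noise_level(step, total_steps=40, initial_level=0.3)
    group = collect_rollouts(dummy_policy, [0.2, 0.5, 0.8], "Q", 2, 2, level, rng)
    print(step, round(level, 3), len(group))
```

In this sketch, early steps use a noise level close to `initial_level` (more diverse rollouts), and late steps use a level close to zero, so the two halves of the group become nearly identical.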

🛠️ Usage

(Step 1) Install

First, download the wheel of vllm from this link.

conda create -n noisyrollout python=3.11 -y && conda activate noisyrollout

pip3 install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 transformers==4.49.0 numpy==1.26.4
pip3 install google-generativeai

# Use this version of vLLM to avoid memory leaks.
pip3 install vllm-0.7.4.dev65+g22757848-cp38-abi3-manylinux1_x86_64.whl
git clone -b verl_v1 https://github.com/hiyouga/vllm.git
cp -r vllm/vllm/ ~/miniconda3/envs/noisyrollout/lib/python3.11/site-packages/

pip3 install -e .

(Step 2) Training

# Geo3K (NoisyRollout)
bash training_scripts/qwen2_5_vl_7b_geo3k_noisyrollout.sh
# Geo3K (Vanilla GRPO)
bash training_scripts/qwen2_5_vl_7b_geo3k_grpo.sh

# K12 (NoisyRollout)
bash training_scripts/qwen2_5_vl_7b_k12_noisyrollout.sh
# K12 (Vanilla GRPO)
bash training_scripts/qwen2_5_vl_7b_k12_grpo.sh

(Step 3) Evaluation

Before running the evaluation, please download the evaluation datasets from 🤗 NoisyRollout Evaluation. Then, create a directory by running mkdir -p ~/NoisyRollout/eval/data, upload the eval_data.zip file to the data folder, and unzip it there.
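
For readers who prefer to script the unpacking step, the following is a minimal sketch using only the Python standard library. The helper name `unpack_eval_data` is hypothetical and not part of the repository; the paths in the usage comment mirror the ones mentioned above and assume the archive has already been downloaded.

```python
from pathlib import Path
import zipfile


def unpack_eval_data(zip_path, data_dir):
    """Create data_dir if needed, extract the archive into it,
    and return the sorted names of the top-level entries."""
    data_dir = Path(data_dir).expanduser()
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(Path(zip_path).expanduser()) as archive:
        archive.extractall(data_dir)
    return sorted(entry.name for entry in data_dir.iterdir())


# Example call (assumes eval_data.zip was placed in the data folder):
# unpack_eval_data("~/NoisyRollout/eval/data/eval_data.zip",
#                  "~/NoisyRollout/eval/data")
```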

#!/bin/bash
source ~/.bashrc
source ~/miniconda3/bin/activate noisyrollout

export VLLM_ATTENTION_BACKEND=XFORMERS # remove it when using 32b models
export VLLM_USE_V1=0 # remove it when using 32b models
export GOOGLE_API_KEY="xxx" # put your api key here

HF_MODEL_PATH="xyliu6/NoisyRollout-Geo3K-7B"
RESULTS_DIR="results/"
EVAL_DIR="$HOME/NoisyRollout/eval" # use $HOME: bash does not expand a quoted "~"
DATA_DIR="$HOME/NoisyRollout/eval/data"

SYSTEM_PROMPT='You FIRST think about the reasoning process as an internal monologue and then provide the final answer. The reasoning process MUST BE enclosed within <think> </think> tags. The final answer MUST BE put in \boxed{}.'

cd "$EVAL_DIR"
python main.py \
  --model "$HF_MODEL_PATH" \
  --output-dir "$RESULTS_DIR" \
  --data-path "$DATA_DIR" \
  --datasets geo3k,hallubench,mathvista,wemath,mathverse,mathvision \
  --tensor-parallel-size 2 \
  --system-prompt="$SYSTEM_PROMPT" \
  --min-pixels 262144 \
  --max-pixels 1000000 \
  --max-model-len 8192 \
  --temperature 0.0 \
  --eval-threads 24 \
  --version="7b" # change it to `32b` when using 32b models

🚧 Currently, only Gemini-2.0-Flash-001 is supported for parsing generated responses. Support for additional models will be introduced in future updates.
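
The system prompt above asks the model to wrap its reasoning in <think> </think> tags and to put the final answer in \boxed{}. As an illustration of that output format only, here is a rule-based sketch that separates the two parts. It is not the repository's parser (which relies on Gemini); the function names `extract_boxed` and `split_response` are hypothetical.

```python
import re


def extract_boxed(text):
    # Return the contents of the last \boxed{...} in text, keeping nested
    # braces intact; return None if there is no balanced \boxed{...}.
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never closed


def split_response(text):
    # Return (reasoning, answer); either part is None when missing.
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    reasoning = match.group(1).strip() if match else None
    return reasoning, extract_boxed(text)


response = r"<think>Half of one is one half.</think> The answer is \boxed{\frac{1}{2}}."
print(split_response(response))
```

A string-matching extractor like this cannot judge whether two differently written answers are equivalent, which is one reason a model-based parser may be preferred for scoring.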

Citation

If you find our work useful for your research, please consider citing:

@article{liu2025noisyrollout,
  title={Noisyrollout: Reinforcing visual reasoning with data augmentation},
  author={Liu, Xiangyan and Ni, Jinjie and Wu, Zijian and Du, Chao and Dou, Longxu and Wang, Haonan and Pang, Tianyu and Shieh, Michael Qizhe},
  journal={arXiv preprint arXiv:2504.13055},
  year={2025}
}

Acknowledgement
