Skip to content

Maplebb/UniREditBench

Repository files navigation

UniREditBench: A Unified Reasoning-based Image Editing Benchmark

UnifiedReward Team

Shanghai Innovation Institute

Paper PDF Project Page

Hugging Face Spaces Hugging Face Spaces Hugging Face Spaces
Hugging Face Spaces

🔥 News

Introduction

We propose UniREditBench, a unified benchmark for reasoning-based image editing assessment with broader evaluation dimension coverage and robust evaluation pipeline. We also design an automated multi-scenario data synthesis pipeline and construct UniREdit-Data-100K, a large-scale synthetic dataset with high-quality chain-of-thought (CoT) reasoning annotations. We fine-tune Bagel on this dataset and develop UniREdit-Bagel, demonstrating substantial improvements in both in-domain and out-of-distribution settings.

image

image

✨ Highlights:

  • Broader Scenario and Reasoning Dimension Coverage: It contains 2,700 high-quality samples organized into 8 primary reasoning dimensions and 18 sub-categories, spanning both real-world and game-world image editing tasks.

  • Reliable Dual-Reference Evaluation.: For each sample assessment, we design both the textual reference and ground-truth (GT) image reference. This multi-modal reference enables vision-language model (VLM) evaluators to perform direct and fine-grained comparisons at both the textual and visual levels with the generated images, leading to more reliable evaluation.

image

image

image

🔥 Set Up Environment

conda create -n uniredit python=3.10 -y
conda activate uniredit
pip install -r requirements.txt
pip install flash_attn==2.7.0.post1 --no-build-isolation

You can also install flash_attn via:

# for cuda11 torch2.5.x
pip install "https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.0.post1/flash_attn-2.7.0.post1+cu11torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"

# for cuda12 torch2.5.x
pip install "https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.0.post1/flash_attn-2.7.0.post1+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"

🔧 Benchmark and Checkpoint Preparation

  1. Benchmark Preparation
huggingface-cli download --resume-download maplebb/UniREditBench  --local-dir ./UniREditBench
cd UniREditBench
unzip original_image.zip
unzip reference_image.zip
  1. UniREdit-Bagel Checkpoint Preparation
huggingface-cli download --resume-download maplebb/UniREdit-Bagel  --local-dir ./ckpt

pip install safetensors

python merge_ckpt.py

📑 Prompt Introduction

Each prompt in our benchmark is recorded as a dict in a .json file, combining with structured annotations for evaluation.

  • original_image_path: Path of the original image.
  • reference_image_path: Path of the reference image.
  • instruction: The editing instruction.
  • rules(only for game-world scenario): The concise descriptions of the specific game rules.
  • name: The name of evaluation dimension.
  • idx: Index of the evaluation example.
  • reference_effect: The textual reference of edited effect.

🚀 Inference

GPUS=8
model_path=./ckpt
input_path=./UniREditBench
output_path=./output_images

# Image Editing with Reasoning
torchrun \
    --nnodes=1 \
    --nproc_per_node=$GPUS \
    gen_images_mp_uniredit.py \
    --input_dir $input_path \
    --output_dir $output_path \
    --metadata_file ./UniREditBench/data.json \
    --max_latent_size 64 \
    --model-path $model_path \
    --think

✨ Evaluation

We are using the API version: gpt-4.1-2025-04-14

python -u eval/gpt_eval_uniredit.py \
  --input ./UniREditBench \
  --data ./UniREditBench/data.json \
  --output ./output_images \
  --nproc 6
  • A detailed .csv results file will also be saved in the /dir_of_edit_images directory.

💻 UniREdit-Data-100K Download

huggingface-cli download --repo-type dataset --resume-download maplebb/UniREdit-Data-100K  --local-dir ./UniREdit-Data-100K

cd UniREdit-Data-100K

unzip UniREdit-Data-100K.zip

📧 Contact

If you have any comments or questions, please open a new issue or feel free to contact Feng Han and Yibin Wang.

⭐ Citation

@article{unireditbench,
  title={UniREditBench: A Unified Reasoning-based Image Editing Benchmark},
  author={Han, Feng and Wang, Yibin and Li, Chenglin and Liang, Zheming and Wang, Dianyi and Jiao, Yang and Wei, Zhipeng and Gong, Chao and Jin, Cheng and Chen, Jingjing and others},
  journal={arXiv preprint arXiv:2511.01295},
  year={2025}
}

About

Offline implementation of UniREditBench: A Unified Reasoning-based Image Editing Benchmark.

Resources

Stars

Watchers

Forks