UniREditBench: A Unified Reasoning-based Image Editing Benchmark

Shanghai Innovation Institute

🔥 News

[2025/11/03] 🔥🔥 We release UniREditBench, UniREdit-Data-100K, UniREdit-Bagel-[BF16/FP32], and 🏆 Leaderboard !!
[2025/11/02] 🔥🔥 We release paper and project page of UniREditBench!!

Introduction

We propose UniREditBench, a unified benchmark for reasoning-based image editing assessment with broader evaluation dimension coverage and robust evaluation pipeline. We also design an automated multi-scenario data synthesis pipeline and construct UniREdit-Data-100K, a large-scale synthetic dataset with high-quality chain-of-thought (CoT) reasoning annotations. We fine-tune Bagel on this dataset and develop UniREdit-Bagel, demonstrating substantial improvements in both in-domain and out-of-distribution settings.

✨ Highlights:

Broader Scenario and Reasoning Dimension Coverage: It contains 2,700 high-quality samples organized into 8 primary reasoning dimensions and 18 sub-categories, spanning both real-world and game-world image editing tasks.
Reliable Dual-Reference Evaluation.: For each sample assessment, we design both the textual reference and ground-truth (GT) image reference. This multi-modal reference enables vision-language model (VLM) evaluators to perform direct and fine-grained comparisons at both the textual and visual levels with the generated images, leading to more reliable evaluation.

🔥 Set Up Environment

conda create -n uniredit python=3.10 -y
conda activate uniredit
pip install -r requirements.txt
pip install flash_attn==2.7.0.post1 --no-build-isolation

You can also install flash_attn via:

# for cuda11 torch2.5.x
pip install "https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.0.post1/flash_attn-2.7.0.post1+cu11torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"

# for cuda12 torch2.5.x
pip install "https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.0.post1/flash_attn-2.7.0.post1+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"

🔧 Benchmark and Checkpoint Preparation

Benchmark Preparation

huggingface-cli download --resume-download maplebb/UniREditBench  --local-dir ./UniREditBench
cd UniREditBench
unzip original_image.zip
unzip reference_image.zip

UniREdit-Bagel Checkpoint Preparation

huggingface-cli download --resume-download maplebb/UniREdit-Bagel  --local-dir ./ckpt

pip install safetensors

python merge_ckpt.py

📑 Prompt Introduction

Each prompt in our benchmark is recorded as a dict in a .json file, combining with structured annotations for evaluation.

original_image_path: Path of the original image.
reference_image_path: Path of the reference image.
instruction: The editing instruction.
rules(only for game-world scenario): The concise descriptions of the specific game rules.
name: The name of evaluation dimension.
idx: Index of the evaluation example.
reference_effect: The textual reference of edited effect.

🚀 Inference

GPUS=8
model_path=./ckpt
input_path=./UniREditBench
output_path=./output_images

# Image Editing with Reasoning
torchrun \
    --nnodes=1 \
    --nproc_per_node=$GPUS \
    gen_images_mp_uniredit.py \
    --input_dir $input_path \
    --output_dir $output_path \
    --metadata_file ./UniREditBench/data.json \
    --max_latent_size 64 \
    --model-path $model_path \
    --think

✨ Evaluation

We are using the API version: gpt-4.1-2025-04-14

python -u eval/gpt_eval_uniredit.py \
  --input ./UniREditBench \
  --data ./UniREditBench/data.json \
  --output ./output_images \
  --nproc 6

A detailed .csv results file will also be saved in the /dir_of_edit_images directory.

💻 UniREdit-Data-100K Download

huggingface-cli download --repo-type dataset --resume-download maplebb/UniREdit-Data-100K  --local-dir ./UniREdit-Data-100K

cd UniREdit-Data-100K

unzip UniREdit-Data-100K.zip

📧 Contact

If you have any comments or questions, please open a new issue or feel free to contact Feng Han and Yibin Wang.

⭐ Citation

@article{unireditbench,
  title={UniREditBench: A Unified Reasoning-based Image Editing Benchmark},
  author={Han, Feng and Wang, Yibin and Li, Chenglin and Liang, Zheming and Wang, Dianyi and Jiao, Yang and Wei, Zhipeng and Gong, Chao and Jin, Cheng and Chen, Jingjing and others},
  journal={arXiv preprint arXiv:2511.01295},
  year={2025}
}

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
UniREdit-Bagel		UniREdit-Bagel
docs		docs
eval		eval
.gitignore		.gitignore
README.md		README.md
gen_images_mp_uniredit.py		gen_images_mp_uniredit.py
merge_ckpt.py		merge_ckpt.py
requirements.txt		requirements.txt
run_mp_editing.sh		run_mp_editing.sh
run_scripts.sh		run_scripts.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

UniREditBench: A Unified Reasoning-based Image Editing Benchmark

🔥 News

Introduction

✨ Highlights:

🔥 Set Up Environment

🔧 Benchmark and Checkpoint Preparation

📑 Prompt Introduction

🚀 Inference

✨ Evaluation

💻 UniREdit-Data-100K Download

📧 Contact

⭐ Citation

About

Uh oh!

Contributors 2

Languages

Maplebb/UniREditBench

Folders and files

Latest commit

History

Repository files navigation

UniREditBench: A Unified Reasoning-based Image Editing Benchmark

🔥 News

Introduction

✨ Highlights:

🔥 Set Up Environment

🔧 Benchmark and Checkpoint Preparation

📑 Prompt Introduction

🚀 Inference

✨ Evaluation

💻 UniREdit-Data-100K Download

📧 Contact

⭐ Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Contributors 2

Languages