A single model solves multiple perception tasks, on par with SOTA!
- 2025-09-21: 🚀 Model and inference code released
- 2025-09-19: 🌟 Accepted as NeurIPS 2025 Spotlight
- 2025-02-25: 📝 Paper released
```bash
conda create -n diception python=3.10 -y
conda activate diception
pip install -r requirements.txt
```
- Download SD3 Base Model: Download the Stable Diffusion 3 medium model from https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers (a download snippet follows this list).
- Download Trained Weights: Download the DICEPTION model from Hugging Face: https://huggingface.co/Canyu/DICEPTION
- Update Paths: Set `--pretrained_model_path` to your SD3 path, and set `--diception_path` to the local path of the downloaded `DICEPTION_v1.pth`.
- Sample JSON for Batch Inference: We provide several JSON examples for batch inference in the `DATA/jsons/evaluate` directory.
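One possible way to fetch both checkpoints is sketched below, assuming the Hugging Face CLI (`pip install -U "huggingface_hub[cli]"`); the `--local-dir` targets are arbitrary examples. SD3 medium is a gated repository, so accept its license on the model page and log in first.

```bash
# Sketch: fetch both models with the Hugging Face CLI.
# SD3 medium is gated -- accept the license on its model page, then log in.
huggingface-cli login
huggingface-cli download stabilityai/stable-diffusion-3-medium-diffusers \
    --local-dir ./checkpoints/sd3-medium
huggingface-cli download Canyu/DICEPTION DICEPTION_v1.pth \
    --local-dir ./checkpoints
```

With these example paths, you would pass `--pretrained_model_path ./checkpoints/sd3-medium` and `--diception_path ./checkpoints/DICEPTION_v1.pth` in the commands below.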
For single image inference:
```bash
python inference.py \
    --image path/to/your/image.jpg \
    --prompt "[[image2depth]]" \
    --pretrained_model_path PATH_TO_SD3 \
    --diception_path PATH_TO_DICEPTION_v1.PTH \
    --output_dir ./outputs \
    --guidance_scale 2 \
    --num_inference_steps 28
```
With coordinate points (for interactive segmentation):
```bash
python inference.py \
    --image path/to/your/image.jpg \
    --prompt "[[image2segmentation]]" \
    --pretrained_model_path PATH_TO_SD3 \
    --diception_path PATH_TO_DICEPTION_v1.PTH \
    --output_dir ./outputs \
    --guidance_scale 2 \
    --num_inference_steps 28 \
    --points "0.3,0.5;0.7,0.2"
```
The `--points` parameter accepts coordinates in the format `"y1,x1;y2,x2;y3,x3"`, where (a worked example follows the list):
- Coordinates are normalized to the [0, 1] range
- The format is (y, x), with y = pixel_row / image_height and x = pixel_col / image_width
- Multiple points are separated by semicolons
- At most 5 points are supported
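As a worked example of the normalization: in a 640x480 image (width x height), clicks at pixels (x=320, y=240) and (x=480, y=120) become (y, x) = (240/480, 320/640) = (0.5, 0.5) and (120/480, 480/640) = (0.25, 0.75):

```bash
# Two point prompts: pixel clicks normalized to (y,x) in [0,1]
python inference.py \
    --image path/to/your/image.jpg \
    --prompt "[[image2segmentation]]" \
    --pretrained_model_path PATH_TO_SD3 \
    --diception_path PATH_TO_DICEPTION_v1.PTH \
    --output_dir ./outputs \
    --guidance_scale 2 \
    --num_inference_steps 28 \
    --points "0.5,0.5;0.25,0.75"
```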
For batch processing with a JSON dataset:
```bash
python batch_inference.py \
    --pretrained_model_path PATH_TO_SD3 \
    --diception_path PATH_TO_DICEPTION_v1.PTH \
    --input_path example_batch.json \
    --data_root_path ./ \
    --save_path ./batch_results \
    --batch_size 4 \
    --guidance_scale 2 \
    --num_inference_steps 28
# add --save_npy to also save the raw depth/normal values as .npy files
```
JSON Format for Batch Inference: The input JSON file should contain a list of tasks in the following format:
```json
[
    {
        "input": "path/to/image1.jpg",
        "caption": "[[image2segmentation]]"
    },
    {
        "input": "path/to/image2.jpg",
        "caption": "[[image2depth]]"
    },
    {
        "input": "path/to/image3.jpg",
        "caption": "[[image2segmentation]]",
        "target": {
            "path": "path/to/sa1b.json"
        }
    }
]
```
For convenience, when a segmentation entry provides a `target`, a region is randomly selected from the ground-truth JSON to serve as the point prompt.
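As a quick way to author such a file by hand, the heredoc below writes a minimal two-task batch file (the image path is a placeholder; the task tokens are listed in the next section):

```bash
# Sketch: write a minimal batch file for two tasks on one image.
cat > example_batch.json <<'EOF'
[
    {"input": "path/to/image1.jpg", "caption": "[[image2depth]]"},
    {"input": "path/to/image1.jpg", "caption": "[[image2normal]]"}
]
EOF
```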
DICEPTION supports various vision perception tasks (a sketch that runs them all follows this list):
- Depth Estimation: `[[image2depth]]`
- Surface Normal Estimation: `[[image2normal]]`
- Pose Estimation: `[[image2pose]]`
- Interactive Segmentation: `[[image2segmentation]]`
- Semantic Segmentation: `[[image2semantic]]` followed by a COCO category, e.g. `[[image2semantic]] person`
- Entity Segmentation: `[[image2entity]]`
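For illustration, the loop below runs each prompt-only task on one image; the task tokens come from the list above, and all other flags mirror the single-image command documented earlier:

```bash
# Sketch: run all prompt-only tasks on a single image.
for task in image2depth image2normal image2pose image2segmentation image2entity; do
    python inference.py \
        --image path/to/your/image.jpg \
        --prompt "[[${task}]]" \
        --pretrained_model_path PATH_TO_SD3 \
        --diception_path PATH_TO_DICEPTION_v1.PTH \
        --output_dir ./outputs/${task} \
        --guidance_scale 2 \
        --num_inference_steps 28
done
# Interactive segmentation can additionally take --points (see above), and
# semantic segmentation takes a COCO category after the token:
# ... --prompt "[[image2semantic]] person"
```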
- General settings: For best overall results, use `--num_inference_steps 28` and `--guidance_scale 2.0`.
- 1-step/few-step inference: We found that flow-matching diffusion models naturally support few-step inference, especially for tasks such as depth and surface normal estimation. DICEPTION can run with `--num_inference_steps 1` and `--guidance_scale 1.0` with barely any loss in quality. If you prioritize speed, consider this setting (see the command below). We provide a detailed analysis in our NeurIPS paper.
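For example, a speed-oriented single-step run of the depth task would look like:

```bash
# Fast path: one denoising step, guidance disabled (scale 1.0).
python inference.py \
    --image path/to/your/image.jpg \
    --prompt "[[image2depth]]" \
    --pretrained_model_path PATH_TO_SD3 \
    --diception_path PATH_TO_DICEPTION_v1.PTH \
    --output_dir ./outputs \
    --guidance_scale 1.0 \
    --num_inference_steps 1
```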
- Release inference code and pretrained model v1
- Release training code
- Release few-shot finetuning code
For academic use, this project is licensed under the BSD 2-Clause License. For commercial use, please contact Chunhua Shen.
```bibtex
@article{zhao2025diception,
    title={Diception: A generalist diffusion model for visual perceptual tasks},
    author={Zhao, Canyu and Liu, Mingyu and Zheng, Huanyi and Zhu, Muzhi and Zhao, Zhiyue and Chen, Hao and He, Tong and Shen, Chunhua},
    journal={arXiv preprint arXiv:2502.17157},
    year={2025}
}
```