By Jinglong Wang, Xiawei Li, Jing Zhang, Qingyuan Xu, Qin Zhou, Qian Yu, Lu Sheng, Dong Xu.
This repository is the official implementation of the paper Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter. You are also welcome to visit our project page.
We are thrilled to announce our latest paper, "ELBO-T2IAlign: A Generic ELBO-Based Method for Calibrating Pixel-level Text-Image Alignment in Diffusion Models", which explores the impact of the original diffusion loss function. This work builds upon this repo and offers new insights into downstream tasks of diffusion models. Check it out here and explore how it enhances our understanding of diffusion models.
If you find DiffSegmenter useful in your research, please consider citing:
@misc{wang2023diffusion,
title={Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter},
author={Jinglong Wang and Xiawei Li and Jing Zhang and Qingyuan Xu and Qin Zhou and Qian Yu and Lu Sheng and Dong Xu},
year={2023},
eprint={2309.02773},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
- Linux, CUDA>=11.7, GCC>=9.4
- Python>=3.8
We recommend using Anaconda to create a conda environment:
conda create -n ldm python=3.8
Then, activate the environment:
conda activate ldm
- Other requirements
pip install -r requirements.txt
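The version requirements above can be sanity-checked with a short script. This is a minimal sketch, not part of the repository: the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical, and it assumes PyTorch is among the packages installed from requirements.txt. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which may differ from the system toolkit.

```python
import sys


def parse_version(text):
    """Turn a dotted version such as '11.7' into a tuple of ints, e.g. (11, 7)."""
    return tuple(int(part) for part in text.strip().split(".") if part.isdigit())


def meets_minimum(found, required):
    """Return True if the dotted version `found` is at least `required`."""
    return parse_version(found) >= parse_version(required)


def check_environment():
    """Collect human-readable problems; an empty list means nothing was flagged."""
    problems = []
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    if not meets_minimum(python_version, "3.8"):
        problems.append(f"Python {python_version} is older than 3.8")
    try:
        import torch  # assumption: installed via requirements.txt
    except ImportError:
        problems.append("PyTorch is not importable; CUDA version not checked")
    else:
        cuda_version = torch.version.cuda  # None for CPU-only builds
        if cuda_version is None:
            problems.append("PyTorch was built without CUDA support")
        elif not meets_minimum(cuda_version, "11.7"):
            problems.append(f"CUDA {cuda_version} is older than 11.7")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Run it inside the activated `ldm` environment; no output means none of the checks flagged a problem. The GCC version is not checked here.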
Please download the datasets and organize them as follows:
├── COCO2014
│   ├── annotations
│   ├── coco_seg_anno
│   ├── images
│   │   ├── test2014
│   │   ├── train2014
│   │   └── val2014
│   └── mask
│       ├── train2014
│       └── val2014
└── VOCdevkit
    ├── VOC2010
    │   ├── Annotations
    │   ├── ImageSets
    │   │   ├── Action
    │   │   ├── Layout
    │   │   ├── Main
    │   │   ├── Segmentation
    │   │   └── SegmentationContext
    │   ├── JPEGImages
    │   ├── SegmentationClass
    │   ├── SegmentationClassContext
    │   └── SegmentationObject
    └── VOC2012
        ├── Annotations
        ├── ImageSets
        │   ├── Action
        │   ├── Layout
        │   ├── Main
        │   └── Segmentation
        ├── JPEGImages
        ├── SegmentationClass
        ├── SegmentationClassAug
        └── SegmentationObject
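Before running any script, it can help to confirm that the layout above is in place. The following is a small sketch under stated assumptions: `EXPECTED_PATHS` and `missing_paths` are hypothetical names, not part of the repository, and the dataset root (`datasets` in the usage line) is a placeholder for wherever you stored the data.

```python
from pathlib import Path

# Leaf entries of the tree above, relative to the dataset root.
EXPECTED_PATHS = [
    "COCO2014/annotations",
    "COCO2014/coco_seg_anno",
    "COCO2014/images/test2014",
    "COCO2014/images/train2014",
    "COCO2014/images/val2014",
    "COCO2014/mask/train2014",
    "COCO2014/mask/val2014",
    "VOCdevkit/VOC2010/Annotations",
    "VOCdevkit/VOC2010/ImageSets/Action",
    "VOCdevkit/VOC2010/ImageSets/Layout",
    "VOCdevkit/VOC2010/ImageSets/Main",
    "VOCdevkit/VOC2010/ImageSets/Segmentation",
    "VOCdevkit/VOC2010/ImageSets/SegmentationContext",
    "VOCdevkit/VOC2010/JPEGImages",
    "VOCdevkit/VOC2010/SegmentationClass",
    "VOCdevkit/VOC2010/SegmentationClassContext",
    "VOCdevkit/VOC2010/SegmentationObject",
    "VOCdevkit/VOC2012/Annotations",
    "VOCdevkit/VOC2012/ImageSets/Action",
    "VOCdevkit/VOC2012/ImageSets/Layout",
    "VOCdevkit/VOC2012/ImageSets/Main",
    "VOCdevkit/VOC2012/ImageSets/Segmentation",
    "VOCdevkit/VOC2012/JPEGImages",
    "VOCdevkit/VOC2012/SegmentationClass",
    "VOCdevkit/VOC2012/SegmentationClassAug",
    "VOCdevkit/VOC2012/SegmentationObject",
]


def missing_paths(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    # "datasets" is a placeholder root; replace it with your own path.
    for rel in missing_paths("datasets"):
        print("missing:", rel)
```

An empty result means every entry in the tree was found. If you only use one of the datasets, the entries for the others can be removed from the list.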
For the setting of Open Vocabulary Semantic Segmentation, our model does not require training; it directly produces segmentation results.
The ‘open_vocabulary’ folder contains the code for open vocabulary semantic segmentation, including scripts for the VOC, COCO, and Pascal Context datasets.
Taking the voc10 dataset as an example:
Step 1: Modify the dataset path in the Python script (here, ptp_stable_voc10.py).
Step 2: Run ptp_stable_voc10.py to generate segmentation results.
python ptp_stable_voc10.py
Step 3: Update the file path in the evaluation script, then run it. The mIoU is recorded in eval.txt.
python evaluation_voc10.py
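For readers unfamiliar with the metric, mIoU (mean intersection over union) averages the per-class IoU computed from a confusion matrix. The sketch below illustrates the standard definition in plain Python; it is not the code in evaluation_voc10.py. The function names are hypothetical, inputs are assumed to be flattened lists of class indices, and the `ignore_index=255` default assumes the usual VOC convention for void pixels.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count pixels by (ground-truth class, predicted class)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:  # skip pixels without a valid label
            continue
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average IoU over the classes that appear in the labels or predictions."""
    ious = []
    for c in range(len(matrix)):
        true_pos = matrix[c][c]
        false_neg = sum(matrix[c]) - true_pos
        false_pos = sum(row[c] for row in matrix) - true_pos
        union = true_pos + false_pos + false_neg
        if union > 0:
            ious.append(true_pos / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = [0, 0, 1, 1]
    target = [0, 1, 1, 255]  # the last pixel is ignored
    print(mean_iou(confusion_matrix(pred, target, num_classes=2)))  # 0.5
```

In the example, each class has one correctly labelled pixel and a union of two pixels, so both IoUs are 0.5 and the mean is 0.5.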