Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection

CVPR 2025


PyTorch Implementation of CVPR 2025 "Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection".

The first multi-class UAD model that can compete with single-class SOTAs!

Give me a ⭐️ if you like it.


News

  • 05.2024: Arxiv preprint and github code released🚀

  • 09.2024: Rejected by NeurIPS 2024 with 5 positive scores and no negative score, because "AC: lack of novelty"😭. Wish me good luck.

  • 02.2025: Accepted by CVPR 2025🎉

  • 07.2025: Spoiler alert: we will come back with Dinomaly-2😛

  • 07.2025: Dinomaly has been integrated into Intel's open-edge Anomalib as of v2.1.0. Great thanks to the contributors for the nice reproduction and integration. Anomalib is a comprehensive library for benchmarking, developing, and deploying deep-learning anomaly detection algorithms.

Abstract

Recent studies have highlighted a practical setting of unsupervised anomaly detection (UAD) that builds a unified model for multi-class images. Despite various advancements addressing this challenging task, detection performance under the multi-class setting still lags far behind state-of-the-art class-separated models. Our research aims to bridge this substantial performance gap. In this paper, we introduce a minimalistic reconstruction-based anomaly detection framework, namely Dinomaly, which leverages pure Transformer architectures without relying on complex designs, additional modules, or specialized tricks. Given this powerful framework consisting of only Attentions and MLPs, we found four simple components that are essential to multi-class anomaly detection: (1) Foundation Transformers that extract universal and discriminative features, (2) Noisy Bottleneck where pre-existing Dropouts do all the noise-injection tricks, (3) Linear Attention that naturally cannot focus, and (4) Loose Reconstruction that does not force layer-to-layer and point-by-point reconstruction. Extensive experiments are conducted across popular anomaly detection benchmarks including MVTec-AD, VisA, and Real-IAD. Our proposed Dinomaly achieves impressive image-level AUROCs of 99.6%, 98.7%, and 89.3% on the three datasets respectively (99.8%, 98.9%, and 90.1% with ViT-L), which is not only superior to state-of-the-art multi-class UAD methods, but also surpasses the most advanced class-separated UAD records.

1. Environments

Create a new conda environment and install required packages.

conda create -n my_env python=3.8.12
conda activate my_env
pip install -r requirements.txt

Experiments are conducted on an NVIDIA GeForce RTX 3090 (24GB). The same GPU and package versions are recommended.

2. Prepare Datasets

Note that ../ is the parent directory of the Dinomaly code; it is where we keep all the datasets by default. You can change this location as needed; just remember to modify data_path in the code accordingly.

MVTec AD

Download the MVTec-AD dataset from URL. Unzip the file to ../mvtec_anomaly_detection.

|-- mvtec_anomaly_detection
    |-- bottle
    |-- cable
    |-- capsule
    |-- ....

VisA

Download the VisA dataset from URL. Unzip the file to ../VisA/. Preprocess the dataset into ../VisA_pytorch/ in 1-class mode using their official splitting code.

You can also run the following command for preprocessing, which is identical to their official code.

python ./prepare_data/prepare_visa.py --split-type 1cls --data-folder ../VisA --save-folder ../VisA_pytorch --split-file ./prepare_data/split_csv/1cls.csv

../VisA_pytorch will be like:

|-- VisA_pytorch
    |-- 1cls
        |-- candle
            |-- ground_truth
            |-- test
                    |-- good
                    |-- bad
            |-- train
                    |-- good
        |-- capsules
        |-- ....

Real-IAD

Contact the authors of Real-IAD (URL) to obtain the net-disk download link.

Download and unzip realiad_1024 and realiad_jsons into ../Real-IAD, which will then look like:

|-- Real-IAD
    |-- realiad_1024
        |-- audiojack
        |-- bottle_cap
        |-- ....
    |-- realiad_jsons
        |-- realiad_jsons
        |-- realiad_jsons_sv
        |-- realiad_jsons_fuiad_0.0
        |-- ....

3. Run Experiments

Multi-Class Setting

python dinomaly_mvtec_uni.py --data_path ../mvtec_anomaly_detection
python dinomaly_visa_uni.py --data_path ../VisA_pytorch/1cls
python dinomaly_realiad_uni.py --data_path ../Real-IAD

Conventional Class-Separated Setting

python dinomaly_mvtec_sep.py --data_path ../mvtec_anomaly_detection
python dinomaly_visa_sep.py --data_path ../VisA_pytorch/1cls
python dinomaly_realiad_sep.py --data_path ../Real-IAD

Training Instability: Optimization can occasionally be unstable, producing loss spikes (e.g. ...0.05, 0.04, 0.04, 0.32, 0.23, 0.08...) that can hurt performance. This happens very rarely; if you observe such loss spikes during training, consider changing the random seed.
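If you want to flag such spikes automatically while monitoring a run, a simple heuristic (not part of this repository; the function name and thresholds are illustrative) is to compare each loss value against a moving average of the preceding window:

```python
def has_loss_spike(losses, window=5, factor=3.0):
    """Return True if any loss exceeds `factor` times the moving
    average of the preceding `window` values (a crude spike test)."""
    for i in range(window, len(losses)):
        avg = sum(losses[i - window:i]) / window
        if losses[i] > factor * avg:
            return True
    return False

# A spiky trace like the one above is flagged; a flat trace is not.
print(has_loss_spike([0.06, 0.05, 0.05, 0.04, 0.04, 0.32]))  # True
print(has_loss_spike([0.05] * 10))                           # False
```

Tune `window` and `factor` to your loss scale; the values here are only a starting point.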

Results

A. Comparison with MUAD SOTAs:


Dinomaly scales well with model size, input image size, and the choice of foundation model.

B. Model Size:


C. Input Size:


D. Choice of Foundation Model:


Evaluation discrepancy in anomaly localization

In our code implementation, we binarize the GT mask using gt.bool() after down-sampling, specifically gt[gt>0]=1. As pointed out in an issue, the previous common practice is to use gt[gt>0.5]=1. The difference between these two binarization approaches is that gt[gt>0]=1 may result in anomaly regions being one pixel larger compared to gt[gt>0.5]=1. This difference does not affect image-level performance metrics, but it has a slight impact on pixel-level evaluation metrics.

We think gt[gt>0]=1 is the more reasonable choice. It can be seen as max pooling: in the down-sampled GT map, any position whose corresponding region in the original map contains at least one anomalous pixel is marked as anomalous. If an anomaly region is extremely small in the original image (say, 2 pixels), gt[gt>0.5]=1 will erase it, while gt[gt>0]=1 keeps it.
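As a toy illustration of the difference, the NumPy sketch below uses 2x2 average pooling as a stand-in for the actual interpolation-based down-sampling (array sizes and pooling choice are illustrative, not the repo's exact pipeline):

```python
import numpy as np

# 4x4 GT mask with a single anomalous pixel.
gt = np.zeros((4, 4), dtype=np.float32)
gt[0, 0] = 1.0

# Down-sample 2x via 2x2 block means; values land in [0, 1].
down = gt.reshape(2, 2, 2, 2).mean(axis=(1, 3))

strict = (down > 0.5).astype(np.uint8)  # previous common practice
loose = (down > 0).astype(np.uint8)     # gt.bool()-style, used here

print(strict.sum(), loose.sum())  # prints "0 1"
```

Here down[0, 0] is 0.25, so the tiny anomaly is erased by the > 0.5 rule but preserved by the > 0 rule, which is exactly the one-pixel-scale discrepancy described above.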

Citation

@inproceedings{guo2025dinomaly,
  title={Dinomaly: The less is more philosophy in multi-class unsupervised anomaly detection},
  author={Guo, Jia and Lu, Shuai and Zhang, Weihang and Chen, Fang and Li, Huiqi and Liao, Hongen},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={20405--20415},
  year={2025}
}
