[ICCVW 2025 accepted] Simplifying Traffic Anomaly Detection with Video Foundation Models

tue-mps/simple-tad

Simplifying Traffic Anomaly Detection with Video Foundation Models

Svetlana Orlova, Tommie Kerssies, Brunó B. Englert, Gijs Dubbelman
Eindhoven University of Technology

Recent methods for ego-centric Traffic Anomaly Detection (TAD) often rely on complex multi-stage or multi-representation fusion architectures, yet it remains unclear whether such complexity is necessary. Recent findings in visual perception suggest that foundation models, enabled by advanced pre-training, allow simple yet flexible architectures to outperform specialized designs. Therefore, in this work, we investigate an architecturally simple encoder-only approach using plain Video Vision Transformers (Video ViTs) and study how pre-training enables strong TAD performance. We find that: (i) advanced pre-training enables simple encoder-only models to match or even surpass the performance of specialized state-of-the-art TAD methods, while also being significantly more efficient; (ii) although weakly- and fully-supervised pre-training are advantageous on standard benchmarks, we find them less effective for TAD. Instead, self-supervised Masked Video Modeling (MVM) provides the strongest signal; and (iii) Domain-Adaptive Pre-Training (DAPT) on unlabeled driving videos further improves downstream performance, without requiring anomalous examples. Our findings highlight the importance of pre-training and show that effective, efficient, and scalable TAD models can be built with minimal architectural complexity.
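In VideoMAE-style pre-training (this codebase is mainly based on VideoMAE), Masked Video Modeling typically hides most space-time patch tokens with tube masking. The same spatial patches are masked at every temporal position, so the model cannot simply copy a patch from a neighbouring frame. The following is a minimal pure-Python sketch that assumes VideoMAE's ViT-B defaults: 16-frame 224×224 clips, 2×16×16 tubelets, and a 90% masking ratio. These values and the helper names `token_grid` and `tube_mask` are hypothetical here and are not necessarily the settings this repository uses for DAPT.

```python
import random


def token_grid(num_frames, height, width, tubelet=2, patch=16):
    """Return the (T', H', W') token grid of a plain Video ViT for one clip."""
    if num_frames % tubelet or height % patch or width % patch:
        raise ValueError("clip shape must be divisible by the tubelet/patch size")
    return num_frames // tubelet, height // patch, width // patch


def tube_mask(num_frames=16, height=224, width=224, tubelet=2, patch=16,
              mask_ratio=0.9, seed=None):
    """Tube masking: mask the same spatial patches at every temporal position.

    Returns one boolean per token (True = masked), flattened time-major as
    (t, h, w), matching a token sequence of length T' * H' * W'.
    Defaults are assumed VideoMAE ViT-B settings, not values from this repo.
    """
    t, h, w = token_grid(num_frames, height, width, tubelet, patch)
    per_frame = h * w
    num_masked = int(mask_ratio * per_frame)  # masked patches per time step
    rng = random.Random(seed)
    masked = set(rng.sample(range(per_frame), num_masked))
    spatial = [i in masked for i in range(per_frame)]
    # Repeating the 2-D mask along time turns each masked patch into a "tube".
    return spatial * t


if __name__ == "__main__":
    mask = tube_mask(seed=0)
    print(token_grid(16, 224, 224))
    print(len(mask), sum(mask))
```

Under these assumed settings, a clip becomes 8 × 14 × 14 = 1568 tokens, and 176 of the 196 patches per time step are masked. Because a VideoMAE-style encoder processes only the visible tokens, it sees just 160 tokens per clip during pre-training.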

✨ DoTA and DADA-2000 results

Video ViT-based encoder-only models set a new state of the art on both datasets while being significantly more efficient than the top-performing specialized methods. FPS measured using NVIDIA A100 MIG, ½ GPU. † From prior work. ‡ Optimistic estimates using publicly available components of the model. “A→B”: trained on A, tested on B; D2K: DADA-2000.

📍Model Zoo

We provide pre-trained and fine-tuned models in MODEL_ZOO.md.

🔨 Installation

Please follow the instructions in INSTALL.md.

🗄️ Data Preparation

Please follow the instructions in DATASET.md for data preparation.

🔄 Domain-Adaptive Pre-training (DAPT) and Fine-tuning

Please follow the instructions in TRAIN.md.

🚀 Evaluation and Inference

Instructions are in RUN.md.

☎️ Contact

Svetlana Orlova: [email protected], [email protected]

👍 Acknowledgements

Our code is mainly based on the VideoMAE codebase. For Video ViTs with an identical architecture, we only used their weights: ViViT, VideoMAE2, SMILE, SIGMA, MME, MGMAE.
We used fragments of original implementations of MVD, InternVideo2, and UMT to integrate these models with our codebase.
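Reusing weights from another codebase with an identical architecture mostly comes down to matching parameter names. The following is a hypothetical helper, not code from this repository. It treats a checkpoint as a plain dict, strips wrapper prefixes, drops parameters that exist only for pre-training, and reports which expected keys are missing or unexpected. The prefix names `module.`, `encoder.` and `decoder.` are illustrative assumptions, and each released checkpoint may name its parameters differently.

```python
def remap_checkpoint(state_dict, strip_prefixes=("module.", "encoder."),
                     drop_prefixes=("decoder.",)):
    """Rename foreign checkpoint keys to match a shared Video ViT layout.

    The prefix names are hypothetical examples; real checkpoints differ.
    """
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        # Strip wrapper prefixes in order, e.g. "module.encoder.x" -> "x".
        for prefix in strip_prefixes:
            if new_key.startswith(prefix):
                new_key = new_key[len(prefix):]
        # Drop parameters that only exist for pre-training (e.g. a decoder).
        if new_key.startswith(drop_prefixes):
            continue
        if new_key in remapped:
            raise KeyError(f"two checkpoint keys map to {new_key!r}")
        remapped[new_key] = value
    return remapped


def compare_keys(remapped, expected_keys):
    """Return (missing, unexpected) key sets relative to the target model."""
    have, want = set(remapped), set(expected_keys)
    return want - have, have - want


if __name__ == "__main__":
    # Toy checkpoint: string values stand in for weight tensors.
    ckpt = {
        "module.encoder.patch_embed.proj.weight": "W0",
        "module.encoder.blocks.0.attn.qkv.weight": "W1",
        "module.decoder.blocks.0.mlp.fc1.weight": "W2",
    }
    weights = remap_checkpoint(ckpt)
    print(sorted(weights))
    print(compare_keys(weights, ["patch_embed.proj.weight",
                                 "blocks.0.attn.qkv.weight", "head.weight"]))
```

In PyTorch, `model.load_state_dict(weights, strict=False)` reports missing and unexpected keys in the same spirit. A missing classification head (here `head.weight`) is expected when fine-tuning starts from a pre-trained encoder.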

🔒 License

The majority of this project is released under the CC-BY-NC 4.0 license as found in the LICENSE file. Portions of the project are available under separate license terms: ViViT, InternVideo2, SlowFast and pytorch-image-models are licensed under the Apache 2.0 license. VideoMAE2, SMILE, MGMAE, UMT, and BEiT are licensed under the MIT license. SIGMA is licensed under the BSD 3-Clause Clear license.

✏️ Citation

If you find this project helpful, please consider leaving a star ⭐️ and citing our paper:

@inproceedings{orlova2025simplifying,
  title={Simplifying Traffic Anomaly Detection with Video Foundation Models},
  author={Orlova, Svetlana and Kerssies, Tommie and Englert, Brun{\'o} B and Dubbelman, Gijs},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}

@article{orlova2025simplifying_arxiv,
  title={Simplifying Traffic Anomaly Detection with Video Foundation Models},
  author={Orlova, Svetlana and Kerssies, Tommie and Englert, Brun{\'o} B and Dubbelman, Gijs},
  journal={arXiv preprint arXiv:2507.09338},
  year={2025}
}
