[CVPR 2025] Understanding Fine-tuning CLIP for Open-vocabulary Semantic Segmentation in Hyperbolic Space

SJTU-DeepVisionLab/HyperCLIP

Understanding Fine-tuning CLIP for Open-vocabulary Semantic Segmentation in Hyperbolic Space (CVPR 2025)

[Figure: Framework]

🔍 Overview

HyperCLIP is a lightweight and effective fine-tuning framework built upon CLIP for open-vocabulary semantic segmentation. Motivated by the observation that segmentation requires alignment at pixel-level hierarchical granularity, this work explores fine-tuning CLIP in hyperbolic space, which shifts the hierarchical granularity of CLIP's embedding from image-level to pixel-level, thereby equipping it with segmentation capability.

Key Findings

  • Hyperbolic radius alignment via fine-tuning: During fine-tuning, the hyperbolic radius of CLIP's text embeddings decreases, indicating that the text encoder shifts from image-to-text alignment to pixel-to-text alignment.
  • Hyperbolic radius adjustment: HyperCLIP explicitly introduces hyperbolic radius adjustment for CLIP's embeddings to better align vision and language representations in hyperbolic space.
  • Parameter efficiency: Only ~4% of CLIP’s parameters are fine-tuned, yet HyperCLIP attains state-of-the-art performance across three open-vocabulary segmentation benchmarks.
  • Characteristic hyperbolic level: After fine-tuning, text embeddings converge to a stable hyperbolic radius across different datasets, suggesting that segmentation tasks correspond to a characteristic hierarchy level in hyperbolic geometry.
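To make the notion of "hyperbolic radius" concrete, here is a minimal NumPy sketch, not the repository's implementation. It assumes the Poincaré ball model with curvature -c and the convention d(0, x) = (2/√c)·artanh(√c·‖x‖); the hyperbolic model and curvature that HyperCLIP actually uses may differ. The function names `expmap0` and `hyperbolic_radius` and the random placeholder embeddings are hypothetical and serve only as illustration.

```python
import numpy as np

def expmap0(v, c=1.0, eps=1e-8):
    """Exponential map at the origin: tangent (Euclidean) vectors -> Poincare ball."""
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def hyperbolic_radius(x, c=1.0, eps=1e-7):
    """Geodesic distance from the origin to each point x in the Poincare ball."""
    sqrt_c = np.sqrt(c)
    norm = np.linalg.norm(x, axis=-1)
    # Clip so arctanh stays finite for points numerically on the boundary.
    arg = np.clip(sqrt_c * norm, 0.0, 1.0 - eps)
    return 2.0 / sqrt_c * np.arctanh(arg)

# Placeholder embeddings (assumption): random vectors standing in for CLIP text features.
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(4, 8)) * 0.1

# Radius of the embeddings as-is, and after shrinking them by a scalar factor.
radius_before = hyperbolic_radius(expmap0(text_emb))
radius_after = hyperbolic_radius(expmap0(0.5 * text_emb))

print("radius before scaling:", radius_before)
print("radius after scaling: ", radius_after)
```

Under these assumptions, the radius of `expmap0(v)` equals twice the Euclidean norm of `v`, so scaling the tangent vector by a factor below 1 moves the embedding toward the origin. This is only a toy way to see what a decreasing radius means; it does not reproduce the radius adjustment described above.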

📊 Visualizing Hyperbolic Radius Alignment

The figure below illustrates how CLIP embeddings evolve during HyperCLIP fine-tuning:

  • Image-level semantics (larger radius) → pixel-level semantics (smaller radius).

[Figure: Hyperbolic Radius Alignment]

Installation and Data Preparation

Please refer to the CAT-Seg repository for guidance on:

  • Environment setup (Python version, dependencies, etc.)
  • Dataset preparation (e.g., COCO, ADE20K, Pascal VOC)

Training and Evaluation

You can launch the entire training and evaluation pipeline using:

bash run_train_test.sh

Acknowledgement

We thank the authors of CAT-Seg for their excellent work and codebase.

Citation

Please consider citing our paper if you find the code helpful in your research or development.

@inproceedings{peng2025understanding,
  title={Understanding Fine-tuning CLIP for Open-vocabulary Semantic Segmentation in Hyperbolic Space},
  author={Peng, Zelin and Xu, Zhengqin and Zeng, Zhilin and Wen, Changsong and Huang, Yu and Yang, Menglin and Tang, Feilong and Shen, Wei},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={4562--4572},
  year={2025}
}
