Understanding Fine-tuning CLIP for Open-vocabulary Semantic Segmentation in Hyperbolic Space (CVPR 2025)

🔍 Overview

HyperCLIP is a lightweight and effective fine-tuning framework built on CLIP for open-vocabulary semantic segmentation. Segmentation requires vision-language alignment at pixel-level granularity, whereas CLIP is pretrained for image-level alignment. This work therefore fine-tunes CLIP in hyperbolic space, shifting the hierarchical granularity of CLIP's embeddings from image level to pixel level and thereby equipping it with segmentation capability.

Key Findings

Hyperbolic radius alignment via fine-tuning: During fine-tuning, the hyperbolic radius of CLIP's text embeddings decreases, indicating that the text encoder shifts from image-to-text alignment to pixel-to-text alignment.
Hyperbolic radius adjustment: HyperCLIP explicitly introduces hyperbolic radius adjustment for CLIP's embeddings to better align vision and language representations in hyperbolic space.
Parameter efficiency: Only ~4% of CLIP’s parameters are fine-tuned, yet HyperCLIP attains state-of-the-art performance across three open-vocabulary segmentation benchmarks.
Characteristic hyperbolic level: After fine-tuning, text embeddings converge to a stable hyperbolic radius across different datasets, suggesting that segmentation tasks correspond to a characteristic hierarchy level in hyperbolic geometry.
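
The "hyperbolic radius" in the findings above is the distance of an embedding from the origin of hyperbolic space. As a minimal sketch, the snippet below assumes the Poincaré ball model with curvature -c and maps Euclidean embeddings onto the ball with the exponential map at the origin; the model choice, the function names `expmap0` and `hyperbolic_radius`, and the random stand-in embeddings are assumptions for illustration and are not taken from the HyperCLIP code.

```python
import numpy as np

def expmap0(v, c=1.0, eps=1e-8):
    """Map tangent vectors at the origin onto the Poincare ball (curvature -c)."""
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def hyperbolic_radius(x, c=1.0, eps=1e-8):
    """Geodesic distance from the origin to each point x on the Poincare ball."""
    sqrt_c = np.sqrt(c)
    # Clip so that points numerically on the boundary do not produce inf.
    scaled = np.clip(sqrt_c * np.linalg.norm(x, axis=-1), 0.0, 1.0 - eps)
    return 2.0 / sqrt_c * np.arctanh(scaled)

# Hypothetical stand-in for a batch of text embeddings (8 vectors, 512-dim).
rng = np.random.default_rng(0)
text_embeddings = 0.02 * rng.standard_normal((8, 512))

radii = hyperbolic_radius(expmap0(text_embeddings))
print("mean hyperbolic radius:", radii.mean())
```

Under this convention the radius of `expmap0(v)` equals twice the Euclidean norm of `v`, so a smaller radius corresponds to an embedding closer to the origin of the ball.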

📊 Visualizing Hyperbolic Radius Alignment

The figure below illustrates how CLIP embeddings evolve during HyperCLIP fine-tuning:

Image-level semantics (large radius) → pixel-level semantics (smaller radius).
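
To make the large-radius to small-radius shift above concrete, the sketch below moves points on a Poincaré ball along their ray from the origin until they reach a chosen radius, leaving their direction unchanged. It illustrates the geometry only: the helper `set_hyperbolic_radius`, the Poincaré ball model, and the target value are hypothetical, and this is not HyperCLIP's adjustment module, whose form is not described in this README.

```python
import numpy as np

def radius_from_origin(x, c=1.0):
    """Geodesic distance from the origin on the Poincare ball (curvature -c)."""
    sqrt_c = np.sqrt(c)
    return 2.0 / sqrt_c * np.arctanh(sqrt_c * np.linalg.norm(x, axis=-1))

def set_hyperbolic_radius(x, target_radius, c=1.0, eps=1e-8):
    """Rescale points so their distance to the origin equals target_radius."""
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)
    # Euclidean norm that corresponds to the requested hyperbolic radius.
    new_norm = np.tanh(sqrt_c * target_radius / 2.0) / sqrt_c
    return x / norm * new_norm

# Two example points at different radii inside the unit ball.
points = np.array([[0.6, 0.0], [0.0, -0.2]])
shrunk = set_hyperbolic_radius(points, target_radius=0.5)

print("before:", radius_from_origin(points))
print("after: ", radius_from_origin(shrunk))
```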

Installation and Data Preparation

Please refer to the CAT-Seg repository for guidance on:

Environment setup (Python version, dependencies, etc.)
Dataset preparation (e.g., COCO, ADE20K, Pascal VOC)

Training and Evaluation

You can launch the entire training and evaluation pipeline using:

bash run_train_test.sh

Acknowledgement

We thank the authors of CAT-Seg for their excellent work and codebase.

Citation

Please consider citing our paper if the code is helpful in your research and development.

@inproceedings{peng2025understanding,
  title={Understanding Fine-tuning CLIP for Open-vocabulary Semantic Segmentation in Hyperbolic Space},
  author={Peng, Zelin and Xu, Zhengqin and Zeng, Zhilin and Wen, Changsong and Huang, Yu and Yang, Menglin and Tang, Feilong and Shen, Wei},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={4562--4572},
  year={2025}
}
