[Model] Siglip Embedding Support #27324
Conversation
Documentation preview: https://vllm--27324.org.readthedocs.build/en/27324/
This pull request has merge conflicts that must be resolved before it can be merged.
Code Review
This pull request adds support for SigLIP text and image embedding models, which is a great extension of vLLM's multimodal capabilities. The implementation follows the existing architecture for CLIP, including separate handling of text and image inputs. The changes are well-structured, with new tests, examples, and updates to the model registry. I have one suggestion regarding the pooling mechanism to improve performance and memory efficiency.
💡 Codex Review
Here are some automated review suggestions for this pull request.
vllm/v1/core/sched/scheduler.py (outdated)
    kv_cache_config=kv_cache_config,
    max_model_len=self.max_model_len,
-   enable_caching=self.cache_config.enable_prefix_caching,
+   enable_caching=enable_caching,
@noooop shouldn't enable prefix caching be disabled for encoder-only models already? Why do we still need this?
noooop left a comment:
Thanks for your contribution
Fixed `get_num_image_tokens` and added detailed documentation. All other issues have been addressed. Ready for re-review.
/gemini review
Code Review
This pull request adds support for SigLIP text and image embedding models. The changes include adding the model to the registry, providing example usage scripts, and implementing the model logic in vllm/model_executor/models/siglip.py. The implementation correctly handles separate text and image inputs and reuses encoder components for both modalities. The tests are comprehensive. I have two main concerns:
- A critical performance issue in the vision tower's pooling head, which uses a non-optimized `torch.nn.MultiheadAttention` instead of a vLLM-optimized attention backend.
- A bug where the model will crash if `pooling_type='LAST'` is used, due to a missing entry in the pooling strategy map. This contradicts the behavior described in the PR description.
These issues should be addressed before merging.
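To illustrate the second concern, here is a minimal, hypothetical sketch of a pooling-strategy dispatch table; the actual names and structure in vllm/model_executor/models/siglip.py may differ.

```python
import torch

def _pool_mean(hidden: torch.Tensor) -> torch.Tensor:
    # Average the hidden states over the sequence dimension.
    return hidden.mean(dim=0)

def _pool_last(hidden: torch.Tensor) -> torch.Tensor:
    # Take the hidden state of the final token.
    return hidden[-1]

# Hypothetical strategy map: if the "LAST" entry is missing here, requesting
# pooling_type='LAST' fails with a KeyError, which is the crash the review
# describes.
_POOLING_STRATEGIES = {
    "MEAN": _pool_mean,
    "LAST": _pool_last,
}

def pool(hidden: torch.Tensor, pooling_type: str) -> torch.Tensor:
    return _POOLING_STRATEGIES[pooling_type](hidden)
```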
Signed-off-by: piood <[email protected]>
Signed-off-by: piood <[email protected]>
Signed-off-by: piood <[email protected]>
Signed-off-by: piood <[email protected]>
Force-pushed from 3b1e7ea to 940ce7d (Compare)
Signed-off-by: piood <[email protected]>
Signed-off-by: piood <[email protected]>
Signed-off-by: piood <[email protected]>
Head branch was pushed to by a user without write access
Signed-off-by: piood <[email protected]>
Signed-off-by: piood <[email protected]> Signed-off-by: 0xrushi <[email protected]>
Signed-off-by: piood <[email protected]> Signed-off-by: 0xrushi <[email protected]>
Signed-off-by: piood <[email protected]>
Signed-off-by: piood <[email protected]>
Purpose
Support SigLIP text and image embedding in the same model, following the same architecture as CLIP embedding support.
- For text inputs, only `token_embedding` is applied when calling `get_input_embeddings`; the rest of the text embedding and the encoder logic are applied when calling `forward` on the model.
- For image inputs, the image embeddings are computed in `get_input_embeddings`. Since the model doesn't have a decoder, we directly return the embeddings inside the `forward` method.

This PR extends the multimodal embedding capabilities to support SigLIP models, which are widely used for vision-language tasks.
Related to #13663: this PR adds SigLIP (v1) embedding support; SigLIP 2 support will follow.
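For orientation, the snippet below is a rough sketch (not the exact example added in this PR) of how text and image embeddings could be requested offline, modeled on the existing CLIP embedding examples; the model ID, the task argument, and the empty-prompt convention for images are assumptions.

```python
from PIL import Image
from vllm import LLM

# Hypothetical checkpoint; the supported SigLIP model IDs are listed in the
# docs/examples added by this PR.
llm = LLM(model="google/siglip-base-patch16-224", task="embed")

# Text embedding: the prompt goes through token_embedding and the text encoder.
text_out = llm.embed("a photo of a cat")
print(len(text_out[0].outputs.embedding))

# Image embedding: the image is encoded by the vision tower; since there is
# no decoder, the embedding is returned directly.
image = Image.open("cat.jpg")
image_out = llm.embed({"prompt": "", "multi_modal_data": {"image": image}})
print(len(image_out[0].outputs.embedding))
```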
Test Plan
tests/models/multimodal/pooling/test_siglip.py
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.