
Conversation

danbev (Member) commented Oct 2, 2025

This commit adds support for models that use SentenceTransformer layers.

The motivation for this is that if the converted model includes any of the numbered layers specified in the original model's repository, then these changes enable those models to be used and verified. Currently, the model-conversion example only supports the base model output without any of the additional transformation layers.
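
For reference, a minimal sketch (not part of this PR) of how those numbered modules show up when loading such a model with the sentence-transformers package; the local model path matches the usage example below and is an assumption here:

```python
# Minimal sketch: list the numbered SentenceTransformer modules of a model.
# Assumes sentence-transformers is installed and a local EmbeddingGemma
# checkout exists at the path below (both are assumptions).
import os

from sentence_transformers import SentenceTransformer

model_path = os.path.expanduser("~/google/embeddinggemma-300M")
model = SentenceTransformer(model_path)

for idx, module in enumerate(model):
    # Typically prints a Transformer module followed by Pooling/Dense/Normalize
    # modules, matching the numbered folders (1_Pooling, 2_Dense, ...) in the
    # model repository.
    print(idx, type(module).__name__)
```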

Usage:
Convert the model that also includes the SentenceTransformer layers:

(venv) $ export EMBEDDING_MODEL_PATH="~/google/embeddinggemma-300M"
(venv) make embedding-convert-model-st

Verify the embeddings produced by the converted model against the original model's embeddings (a rough sketch of this comparison follows the usage steps below):

(venv) make embedding-verify-logits-st

The original model can be run using SentenceTransformer:

(venv) make embedding-run-original-model-st

Run the converted model using the SentenceTransformer layers, which enables pooling and normalization:

(venv) make embedding-run-converted-model-st
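
In spirit, the verification step amounts to a comparison like the following sketch. The original embeddings come from SentenceTransformer (which applies the extra Pooling/Dense/Normalize modules, including normalization), while the converted-model embeddings are assumed to have been dumped to a NumPy file by the run target; the file name and prompt are purely illustrative:

```python
# Rough sketch of the comparison done by the -st verification; the .npy file
# name and the prompt are illustrative assumptions, not the PR's actual code.
import os

import numpy as np
from sentence_transformers import SentenceTransformer

prompts = ["Hello world today"]

# Original model: SentenceTransformer applies the additional Pooling/Dense/
# Normalize modules on top of the base transformer output.
model_path = os.path.expanduser("~/google/embeddinggemma-300M")
original = SentenceTransformer(model_path).encode(prompts, normalize_embeddings=True)

# Embeddings produced by the converted model, assumed to have been saved by the
# embedding-run-converted-model-st target (hypothetical path).
converted = np.load("converted-embeddings.npy")

# Cosine similarity per prompt; values close to 1.0 mean the converted model
# reproduces the original pooled and normalized embeddings.
cos = np.sum(original * converted, axis=-1) / (
    np.linalg.norm(original, axis=-1) * np.linalg.norm(converted, axis=-1)
)
print(cos)
```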

github-actions bot added the examples and python labels Oct 2, 2025
ggml-org deleted multiple comments from tommarques56 Oct 5, 2025
danbev force-pushed the model-conversion-st-support branch from 98e2c0a to 8bbd19d October 7, 2025 06:05
danbev marked this pull request as ready for review October 9, 2025 06:47
danbev requested a review from CISC as a code owner October 9, 2025 06:47
danbev added 3 commits October 9, 2025 10:42
This commit adds support for models that use SentenceTransformer layers.

The motivation for this is that if the converted model includes any of
the numbered layers specified in the original model's repository, then
these changes enable those models to be used and verified. Currently the
model-conversion example only supports the base model output without any
of the additional transformation layers.

Usage:
Convert the model that also includes the SentenceTransformer layers:
```console
(venv) $ export EMBEDDING_MODEL_PATH="~/google/embeddinggemma-300M"
(venv) make embedding-convert-model
```

Verify the embeddings produced by the converted model against the
original model's embeddings:
```console
(venv) make embedding-verify-logits-st
```

The original model can be run using SentenceTransformer:
```console
(venv) make embedding-run-original-model-st
```

Run the converted model using the SentenceTransformer layers, which
enables pooling and normalization:
```console
(venv) make embedding-run-converted-model-st
```
This commit adds support for the -st flag in the embedding model
conversion script. This enables models to be converted using
SentenceTransformers Dense layers.
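
For illustration only, a -st flag of this kind might be wired up roughly as in the sketch below; the long option name, help text, and surrounding code are assumptions, not the PR's actual implementation:

```python
# Hypothetical sketch of a -st flag in a conversion helper script; only the
# short flag name comes from the commit, everything else is an assumption.
import argparse


def main() -> None:
    parser = argparse.ArgumentParser(description="Convert an embedding model to GGUF")
    parser.add_argument("model_path", help="Path to the original model directory")
    parser.add_argument(
        "-st",
        "--sentence-transformers-dense-modules",
        action="store_true",
        help="Also convert the numbered SentenceTransformer Dense modules",
    )
    args = parser.parse_args()

    if args.sentence_transformers_dense_modules:
        # Include the extra Dense layers so pooled/normalized embeddings can be
        # reproduced from the converted model.
        print(f"Converting {args.model_path} with SentenceTransformer Dense modules")
    else:
        print(f"Converting {args.model_path} (base model only)")


if __name__ == "__main__":
    main()
```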
danbev force-pushed the model-conversion-st-support branch from 8bbd19d to 2764aaa October 9, 2025 08:42
danbev merged commit 56b4795 into ggml-org:master Oct 9, 2025
72 checks passed
anyshu pushed a commit to anyshu/llama.cpp that referenced this pull request Oct 10, 2025
* master: (113 commits)
  webui: updated the chat service to only include max_tokens in the req… (ggml-org#16489)
  cpu : optimize the ggml NORM operation (ggml-org#15953)
  server : host-memory prompt caching (ggml-org#16391)
  No markdown in cot (ggml-org#16483)
  model-conversion : add support for SentenceTransformers (ggml-org#16387)
  ci: add ARM64 Kleidiai build and test support (ggml-org#16462)
  CANN: Improve ACL graph matching (ggml-org#16166)
  kleidiai: kernel interface refactoring (ggml-org#16460)
  [SYCL] refactor soft_max, add soft_max_back (ggml-org#16472)
  model: EmbeddingGemma Adding Support for SentenceTransformers Dense Modules (ggml-org#16367)
  refactor: centralize CoT parsing in backend for streaming mode (ggml-org#16394)
  Disable CUDA host buffers on integrated GPUs (ggml-org#16308)
  server : fix cancel pending task (ggml-org#16467)
  metal : mark FA blocks (ggml-org#16372)
  server : improve context checkpoint logic (ggml-org#16440)
  ggml webgpu: profiling, CI updates, reworking of command submission (ggml-org#16452)
  llama : support LiquidAI LFM2-MoE hybrid model (ggml-org#16464)
  server : add `/v1/health` endpoint (ggml-org#16461)
  webui : added download action (ggml-org#13552) (ggml-org#16282)
  presets : fix pooling param for embedding models (ggml-org#16455)
  ...
