From 414032ec0e2fbbae5502a797a690bfc2f74bec74 Mon Sep 17 00:00:00 2001
From: DarkLight1337
Date: Wed, 4 Jun 2025 12:51:57 +0000
Subject: [PATCH] [Doc] Update V1 Guide for embedding models

Signed-off-by: DarkLight1337
---
 docs/usage/v1_guide.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/usage/v1_guide.md b/docs/usage/v1_guide.md
index 7c4909cb5d91..baeb5411bcfd 100644
--- a/docs/usage/v1_guide.md
+++ b/docs/usage/v1_guide.md
@@ -55,7 +55,7 @@ This living user guide outlines a few known **important changes and limitations*
 | **Spec Decode** | 🚧 WIP ([PR #13933](https://github.com/vllm-project/vllm/pull/13933))|
 | **Prompt Logprobs with Prefix Caching** | 🟡 Planned ([RFC #13414](https://github.com/vllm-project/vllm/issues/13414))|
 | **Structured Output Alternative Backends** | 🟡 Planned |
-| **Embedding Models** | 🚧 WIP ([PR #18015](https://github.com/vllm-project/vllm/pull/18015)) |
+| **Embedding Models** | 🚧 WIP ([PR #16188](https://github.com/vllm-project/vllm/pull/16188)) |
 | **Mamba Models** | 🟡 Planned |
 | **Encoder-Decoder Models** | 🟠 Delayed |
 | **Request-level Structured Output Backend** | 🔴 Deprecated |
@@ -145,9 +145,9 @@ vLLM V1 currently excludes model architectures with the `SupportsV0Only` protoco
 and the majority fall into the following categories. V1 support for these models will be added eventually.
 
 **Embedding Models**
-Initially, we will create a [separate model runner](https://github.com/vllm-project/vllm/pull/18015) to provide V1 support without conflicting with other ongoing work.
+Initial support will be provided by [PR #16188](https://github.com/vllm-project/vllm/pull/16188).
 
-Later, we will consider using [hidden states processor](https://github.com/vllm-project/vllm/issues/12249), which is based on [global logits processor](https://github.com/vllm-project/vllm/pull/13360) to enable simultaneous generation and embedding using the same engine instance in V1. [PR #16188](https://github.com/vllm-project/vllm/pull/16188) is the first step towards enabling this.
+Later, we will consider using a [hidden states processor](https://github.com/vllm-project/vllm/issues/12249), which is based on the [global logits processor](https://github.com/vllm-project/vllm/pull/13360), to enable simultaneous generation and embedding with the same engine instance in V1.
 
 **Mamba Models**
 Models using selective state-space mechanisms (instead of standard transformer attention)