
Conversation

@bbartels (Contributor) commented Oct 31, 2025

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the before/after results comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request introduces a new endpoint to support Anthropic's Messages API, which is a valuable addition. The implementation involves creating a new API endpoint and a serving layer to translate between Anthropic and OpenAI protocols, along with corresponding tests. My review has identified a few critical and high-severity issues that should be addressed. These include a bug in the tests that prevents a test case from running, a potential server crash due to a null reference in the new API endpoint, incorrect error response formatting, and missing error handling. I've provided detailed comments and suggestions for each of these points.
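To make the translation layer concrete, here is a minimal sketch of the request-side mapping the review describes. This is not the PR's actual code: the helper name is hypothetical, and the field names follow the public Anthropic Messages and OpenAI Chat Completions schemas.

```python
from typing import Any


def messages_to_chat_completion(body: dict[str, Any]) -> dict[str, Any]:
    """Map an Anthropic /v1/messages request onto an OpenAI
    /v1/chat/completions request (illustrative sketch only)."""
    messages: list[dict[str, Any]] = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI expects it as the first chat message.
    if body.get("system"):
        messages.append({"role": "system", "content": body["system"]})
    # Anthropic message content may be a string or a list of content
    # blocks; a real adapter would normalize the block form too.
    messages.extend(body["messages"])

    return {
        "model": body["model"],
        "messages": messages,
        # max_tokens is required by the Messages API but optional in
        # Chat Completions, so it always passes through.
        "max_tokens": body["max_tokens"],
        "temperature": body.get("temperature"),
        "stream": body.get("stream", False),
    }
```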

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@bbartels changed the title from "Adds anthropic messages endpoint to openai server" to "Adds anthropic /v1/messages endpoint to openai api_server" on Oct 31, 2025
@bbartels (Contributor Author) commented:

/gemini review

@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request adds support for the Anthropic /v1/messages endpoint to the OpenAI-compatible API server. The implementation cleverly reuses the existing chat completion logic by creating an adapter layer that translates requests and responses between Anthropic and OpenAI formats. The changes to the API server and the addition of the serving logic are well-structured. However, the newly added test file introduces significant code duplication, which could pose maintenance challenges. I've provided a suggestion to refactor the tests for better maintainability.
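Complementing the request mapping sketched earlier, the response side of such an adapter might look like the following. Again a hedged sketch rather than the PR's implementation: the function name is made up and the stop-reason table covers only the common cases.

```python
from typing import Any


def chat_completion_to_message(resp: dict[str, Any]) -> dict[str, Any]:
    """Map an OpenAI chat completion response back into the Anthropic
    Messages response shape (illustrative sketch only)."""
    choice = resp["choices"][0]
    # OpenAI finish_reason -> Anthropic stop_reason, common cases only.
    stop_reason = {
        "stop": "end_turn",
        "length": "max_tokens",
        "tool_calls": "tool_use",
    }.get(choice["finish_reason"], "end_turn")

    return {
        "id": resp["id"],
        "type": "message",
        "role": "assistant",
        "model": resp["model"],
        "content": [{"type": "text", "text": choice["message"]["content"]}],
        "stop_reason": stop_reason,
        "usage": {
            "input_tokens": resp["usage"]["prompt_tokens"],
            "output_tokens": resp["usage"]["completion_tokens"],
        },
    }
```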

@bbartels (Contributor Author) commented:

Ready for review. I'm not sure whether we should just get rid of the separate anthropic api_server, since it doesn't add much over having the openai one support Anthropic's /v1/messages endpoint.

@DarkLight1337 (Member) left a comment


I think this change is OK for now, but in the long term, the code becomes less maintainable as more APIs need to be supported. It is not ideal that the "OpenAI" server includes code from other APIs.

After this PR, I propose restructuring the code for online serving into the following modules:

  • vllm.entrypoints.serve.core: Contains the code for setting up the async client and FastAPI app. May include some common APIs such as health check.
  • vllm.entrypoints.serve.openai: Contains only the code for OpenAI endpoints (e.g. Completions API, Chat Completions API, Responses API)
  • vllm.entrypoints.serve.anthropic: Contains only the code for Anthropic endpoints (Messages API)
  • vllm.entrypoints.serve.jina: Contains only the code for JinaAI endpoints (Reranker API)
  • vllm.entrypoints.serve.vllm: Contains only the code for vLLM endpoints (e.g. Tokenize API, Pooling API, dev mode endpoints)

In vllm.entrypoints.serve, we can have the actual entrypoint, which uses .core to build the server and then incrementally attaches endpoints to the FastAPI app by importing the relevant functions from the API-specific submodules.
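To illustrate the proposal, a rough sketch of how the composed entrypoint could look. None of these modules or routers exist yet; the names simply mirror the layout above.

```python
from fastapi import APIRouter, FastAPI

# Hypothetical routers standing in for the proposed submodules; in the
# real refactor these would live in vllm.entrypoints.serve.openai and
# vllm.entrypoints.serve.anthropic respectively.
openai_router = APIRouter()
anthropic_router = APIRouter()


@openai_router.post("/v1/chat/completions")
async def chat_completions() -> dict:
    return {"todo": "OpenAI-specific serving logic"}


@anthropic_router.post("/v1/messages")
async def messages() -> dict:
    return {"todo": "Anthropic-specific serving logic"}


def build_app() -> FastAPI:
    # Stand-in for vllm.entrypoints.serve.core: build the app (and, per
    # the proposal, wire up the async engine client and common endpoints
    # such as /health).
    app = FastAPI()
    # Each API family attaches its own endpoints, so the "OpenAI" module
    # no longer has to host Anthropic code.
    app.include_router(openai_router)
    app.include_router(anthropic_router)
    return app
```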

@mgoin (Member) commented Nov 1, 2025

I agree with @DarkLight1337, especially as we need to add other endpoints from the Anthropic API, such as /v1/messages/count_tokens, in order to work well with Claude Code.

vllm serve should still be the main interface to access all the different endpoints, but we are long overdue for a restructuring of the code that identifies which endpoints are meant to match the OpenAI standard and which are alternatives.

For now, I think we should land this addition and delete the explicit anthropic serve command and code. That code has only been in main for a week, so it is better to delete it quickly than to introduce it into a release just to deprecate it. Then open a PR/issue to restructure the code afterwards.
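For reference, the public count_tokens endpoint mentioned above returns {"input_tokens": N} for a Messages-shaped body. A hedged sketch of what a handler could look like; the prompt rendering and token counting here are crude placeholders, not vLLM's implementation.

```python
from fastapi import APIRouter, Request

router = APIRouter()


@router.post("/v1/messages/count_tokens")
async def count_tokens(raw_request: Request) -> dict:
    body = await raw_request.json()
    # Placeholder rendering: concatenate string-valued message contents.
    # A real handler would apply the model's chat template and tokenizer.
    text = "\n".join(
        m["content"] for m in body.get("messages", [])
        if isinstance(m.get("content"), str)
    )
    return {"input_tokens": len(text.split())}  # crude stand-in count
```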



@pytest.mark.asyncio
async def test_anthropic_tool_call_streaming(client: anthropic.AsyncAnthropic):
A Member left a review comment on the test above:

In that case, let's not copy the tests like this, to avoid duplication in CI. We just need to check that the endpoint exists on the OpenAI server, since the same code is used to process the endpoint.
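In that spirit, a minimal existence check might be all the OpenAI-server suite needs. This sketch assumes a hypothetical `server_url` fixture pointing at a running vLLM OpenAI server.

```python
import httpx
import pytest


@pytest.mark.asyncio
async def test_messages_endpoint_exists(server_url: str):
    async with httpx.AsyncClient(base_url=server_url) as client:
        resp = await client.post("/v1/messages", json={})
        # Anything but 404 means the route is registered; the empty body
        # is expected to fail validation, not routing.
        assert resp.status_code != 404
```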

@bbartels (Contributor Author) commented Nov 1, 2025

In full agreement with you both :) Hence why I was rushing to make this change; it seems more sustainable in the long run than maintaining two separate API servers.

@bbartels (Contributor Author) commented Nov 1, 2025

@mgoin @DarkLight1337 I have now removed the old api server, including the associated tests.

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) November 1, 2025 13:22
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 1, 2025
auto-merge was automatically disabled November 1, 2025 16:41

Head branch was pushed to by a user without write access

@simon-mo simon-mo merged commit 1e88fb7 into vllm-project:main Nov 1, 2025
48 checks passed
zhaozuy pushed a commit to zhaozuy/vllm that referenced this pull request Nov 4, 2025
@chaunceyjiang (Collaborator) commented:
> I think this change is OK for now, but in the long term, the code becomes less maintainable as more APIs need to be supported. […]

@DarkLight1337 I completely agree; I'll give the refactor a try. #28040

juliendenize pushed a commit to juliendenize/vllm that referenced this pull request Nov 6, 2025
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025

Labels

frontend · ready (ONLY add when PR is ready to merge/full CI is needed)


5 participants