Support Anthropic API /v1/messages Endpoint #22627
Conversation
Code Review
This pull request introduces support for the Anthropic Messages API by adding a new API server, protocol definitions, and a serving layer for format conversion. The implementation is based on the existing OpenAI-compatible server. My review has identified several critical and high-severity issues, including a potential NoneType access error, incorrect Pydantic model usage that could lead to validation errors, a risk of generating duplicate tool call IDs, and another case of incorrect attribute access on a Pydantic model that would cause a runtime error. I have provided specific code suggestions to address these issues and ensure the stability and correctness of the new endpoint.
When handler is None, messages(raw_request) is also None. Calling create_error_response on a None object will raise an AttributeError, causing an unhandled exception and a 500 server error. You should construct an ErrorResponse directly to ensure a proper error is returned. You will need to import ErrorResponse from vllm.entrypoints.openai.protocol and HTTPStatus from http.
```diff
-        return messages(raw_request).create_error_response(
-            message="The model does not support Chat Completions API")
+        return ErrorResponse(message="The model does not support Chat Completions API",
+                             type="model_not_found",
+                             code=HTTPStatus.NOT_FOUND.value)
```
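For illustration, a self-contained sketch of the suggested error path. `ErrorResponse` is reduced here to the three fields used above rather than imported from `vllm.entrypoints.openai.protocol`, and `messages()` is a hypothetical stand-in for the handler lookup:

```python
from http import HTTPStatus
from typing import Optional

from pydantic import BaseModel


class ErrorResponse(BaseModel):
    # Reduced stand-in for vllm.entrypoints.openai.protocol.ErrorResponse.
    message: str
    type: str
    code: int


def messages(raw_request) -> Optional[object]:
    # Hypothetical handler lookup: returns None when the model has no
    # chat handler registered.
    return None


def create_messages(raw_request=None):
    handler = messages(raw_request)
    if handler is None:
        # Construct the error directly instead of calling
        # create_error_response() on a None handler.
        return ErrorResponse(
            message="The model does not support Chat Completions API",
            type="model_not_found",
            code=HTTPStatus.NOT_FOUND.value)
    return handler


print(create_messages())
```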
anthropic_request.tool_choice is a Pydantic model instance, not a dictionary. Accessing its attributes should be done with dot notation (e.g., .name). Using .get("name") will result in an AttributeError at runtime.
| "name": anthropic_request.tool_choice.get("name") | |
| "name": anthropic_request.tool_choice.name |
The id field is defined as a required field. The model_post_init method, which attempts to set a default value, is called after Pydantic's validation. If id is not provided during initialization, a ValidationError will be raised before model_post_init can execute. To correctly provide a default value for an optional field, you should use default_factory in the field definition and remove the model_post_init method.
```diff
-    id: str
+    id: str = Field(default_factory=lambda: f"msg_{int(time.time() * 1000)}")
```
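A self-contained sketch of the default_factory approach (the field set is trimmed to what this example needs; the real response model has more fields):

```python
import time

from pydantic import BaseModel, Field


class AnthropicMessagesResponse(BaseModel):
    # default_factory runs during validation, so omitting `id` no longer
    # raises a ValidationError and model_post_init is not needed.
    id: str = Field(default_factory=lambda: f"msg_{int(time.time() * 1000)}")
    model: str


resp = AnthropicMessagesResponse(model="example-model")
print(resp.id)  # e.g. "msg_1718000000000"
```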
Using int(time.time()) to generate tool call IDs is not safe as it can produce duplicate IDs for tool calls created in the same second. This can lead to incorrect behavior when matching tool calls to their results. It's better to use a UUID-based approach for uniqueness. You can use random_tool_call_id from vllm.entrypoints.chat_utils for this, which needs to be imported.
| "id": block.id or f"call_{int(time.time())}", | |
| "id": block.id or random_tool_call_id(), |
A text AnthropicContentBlock is created even if generator.choices[0].message.content is None. The Anthropic API requires the text field for text blocks, so when serialized with exclude_none=True this would produce an invalid content block. You should only create the text content block if there is content available.
```diff
-        content: List[AnthropicContentBlock] = [
-            AnthropicContentBlock(
-                type="text",
-                text=generator.choices[0].message.content
-            )
-        ]
+        content: List[AnthropicContentBlock] = []
+        if generator.choices[0].message.content:
+            content.append(
+                AnthropicContentBlock(
+                    type="text",
+                    text=generator.choices[0].message.content))
```
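A self-contained sketch of the guarded version, with AnthropicContentBlock reduced to the two fields relevant here:

```python
from typing import List, Optional

from pydantic import BaseModel


class AnthropicContentBlock(BaseModel):
    # Reduced stand-in: only the fields needed for this example.
    type: str
    text: Optional[str] = None


def build_content(message_content: Optional[str]) -> List[AnthropicContentBlock]:
    content: List[AnthropicContentBlock] = []
    # Only emit a text block when there is actual text; a pure tool-call
    # response may have message.content == None.
    if message_content:
        content.append(AnthropicContentBlock(type="text", text=message_content))
    return content


print([b.model_dump(exclude_none=True) for b in build_content(None)])     # []
print([b.model_dump(exclude_none=True) for b in build_content("Hello")])  # [{'type': 'text', 'text': 'Hello'}]
```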
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a reduced subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Exciting! Make sure to add some unit tests before marking this ready, and I wonder if we can also include a smoke test verifying that Claude Code or some other application can communicate with the API correctly.
Thanks @LiuLi1998! Would you also be willing to help with ongoing support/maintenance of the API?
Thanks for the input! I agree, I'll add some tests soon to make sure everything works as expected.
Definitely! I'm glad to take part in the support/maintenance of the API.
I've added initial tests. I'm not entirely sure if the current approach follows best practices or covers everything needed, so I would really appreciate your feedback on improvements or any other cases to cover.
This pull request has merge conflicts that must be resolved before it can be merged.
# Conflicts: # tests/utils.py
@mgoin While adding tests, I triggered the CI and encountered the following error:
This pull request has merge conflicts that must be resolved before it can be merged.
Head branch was pushed to by a user without write access
Already fixed, thanks for the help!
How can we support both the OpenAI and Anthropic APIs at the same time?
@LiuLi1998 +1, is it possible? I'd also like this.
Currently, the OpenAI and Anthropic APIs are served by separate API servers and cannot be used at the same time.
Is there a plan to support them simultaneously?
```python
    See https://docs.anthropic.com/en/api/messages
    for the API specification. This API mimics the Anthropic messages API.
    """
    logger.debug("Received messages request %s", request.model_dump_json())
```
Would this be super slow? It calls request.model_dump_json() unconditionally.
This will only produce output when VLLM_LOGGING_LEVEL=DEBUG.
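One way to avoid paying for the serialization when DEBUG logging is off is to guard the call with isEnabledFor; a minimal sketch, with a hypothetical stand-in for the request model:

```python
import logging

from pydantic import BaseModel

logger = logging.getLogger(__name__)


class DummyMessagesRequest(BaseModel):
    # Hypothetical stand-in for the PR's request model.
    model: str = "example-model"
    max_tokens: int = 128


request = DummyMessagesRequest()

# The (potentially large) model_dump_json() call is skipped entirely unless
# the logger is actually at DEBUG level (e.g. VLLM_LOGGING_LEVEL=DEBUG).
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("Received messages request %s", request.model_dump_json())
```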
```python
        return JSONResponse(content=generator.model_dump())

    elif isinstance(generator, AnthropicMessagesResponse):
        logger.debug(
```
Similar problem: an unconditional call of generator.model_dump(exclude_none=True).
```python
@router.post(
    "/v1/messages",
```
It seems the Anthropic API only adds a new /v1/messages endpoint, so why not merge it into the OpenAI server, i.e. serve both /v1/chat/completions and /v1/messages together?
I think they are two different protocols. It's possible to merge them for functional compatibility, but I think it could lead to semantic confusion.
I don't think Kaichao is saying to merge them into one endpoint, just hosting them side-by-side. So when you run vllm serve you get /v1/completions, /v1/chat/completions, /v1/messages, etc. I agree this would be optimal for ease of use.
> I don't think Kaichao is saying to merge them into one endpoint, just hosting them side-by-side. So when you run vllm serve you get /v1/completions, /v1/chat/completions, /v1/messages, etc. I agree this would be optimal for ease of use.

I agree it's the most user-friendly solution.
@youkaichao @mgoin @shoted Raised #27882 to add /v1/messages to the OpenAI api_server.
Nice, bro!
Related issue: #21313
This PR adds support for the Anthropic /v1/messages REST API endpoint to the vLLM FastAPI server.
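For reference, a hypothetical smoke test against a locally running server exposing the new endpoint (host, port, and model name are placeholders, not values from this PR):

```python
import json
import urllib.request

payload = {
    "model": "my-served-model",  # placeholder model name
    "max_tokens": 128,
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}

req = urllib.request.Request(
    "http://localhost:8000/v1/messages",  # placeholder host/port
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# The Anthropic Messages API returns a JSON body whose "content" field is a
# list of content blocks (e.g. text blocks).
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
```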