
Conversation

@chaunceyjiang
Collaborator

@chaunceyjiang chaunceyjiang commented Sep 5, 2025

Part of #23222

Refer to https://cookbook.openai.com/articles/openai-harmony#developer-message-format

Refer to https://platform.openai.com/docs/guides/function-calling?lang=python#streaming

Purpose

Stream Function Call - harmony

Test Plan

Test Result

from openai import OpenAI


openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get current temperature for a given location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City and country e.g. Bogotá, Colombia"
            }
        },
        "required": ["location"],
        "additionalProperties": False
    }
}]

# when stream=True
stream = client.responses.create(
    model="",  # name of the served model, e.g. gpt-oss-20b
    input=[{
        "role": "user",
        "content": "What's the weather like in Paris today?"
    }],
    tools=tools,
    stream=True)

for event in stream:
    print(event)
ResponseCreatedEvent(response=Response(id='resp_6d76d1a774c048b9ba79dad944d0c440', created_at=1757070337.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='/home/jovyan/gpt-oss-20b', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[FunctionTool(name='get_weather', parameters={'type': 'object', 'properties': {'location': {'type': 'string', 'description': 'City and country e.g. Bogotá, Colombia'}}, 'required': ['location'], 'additionalProperties': False}, strict=None, type='function', description='Get current temperature for a given location.')], top_p=1.0, background=False, max_output_tokens=130933, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='in_progress', text=None, top_logprobs=None, truncation='disabled', usage=None, user=None), sequence_number=0, type='response.created')
ResponseInProgressEvent(response=Response(id='resp_6d76d1a774c048b9ba79dad944d0c440', created_at=1757070337.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='/home/jovyan/gpt-oss-20b', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[FunctionTool(name='get_weather', parameters={'type': 'object', 'properties': {'location': {'type': 'string', 'description': 'City and country e.g. Bogotá, Colombia'}}, 'required': ['location'], 'additionalProperties': False}, strict=None, type='function', description='Get current temperature for a given location.')], top_p=1.0, background=False, max_output_tokens=130933, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='in_progress', text=None, top_logprobs=None, truncation='disabled', usage=None, user=None), sequence_number=1, type='response.in_progress')
ResponseOutputItemAddedEvent(item=ResponseReasoningItem(id='', summary=[], type='reasoning', content=None, encrypted_content=None, status='in_progress'), output_index=0, sequence_number=2, type='response.output_item.added')
ResponseContentPartAddedEvent(content_index=0, item_id='', output_index=0, part=ResponseOutputText(annotations=[], text='', type='output_text', logprobs=[]), sequence_number=3, type='response.content_part.added')
ResponseReasoningTextDeltaEvent(content_index=0, delta='User', item_id='', output_index=0, sequence_number=4, type='response.reasoning_text.delta')
ResponseReasoningTextDeltaEvent(content_index=0, delta=' Paris', item_id='', output_index=0, sequence_number=5, type='response.reasoning_text.delta')
...
...
ResponseReasoningTextDeltaEvent(content_index=0, delta='".', item_id='', output_index=0, sequence_number=57, type='response.reasoning_text.delta')
ResponseReasoningTextDoneEvent(content_index=0, item_id='', output_index=1, sequence_number=58, text='User asks for weather in Paris today. We have a function "get_weather" that returns current temperature for given location. So we need to call the function with location Paris. But Paris is ambiguous: City and country e.g. "Paris, France". We\'ll pass "Paris, France".', type='response.reasoning_text.done')
ResponseOutputItemDoneEvent(item=ResponseReasoningItem(id='', summary=[], type='reasoning', content=[Content(text='User asks for weather in Paris today. We have a function "get_weather" that returns current temperature for given location. So we need to call the function with location Paris. But Paris is ambiguous: City and country e.g. "Paris, France". We\'ll pass "Paris, France".', type='reasoning_text')], encrypted_content=None, status='completed'), output_index=1, sequence_number=59, type='response.output_item.done')
ResponseOutputItemAddedEvent(item=ResponseFunctionToolCall(arguments='', call_id='fc_071e6ee4ea524c508f17b5854f0ba515', name='get_weather', type='function_call', id='', status='in_progress'), output_index=1, sequence_number=60, type='response.output_item.added')
ResponseFunctionCallArgumentsDeltaEvent(delta='{"', item_id='', output_index=1, sequence_number=61, type='response.function_call_arguments.delta')
ResponseFunctionCallArgumentsDeltaEvent(delta='location', item_id='', output_index=1, sequence_number=62, type='response.function_call_arguments.delta')
ResponseFunctionCallArgumentsDeltaEvent(delta='":"', item_id='', output_index=1, sequence_number=63, type='response.function_call_arguments.delta')
ResponseFunctionCallArgumentsDeltaEvent(delta='Paris', item_id='', output_index=1, sequence_number=64, type='response.function_call_arguments.delta')
ResponseFunctionCallArgumentsDeltaEvent(delta=',', item_id='', output_index=1, sequence_number=65, type='response.function_call_arguments.delta')
ResponseFunctionCallArgumentsDeltaEvent(delta=' France', item_id='', output_index=1, sequence_number=66, type='response.function_call_arguments.delta')
ResponseFunctionCallArgumentsDeltaEvent(delta='"}', item_id='', output_index=1, sequence_number=67, type='response.function_call_arguments.delta')
ResponseFunctionCallArgumentsDoneEvent(arguments='{"location":"Paris, France"}', item_id='', output_index=2, sequence_number=68, type='response.function_call_arguments.done', name='get_weather')
ResponseOutputItemDoneEvent(item=ResponseFunctionToolCall(arguments='{"location":"Paris, France"}', call_id='fc_069ade35aa334b2984d7d9837f59b876', name='get_weather', type='function_call', id=None, status='completed', item_id='', output_index=2, sequence_number=-1), output_index=2, sequence_number=69, type='response.output_item.done')
ResponseCompletedEvent(response=Response(id='resp_6d76d1a774c048b9ba79dad944d0c440', created_at=1757070337.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='/home/jovyan/gpt-oss-20b', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[FunctionTool(name='get_weather', parameters={'type': 'object', 'properties': {'location': {'type': 'string', 'description': 'City and country e.g. Bogotá, Colombia'}}, 'required': ['location'], 'additionalProperties': False}, strict=None, type='function', description='Get current temperature for a given location.')], top_p=1.0, background=False, max_output_tokens=130933, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='completed', text=None, top_logprobs=None, truncation='disabled', usage=ResponseUsage(input_tokens=139, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=84, output_tokens_details=OutputTokensDetails(reasoning_tokens=59), total_tokens=223), user=None), sequence_number=70, type='response.completed')
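On the client side, the `response.function_call_arguments.delta` events above can be reassembled per `output_index` and parsed once the corresponding `done` event arrives. A minimal, self-contained sketch of that accumulation step, driven by the exact delta strings from the log above (the event-handling plumbing around a real stream is omitted):

```python
import json
from collections import defaultdict

# Argument fragments exactly as they appear in the streamed events above.
deltas = ['{"', 'location', '":"', 'Paris', ',', ' France', '"}']
output_index = 1  # index reported by the function_call_arguments.delta events

# Accumulate fragments per output_index; parse on the "done" event.
buffers: dict[int, list[str]] = defaultdict(list)
for fragment in deltas:
    buffers[output_index].append(fragment)

arguments = json.loads("".join(buffers[output_index]))
print(arguments)  # {'location': 'Paris, France'}
```

The joined buffer matches the `arguments` field of the `ResponseFunctionCallArgumentsDoneEvent` in the log.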

Test:

pytest -v -s tests/entrypoints/openai/test_response_api_with_harmony.py::test_function_calling_with_stream
...
...
...
The current temperature in Paris is **13.8 °C** (≈55 °F). The day looks mild—usually a mix of light cloud cover with occasional sun. If you’re heading out, a light jacket or sweater and an umbrella (in case of sporadic showers) would be a good idea. Enjoy your time in the city!

PASSED

@mergify mergify bot added the frontend label Sep 5, 2025
@QwertyJack
Contributor

QwertyJack commented Sep 5, 2025

Be careful with your commit!

@chaunceyjiang
Collaborator Author

@QwertyJack Thanks~

@chaunceyjiang chaunceyjiang force-pushed the gpt_oss_function_call branch 3 times, most recently from 8ee6c2f to 57c7e58 Compare September 12, 2025 05:58
@chaunceyjiang chaunceyjiang marked this pull request as ready for review September 12, 2025 07:46
@mergify mergify bot added the gpt-oss Related to GPT-OSS models label Sep 12, 2025
@zhewenl zhewenl added the github_actions Pull requests that update GitHub Actions code label Sep 12, 2025
@chaunceyjiang
Collaborator Author

/cc @yeqcharlotte PTAL.

@mergify

mergify bot commented Sep 17, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @chaunceyjiang.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 17, 2025
@chaunceyjiang
Collaborator Author

/cc @yeqcharlotte @qandrew PTAL.

Collaborator

@yeqcharlotte yeqcharlotte left a comment

thanks for the change! function_call_parsing will be a hot path, let's pay more attention to perf :D

@github-project-automation github-project-automation bot moved this from To Triage to In progress in gpt-oss Issues & Enhancements Sep 18, 2025
@chaunceyjiang
Collaborator Author

thanks for the change! function_call_parsing will be a hot path, let's pay more attention to perf :D

Hi @yeqcharlotte I've adopted a different approach for the implementation, and there are no performance issues now. Please take another look.

Collaborator

@yeqcharlotte yeqcharlotte left a comment

thanks for the change! it looks much better now. wonder did we get a chance to try this with any function call evals?

@qandrew - take a second look at the streaming component?

@chaunceyjiang
Collaborator Author

wonder did we get a chance to try this with any function call evals?

I've added a test case, test_function_calling_with_stream, which fully covers the entire workflow.

Contributor

@qandrew qandrew left a comment

some minor comments, otherwise LGTM. cc @yeqcharlotte

Contributor

@qandrew qandrew left a comment

LGTM! cc @yeqcharlotte

@github-project-automation github-project-automation bot moved this from In progress to Ready in gpt-oss Issues & Enhancements Oct 14, 2025
@yeqcharlotte yeqcharlotte merged commit df850c4 into vllm-project:main Oct 14, 2025
48 checks passed
Dhruvilbhatt pushed a commit to Dhruvilbhatt/vllm that referenced this pull request Oct 14, 2025
bbartels pushed a commit to bbartels/vllm that referenced this pull request Oct 16, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
@chaunceyjiang chaunceyjiang deleted the gpt_oss_function_call branch October 21, 2025 08:23
@goldanghenry-debug

goldanghenry-debug commented Oct 22, 2025

Hi, thank you for adding the function_call feature! 🙏
I’m currently testing tool calling with the latest nightly build, and I encountered an issue when performing multiple tool calls.

When making multiple tool calls, only the first tool call appears in the OutputItemDoneEvent.
However, the reasoning text clearly indicates that the model intends to call the tool twice (e.g. "need to call get_weather twice").


"response": {
  "id": "resp_7ac2cabcdeb146b78c4a5be2531d95e3",
  "created_at": 1761106454,
  "model": "openai/gpt-oss-20b",
  "object": "response",
  "output": [
    {
      "id": "rs_88e261acf87c4ee5b1e93e55a762f602",
      "type": "reasoning",
      "content": [
        {
          "text": "User wants weather for two areas: Yeongdeungpo-gu and Guro-gu. Probably need to call get_weather twice. Use tool get_weather.",
          "type": "reasoning_text"
        }
      ]
    },
    {
      "arguments": "{\"city\":\"Yeongdeungpo-gu\"}",
      "call_id": "call_7686f0ca73604a7da85b80310215fb09",
      "name": "get_weather",
      "type": "function_call",
      "id": "fc_7686f0ca73604a7da85b80310215fb09"
    }
  ],
  "parallel_tool_calls": true,
  "temperature": 1.0,
  "tool_choice": "auto"
}
...

@chaunceyjiang
Collaborator Author

Hi, @goldanghenry-debug .

When making multiple tool calls, only the first tool call appears in the OutputItemDoneEvent.

Yes.

ref #24637.

It seems that making the same request again (with the first tool call's result appended) will produce the second tool call.
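The workaround above can be sketched as a driver loop: since the model emits at most one function_call per response, each result is fed back and the request is re-issued until a final message arrives. This is a minimal sketch only; `create_response` and `execute_tool` are hypothetical stand-ins (for `client.responses.create` and the caller's tool dispatch), and the stub below simulates the server so the loop can run without one:

```python
import json

def run_tool_loop(create_response, execute_tool, max_turns=8):
    """Sequentially drive tool calls: feed each function_call_output back
    and request again until the model produces a final message."""
    history = []
    for _ in range(max_turns):
        response = create_response(history)
        calls = [o for o in response["output"] if o["type"] == "function_call"]
        if not calls:
            return response  # final answer, no further tool calls
        for call in calls:
            result = execute_tool(call["name"], json.loads(call["arguments"]))
            history.append(call)
            history.append({
                "type": "function_call_output",
                "call_id": call["call_id"],
                "output": json.dumps(result),
            })
    raise RuntimeError("tool loop did not converge")

# Stub "server": first request yields a tool call, second a final message.
def fake_create(history):
    if not any(h.get("type") == "function_call_output" for h in history):
        return {"output": [{"type": "function_call", "call_id": "call_1",
                            "name": "get_weather",
                            "arguments": '{"city": "Guro-gu"}'}]}
    return {"output": [{"type": "message", "content": "14 C in Guro-gu"}]}

final = run_tool_loop(fake_create, lambda name, args: {"temp_c": 14})
print(final["output"][0]["content"])  # 14 C in Guro-gu
```

With a real server, `create_response` would pass `history` as part of `input` on `client.responses.create`; the loop shape stays the same.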

@QwertyJack
Contributor

OpenAI's documentation states that the |call| token (ID 200012) signals when the model wants to call a tool and serves as a stop token, indicating that gpt-oss doesn't natively support parallel function calling.

However, this limitation can be bypassed by removing that token 200012 from the tokenizer configuration.

alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
Zhathw pushed a commit to Zhathw/vllm that referenced this pull request Nov 12, 2025