
Conversation

@chaunceyjiang
Collaborator

@chaunceyjiang chaunceyjiang commented Sep 5, 2025

Part of #23222

Refer to https://cookbook.openai.com/articles/openai-harmony#developer-message-format

Refer to https://platform.openai.com/docs/guides/function-calling?lang=python#streaming

Purpose

Stream Function Call - harmony

Test Plan

Test Result

from openai import OpenAI


openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get current temperature for a given location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City and country e.g. Bogotá, Colombia"
            }
        },
        "required": ["location"],
        "additionalProperties": False
    }
}]

# when stream=True
stream = client.responses.create(
    model="",  # name of the served model, e.g. gpt-oss-20b
    input=[{
        "role": "user",
        "content": "What's the weather like in Paris today?"
    }],
    tools=tools,
    stream=True)

for event in stream:
    print(event)
ResponseCreatedEvent(response=Response(id='resp_6d76d1a774c048b9ba79dad944d0c440', created_at=1757070337.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='/home/jovyan/gpt-oss-20b', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[FunctionTool(name='get_weather', parameters={'type': 'object', 'properties': {'location': {'type': 'string', 'description': 'City and country e.g. Bogotá, Colombia'}}, 'required': ['location'], 'additionalProperties': False}, strict=None, type='function', description='Get current temperature for a given location.')], top_p=1.0, background=False, max_output_tokens=130933, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='in_progress', text=None, top_logprobs=None, truncation='disabled', usage=None, user=None), sequence_number=0, type='response.created')
ResponseInProgressEvent(response=Response(id='resp_6d76d1a774c048b9ba79dad944d0c440', created_at=1757070337.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='/home/jovyan/gpt-oss-20b', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[FunctionTool(name='get_weather', parameters={'type': 'object', 'properties': {'location': {'type': 'string', 'description': 'City and country e.g. Bogotá, Colombia'}}, 'required': ['location'], 'additionalProperties': False}, strict=None, type='function', description='Get current temperature for a given location.')], top_p=1.0, background=False, max_output_tokens=130933, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='in_progress', text=None, top_logprobs=None, truncation='disabled', usage=None, user=None), sequence_number=1, type='response.in_progress')
ResponseOutputItemAddedEvent(item=ResponseReasoningItem(id='', summary=[], type='reasoning', content=None, encrypted_content=None, status='in_progress'), output_index=0, sequence_number=2, type='response.output_item.added')
ResponseContentPartAddedEvent(content_index=0, item_id='', output_index=0, part=ResponseOutputText(annotations=[], text='', type='output_text', logprobs=[]), sequence_number=3, type='response.content_part.added')
ResponseReasoningTextDeltaEvent(content_index=0, delta='User', item_id='', output_index=0, sequence_number=4, type='response.reasoning_text.delta')
ResponseReasoningTextDeltaEvent(content_index=0, delta=' Paris', item_id='', output_index=0, sequence_number=5, type='response.reasoning_text.delta')
...
...
ResponseReasoningTextDeltaEvent(content_index=0, delta='".', item_id='', output_index=0, sequence_number=57, type='response.reasoning_text.delta')
ResponseReasoningTextDoneEvent(content_index=0, item_id='', output_index=1, sequence_number=58, text='User asks for weather in Paris today. We have a function "get_weather" that returns current temperature for given location. So we need to call the function with location Paris. But Paris is ambiguous: City and country e.g. "Paris, France". We\'ll pass "Paris, France".', type='response.reasoning_text.done')
ResponseOutputItemDoneEvent(item=ResponseReasoningItem(id='', summary=[], type='reasoning', content=[Content(text='User asks for weather in Paris today. We have a function "get_weather" that returns current temperature for given location. So we need to call the function with location Paris. But Paris is ambiguous: City and country e.g. "Paris, France". We\'ll pass "Paris, France".', type='reasoning_text')], encrypted_content=None, status='completed'), output_index=1, sequence_number=59, type='response.output_item.done')
ResponseOutputItemAddedEvent(item=ResponseFunctionToolCall(arguments='', call_id='fc_071e6ee4ea524c508f17b5854f0ba515', name='get_weather', type='function_call', id='', status='in_progress'), output_index=1, sequence_number=60, type='response.output_item.added')
ResponseFunctionCallArgumentsDeltaEvent(delta='{"', item_id='', output_index=1, sequence_number=61, type='response.function_call_arguments.delta')
ResponseFunctionCallArgumentsDeltaEvent(delta='location', item_id='', output_index=1, sequence_number=62, type='response.function_call_arguments.delta')
ResponseFunctionCallArgumentsDeltaEvent(delta='":"', item_id='', output_index=1, sequence_number=63, type='response.function_call_arguments.delta')
ResponseFunctionCallArgumentsDeltaEvent(delta='Paris', item_id='', output_index=1, sequence_number=64, type='response.function_call_arguments.delta')
ResponseFunctionCallArgumentsDeltaEvent(delta=',', item_id='', output_index=1, sequence_number=65, type='response.function_call_arguments.delta')
ResponseFunctionCallArgumentsDeltaEvent(delta=' France', item_id='', output_index=1, sequence_number=66, type='response.function_call_arguments.delta')
ResponseFunctionCallArgumentsDeltaEvent(delta='"}', item_id='', output_index=1, sequence_number=67, type='response.function_call_arguments.delta')
ResponseFunctionCallArgumentsDoneEvent(arguments='{"location":"Paris, France"}', item_id='', output_index=2, sequence_number=68, type='response.function_call_arguments.done', name='get_weather')
ResponseOutputItemDoneEvent(item=ResponseFunctionToolCall(arguments='{"location":"Paris, France"}', call_id='fc_069ade35aa334b2984d7d9837f59b876', name='get_weather', type='function_call', id=None, status='completed', item_id='', output_index=2, sequence_number=-1), output_index=2, sequence_number=69, type='response.output_item.done')
ResponseCompletedEvent(response=Response(id='resp_6d76d1a774c048b9ba79dad944d0c440', created_at=1757070337.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='/home/jovyan/gpt-oss-20b', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[FunctionTool(name='get_weather', parameters={'type': 'object', 'properties': {'location': {'type': 'string', 'description': 'City and country e.g. Bogotá, Colombia'}}, 'required': ['location'], 'additionalProperties': False}, strict=None, type='function', description='Get current temperature for a given location.')], top_p=1.0, background=False, max_output_tokens=130933, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='completed', text=None, top_logprobs=None, truncation='disabled', usage=ResponseUsage(input_tokens=139, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=84, output_tokens_details=OutputTokensDetails(reasoning_tokens=59), total_tokens=223), user=None), sequence_number=70, type='response.completed')
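On the client side, the `response.function_call_arguments.delta` events above can be reassembled per `output_index` and parsed once the corresponding `done` event arrives. A minimal, self-contained sketch of that accumulation step, driven by the exact delta strings from the log above (the event-handling plumbing around a real stream is omitted):

```python
import json
from collections import defaultdict

# Argument fragments exactly as they appear in the streamed events above.
deltas = ['{"', 'location', '":"', 'Paris', ',', ' France', '"}']
output_index = 1  # index reported by the function_call_arguments.delta events

# Accumulate fragments per output_index; parse on the "done" event.
buffers: dict[int, list[str]] = defaultdict(list)
for fragment in deltas:
    buffers[output_index].append(fragment)

arguments = json.loads("".join(buffers[output_index]))
print(arguments)  # {'location': 'Paris, France'}
```

The joined buffer matches the `arguments` field of the `ResponseFunctionCallArgumentsDoneEvent` in the log.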

Test:

pytest -v -s tests/entrypoints/openai/test_response_api_with_harmony.py::test_function_calling_with_stream
...
...
...
The current temperature in Paris is **13.8 °C** (≈55 °F). The day looks mild—usually a mix of light cloud cover with occasional sun. If you’re heading out, a light jacket or sweater and an umbrella (in case of sporadic showers) would be a good idea. Enjoy your time in the city!

PASSED

@mergify mergify bot added the frontend label Sep 5, 2025
@QwertyJack
Contributor

QwertyJack commented Sep 5, 2025

Be careful with your commit!

@chaunceyjiang
Collaborator Author

@QwertyJack Thanks~

@chaunceyjiang chaunceyjiang force-pushed the gpt_oss_function_call branch 3 times, most recently from 8ee6c2f to 57c7e58 Compare September 12, 2025 05:58
@chaunceyjiang chaunceyjiang marked this pull request as ready for review September 12, 2025 07:46
@mergify mergify bot added the gpt-oss Related to GPT-OSS models label Sep 12, 2025
@zhewenl zhewenl added the github_actions Pull requests that update GitHub Actions code label Sep 12, 2025
@chaunceyjiang
Collaborator Author

/cc @yeqcharlotte PTAL.

@mergify

mergify bot commented Sep 17, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @chaunceyjiang.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 17, 2025
@chaunceyjiang
Collaborator Author

/cc @yeqcharlotte @qandrew PTAL.

Collaborator

@yeqcharlotte yeqcharlotte left a comment

thanks for the change! function_call_parsing will be a hot path, let's pay more attention to perf :D

@github-project-automation github-project-automation bot moved this from To Triage to In progress in gpt-oss Issues & Enhancements Sep 18, 2025
@chaunceyjiang
Collaborator Author

thanks for the change! function_call_parsing will be a hot path, let's pay more attention to perf :D

Hi @yeqcharlotte I've adopted a different approach for the implementation, and there are no performance issues now. Please take another look.

Collaborator

@yeqcharlotte yeqcharlotte left a comment

thanks for the change! it looks much better now. wonder did we get a chance to try this with any function call evals?

@qandrew - take a second look at the streaming component?

@chaunceyjiang
Collaborator Author

wonder did we get a chance to try this with any function call evals?

I've added a test case, test_function_calling_with_stream, which fully covers the entire workflow.

Contributor

@qandrew qandrew left a comment

some minor comments, otherwise LGTM. cc @yeqcharlotte

Contributor

@qandrew qandrew left a comment

LGTM! cc @yeqcharlotte

@github-project-automation github-project-automation bot moved this from In progress to Ready in gpt-oss Issues & Enhancements Oct 14, 2025
@yeqcharlotte yeqcharlotte merged commit df850c4 into vllm-project:main Oct 14, 2025
48 checks passed
Dhruvilbhatt pushed a commit to Dhruvilbhatt/vllm that referenced this pull request Oct 14, 2025
bbartels pushed a commit to bbartels/vllm that referenced this pull request Oct 16, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
@chaunceyjiang chaunceyjiang deleted the gpt_oss_function_call branch October 21, 2025 08:23
@goldanghenry-debug

goldanghenry-debug commented Oct 22, 2025

Hi, thank you for adding the function_call feature! 🙏
I’m currently testing tool calling with the latest nightly build, and I encountered an issue when performing multiple tool calls.

When making multiple tool calls, only the first tool call appears in the OutputItemDoneEvent.
However, the reasoning text clearly indicates that the model intends to call the tool twice (e.g. "need to call get_weather twice").


"response": {
  "id": "resp_7ac2cabcdeb146b78c4a5be2531d95e3",
  "created_at": 1761106454,
  "model": "openai/gpt-oss-20b",
  "object": "response",
  "output": [
    {
      "id": "rs_88e261acf87c4ee5b1e93e55a762f602",
      "type": "reasoning",
      "content": [
        {
          "text": "User wants weather for two areas: Yeongdeungpo-gu and Guro-gu. Probably need to call get_weather twice. Use tool get_weather.",
          "type": "reasoning_text"
        }
      ]
    },
    {
      "arguments": "{\"city\":\"Yeongdeungpo-gu\"}",
      "call_id": "call_7686f0ca73604a7da85b80310215fb09",
      "name": "get_weather",
      "type": "function_call",
      "id": "fc_7686f0ca73604a7da85b80310215fb09"
    }
  ],
  "parallel_tool_calls": true,
  "temperature": 1.0,
  "tool_choice": "auto"
}
...

@chaunceyjiang
Collaborator Author

Hi, @goldanghenry-debug .

When making multiple tool calls, only the first tool call appears in the OutputItemDoneEvent.

Yes.

ref #24637.

It seems that making the same request again (with the first tool call's result appended) will produce the second tool call.
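The workaround above can be sketched as a driver loop: since the model emits at most one function_call per response, each result is fed back and the request is re-issued until a final message arrives. This is a minimal sketch only; `create_response` and `execute_tool` are hypothetical stand-ins (for `client.responses.create` and the caller's tool dispatch), and the stub below simulates the server so the loop can run without one:

```python
import json

def run_tool_loop(create_response, execute_tool, max_turns=8):
    """Sequentially drive tool calls: feed each function_call_output back
    and request again until the model produces a final message."""
    history = []
    for _ in range(max_turns):
        response = create_response(history)
        calls = [o for o in response["output"] if o["type"] == "function_call"]
        if not calls:
            return response  # final answer, no further tool calls
        for call in calls:
            result = execute_tool(call["name"], json.loads(call["arguments"]))
            history.append(call)
            history.append({
                "type": "function_call_output",
                "call_id": call["call_id"],
                "output": json.dumps(result),
            })
    raise RuntimeError("tool loop did not converge")

# Stub "server": first request yields a tool call, second a final message.
def fake_create(history):
    if not any(h.get("type") == "function_call_output" for h in history):
        return {"output": [{"type": "function_call", "call_id": "call_1",
                            "name": "get_weather",
                            "arguments": '{"city": "Guro-gu"}'}]}
    return {"output": [{"type": "message", "content": "14 C in Guro-gu"}]}

final = run_tool_loop(fake_create, lambda name, args: {"temp_c": 14})
print(final["output"][0]["content"])  # 14 C in Guro-gu
```

With a real server, `create_response` would pass `history` as part of `input` on `client.responses.create`; the loop shape stays the same.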

@QwertyJack
Contributor

OpenAI's documentation states that the |call| token (ID 200012) signals when the model wants to call a tool and serves as a stop token, indicating that gpt-oss doesn't natively support parallel function calling.

However, this limitation can be bypassed by removing that token 200012 from the tokenizer configuration.

alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
Zhathw pushed a commit to Zhathw/vllm that referenced this pull request Nov 12, 2025