feat: #1614 gpt-realtime migration (Realtime API GA) #1646

seratch · 2025-09-03T06:13:00Z

this is still in progress but will resolve #1614

rm-openai · 2025-09-03T16:33:08Z

examples/realtime/app/server.py

+        # Disable server-side interrupt_response to avoid truncating assistant audio
+        session_context = await runner.run(
+            model_config={
+                "initial_model_settings": {
+                    "turn_detection": {"type": "semantic_vad", "interrupt_response": False}
+                }
+            }
+        )


do we need to do this by default? why?

I explored some changes to make the audio output quality, but they're not related to the gpt-realtime migration. So, I've reverted all of them. I will continue seeing improvements for this example app, but it can be done with a separate pull request.

I was testing to change to new voices, this is taken from the examples (examples/realtime/app)

model_settings: RealtimeSessionModelSettings = { "model_name": "gpt-realtime", "modalities": ["text", "audio"], "voice": "marin", "speed": 1.0, "input_audio_format": "pcm16", "output_audio_format": "pcm16", "input_audio_transcription": { "model": "gpt-4o-mini-transcribe", }, "turn_detection": {"type": "semantic_vad", "threshold": 0.5}, # "instructions": "…", # optional # "prompt": "…", # optional # "tool_choice": "auto", # optional # "tools": [], # optional # "handoffs": [], # optional # "tracing": {"enabled": False}, # optional } config = RealtimeRunConfig(model_settings=model_settings) runner = RealtimeRunner(starting_agent=get_starting_agent()) I noticied that voice is changed but I lost all agents handoff, tool, etc. I setted config via RealtimeRunConfig and RealtimeModelConfig. In both cases happened the same.

rm-openai · 2025-09-03T16:33:34Z

examples/realtime/app/server.py

            base_event["output"] = str(event.output)
        elif event.type == "audio":
-            base_event["audio"] = base64.b64encode(event.audio.data).decode("utf-8")
+            # Coalesce raw PCM and flush on a steady timer for smoother playback.


is this just a quality improvement? would be nice to make it be a separate PR if so

yeah, same with above (I won't repeat this for the rest)

examples/realtime/app/server.py

src/agents/realtime/config.py

KelSolaar · 2025-09-07T22:21:08Z

Hello,

Any ETA on this one? I could be using it right now. :)

Cheers,

Thomas

na-proyectran · 2025-09-08T14:04:08Z

Hi @seratch, do you know if this PR is going to be merged this week? No pressure, just to know ETA in this cases. Thanks you very much!

By the way, class OpenAIRealtimeWebSocketModel(RealtimeModel) has "gpt-4o-realtime-preview" by default (and you can't change it). Should by nice to set to "gpt-realtime".

adinin · 2025-09-08T16:01:43Z

Hi @seratch, do you know if this PR is going to be merged this week? No pressure, just to know ETA in this cases. Thanks you very much!

not to speak for @seratch, but this is probably mostly dependent more on the review from @rm-openai

examples/realtime/app/server.py

src/agents/realtime/agent.py

KelSolaar · 2025-09-09T04:14:29Z

@seratch :

FYI, noted that with OpenAI 1.107.0, I get this import error using your branch:

  File "\.venv\Lib\site-packages\agents\realtime\__init__.py", line 84, in <module>
    from .openai_realtime import (
    ...<3 lines>...
    )
  File "\.venv\Lib\site-packages\agents\realtime\openai_realtime.py", line 32, in <module>
    from openai.types.realtime.realtime_audio_config import (
    ...<3 lines>...
    )
ImportError: cannot import name 'Input' from 'openai.types.realtime.realtime_audio_config' (\.venv\Lib\site-packages\openai\types\realtime\realtime_audio_config.py)

seratch · 2025-09-09T05:04:04Z

@KelSolaar Thanks for letting me know this! Will resolve the conflicts.

KelSolaar · 2025-09-09T05:18:40Z

You are very much welcome! The new model has also mostly solved the issue I reported here: #1681

na-proyectran · 2025-09-09T08:43:42Z

@rm-openai @seratch What about changing OpenAIRealtimeWebSocketModel(RealtimeModel) model from "gpt-4o-realtime-preview" to "gpt-realtime"? Should be nice to have it as default, or better, to make possible to select realtime model to use.

seratch · 2025-09-09T08:55:40Z

@na-proyectran This pull request already does the change. Once this is released, the default model will be changed.

Right now, we're waiting for the underlying openai package updates. So, it may take a bit more time. Thank you all for waiting for a while.

na-proyectran · 2025-09-09T10:38:36Z

@seratch :

FYI, noted that with OpenAI 1.107.0 (released 16h ago), I get this import error using your branch:

  File "\.venv\Lib\site-packages\agents\realtime\__init__.py", line 84, in <module>
    from .openai_realtime import (
    ...<3 lines>...
    )
  File "\.venv\Lib\site-packages\agents\realtime\openai_realtime.py", line 32, in <module>
    from openai.types.realtime.realtime_audio_config import (
    ...<3 lines>...
    )
ImportError: cannot import name 'Input' from 'openai.types.realtime.realtime_audio_config' (\.venv\Lib\site-packages\openai\types\realtime\realtime_audio_config.py)

Not the only, in openai-python (release 1.107.0) they removed other things like:

from openai.types.realtime.realtime_tools_config_union import (
Function as OpenAISessionFunction,
)
-> Function (now only MCP)
from openai.types.realtime.realtime_session import (
RealtimeSession as OpenAISessionObject,
)
-> realtime_session (no longer here)

from openai.types.realtime.realtime_audio_config import (
Input as OpenAIRealtimeAudioInput,
Output as OpenAIRealtimeAudioOutput,
RealtimeAudioConfig as OpenAIRealtimeAudioConfig,
)
-> OpenAIRealtimeAudioOutput (no longer)
-> OpenAIRealtimeAudioInput (no longer)

WesselBosscher · 2025-09-09T14:52:03Z

@na-proyectran This pull request already does the change. Once this is released, the default model will be changed.

Right now, we're waiting for the underlying openai package updates. So, it may take a bit more time. Thank you all for waiting for a while.

sounds great! do you have an idea when that will be? should I think of days, weeks, months?

thanks!

KelSolaar · 2025-09-09T19:59:02Z

@na-proyectran This pull request already does the change. Once this is released, the default model will be changed.
Right now, we're waiting for the underlying openai package updates. So, it may take a bit more time. Thank you all for waiting for a while.

sounds great! do you have an idea when that will be? should I think of days, weeks, months?

thanks!

The pull request is essentially functional as is and can be tested, just make sure that you pin your requirements:

    "openai==1.106.1",
    "openai-agents @ git+https://github.com/openai/openai-agents-python@realtime-ga",

KelSolaar · 2025-09-09T22:10:52Z

Hello,

I'm looking for image input, and unless I'm missing something, it is not supported at the moment right?

From agents\realtime\openai_realtime.py:

    @classmethod
    def convert_user_input_to_conversation_item(
        cls, event: RealtimeModelSendUserInput
    ) -> OpenAIConversationItem:
        user_input = event.user_input

        if isinstance(user_input, dict):
            return RealtimeConversationItemUserMessage(
                type="message",
                role="user",
                content=[
                    Content(
                        type="input_text",
                        text=item.get("text"),
                    )
                    for item in user_input.get("content", [])
                ],
            )
        else:
            return RealtimeConversationItemUserMessage(
                type="message",
                role="user",
                content=[Content(type="input_text", text=user_input)],
            )

The API should look like this:

{
    "type": "conversation.item.create",
    "previous_item_id": null,
    "item": {
        "type": "message",
        "role": "user",
        "content": [
            {
                "type": "input_image",
                "image_url": "data:image/{format(example: png)};base64,{some_base64_image_bytes}"
            }
        ]
    }
}

seratch · 2025-09-09T22:28:03Z

@KelSolaar Thanks for pointing the lack out. The image input should be supported but it's missing here now. I will update the code to cover the use case too.

KelSolaar · 2025-09-09T22:45:09Z

@KelSolaar Thanks for pointing the lack out. The image input should be supported but it's missing here now. I will update the code to cover the use case too.

Thanks a ton and sorry for making this PR harder to push through!

na-proyectran · 2025-09-10T09:50:42Z

@na-proyectran This pull request already does the change. Once this is released, the default model will be changed.
Right now, we're waiting for the underlying openai package updates. So, it may take a bit more time. Thank you all for waiting for a while.

sounds great! do you have an idea when that will be? should I think of days, weeks, months?
thanks!

The pull request is essentially functional as is and can be tested, just make sure that you pin your requirements:
    "openai==1.106.1",
    "openai-agents @ git+https://github.com/openai/openai-agents-python@realtime-ga",

It's, just pointing new openai release.

feat(api): ship the RealtimeGA API shape
Updates types to use the GA shape for Realtime API
release: 1.107.0

I mean, should by nice to sync with last openai release

seratch · 2025-09-11T05:54:37Z

.github/workflows/tests.yml

          enable-cache: true
      - name: Install dependencies
        run: make sync
-      - name: Install Python 3.9 dependencies


moved to makefile

seratch · 2025-09-11T05:54:51Z

.gitignore

 # Environments
-.env
+.python-version
+.env*


for local python 3.9 tests

seratch · 2025-09-11T05:56:21Z

examples/realtime/app/static/app.js

        this.playbackAudioContext = null;
        this.currentAudioSource = null;
-
+        this.currentAudioGain = null; // per-chunk gain for smooth fades


adjusted internals of this JS code to more smoothly play the audio chunks (less gain noise)

seratch · 2025-09-11T05:56:45Z

examples/realtime/app/static/app.js

            this.toggleMute();
        });
+
+        // Image upload


for image file inputs

seratch · 2025-09-11T05:57:59Z

src/agents/realtime/_util.py


 def calculate_audio_length_ms(format: RealtimeAudioFormat | None, audio_bytes: bytes) -> float:
-    if format and format.startswith("g711"):
+    if format and isinstance(format, str) and format.startswith("g711"):


how the format data could be either str or dict/class

seratch · 2025-09-11T06:41:48Z

src/agents/realtime/audio_formats.py

+from ..logger import logger
+
+
+def to_realtime_audio_format(


TS SDK does the same

seratch · 2025-09-11T06:42:22Z

src/agents/realtime/openai_realtime.py

    RealtimeModelSendUserInput,
 )

+# Avoid direct imports of non-exported names by referencing via module


just for mypy warnings

seratch · 2025-09-11T06:43:13Z

src/agents/realtime/openai_realtime.py


 DEFAULT_MODEL_SETTINGS: RealtimeSessionModelSettings = {
    "voice": "ash",
-    "modalities": ["text", "audio"],


The initial release of gpt-realtime does not support having both, so changed this default settings; you can still receive transcript in addition to audio chunks

can you change default voice to newer ones, for quality improvement

seratch · 2025-09-11T06:44:35Z

src/agents/realtime/openai_realtime.py


    async def _handle_ws_event(self, event: dict[str, Any]):
        await self._emit_event(RealtimeModelRawServerEvent(data=event))
+        # The public interface definedo on this Agents SDK side (e.g., RealtimeMessageItem)


as mentioned here, this SDK's public interface was the same with beta API's data structure and the GA ones are slightly different. Thus, converting the data to fill the gap here

seratch · 2025-09-11T06:45:19Z

src/agents/realtime/openai_realtime.py

            await self._emit_event(RealtimeModelItemDeletedEvent(item_id=parsed.item_id))
        elif (
-            parsed.type == "conversation.item.created"
+            parsed.type == "conversation.item.added"


this is necessary to detect the user input item addition

chatgpt-codex-connector

Codex Review: Here are some suggestions.

Reply with @codex fix comments to fix any unresolved comments.

About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you open a pull request for review, mark a draft as ready, or comment "@codex review". If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex fix this CI failure" or "@codex address that feedback".

src/agents/realtime/openai_realtime.py

This pull request updates the realtime agent documents for the GA model. We can merge this change after releasing a new version including #1646

KelSolaar · 2025-09-11T20:55:19Z

Thank you @seratch!

kucharzyk-sebastian · 2025-09-12T06:51:12Z

Amazing work, thanks a lot to everyone involved! I've just noticed one small thing - the Twilio example in /examples/realtime/twilio produces only noise right now. Is there a chance to update it as well?

seratch · 2025-09-18T09:24:20Z

Thanks for the feedback. I'll update the twilio example sometime soon!

KelSolaar · 2025-09-24T12:17:02Z

examples/realtime/app/static/app.js

+        // Smoothly ramp down before stopping to avoid clicks
+        if (this.currentAudioSource && this.playbackAudioContext) {
            try {
-                this.currentAudioSource.stop();
-                this.currentAudioSource = null;
+                const now = this.playbackAudioContext.currentTime;
+                const fade = Math.max(0.01, this.playbackFadeSec);
+                if (this.currentAudioGain) {
+                    try {
+                        this.currentAudioGain.gain.cancelScheduledValues(now);
+                        // Capture current value to ramp from it
+                        const current = this.currentAudioGain.gain.value ?? 1.0;
+                        this.currentAudioGain.gain.setValueAtTime(current, now);
+                        this.currentAudioGain.gain.linearRampToValueAtTime(0.0001, now + fade);
+                    } catch {}
+                }
+                // Stop after the fade completes
+                setTimeout(() => {
+                    try { this.currentAudioSource && this.currentAudioSource.stop(); } catch {}
+                    this.currentAudioSource = null;
+                    this.currentAudioGain = null;
+                }, Math.ceil(fade * 1000));


@seratch : Why are we getting those clicks in the first place, is this a scheduling issue?

seratch requested a review from rm-openai September 3, 2025 06:13

seratch added enhancement New feature or request feature:realtime labels Sep 3, 2025

rm-openai reviewed Sep 3, 2025

View reviewed changes

seratch force-pushed the realtime-ga branch 2 times, most recently from a4333dd to f02b096 Compare September 4, 2025 10:20

seratch marked this pull request as ready for review September 4, 2025 10:22

adinin reviewed Sep 5, 2025

View reviewed changes

src/agents/realtime/config.py Show resolved Hide resolved

seratch force-pushed the realtime-ga branch from f02b096 to f3bff56 Compare September 8, 2025 08:25

seratch mentioned this pull request Sep 8, 2025

Enable gpt-realtime released at 2025 08 28 #1615

Closed

4 tasks

rm-openai approved these changes Sep 8, 2025

View reviewed changes

examples/realtime/app/server.py Outdated Show resolved Hide resolved

src/agents/realtime/agent.py Show resolved Hide resolved

seratch force-pushed the realtime-ga branch from 9b2af2c to 10c9e6c Compare September 8, 2025 22:24

seratch added a commit that referenced this pull request Sep 8, 2025

Update Realtime Agent documents (ref #1646)

7a3e4f6

seratch mentioned this pull request Sep 8, 2025

Update Realtime Agent documents (ref #1646) #1695

Merged

seratch marked this pull request as draft September 9, 2025 06:05

seratch added 7 commits September 11, 2025 10:37

Add prompt support

c95ee87

Add gpt-realtime-2025-08-28

8eded3f

Upgrade openai package

3eaae81

review feedback

3465e71

wip: changes with the latest openai package

af35e5a

fix

3101b74

Upgrade openai package and fix warnings

7afde98

seratch force-pushed the realtime-ga branch from 30bbd8d to 7afde98 Compare September 11, 2025 01:38

seratch added 5 commits September 11, 2025 10:57

Add more unit tests

2724e29

fix mypy errors

129069b

Add image input support

eebfbca

fix tests

eacb3f0

improve the example audio player

45e4e97

seratch mentioned this pull request Sep 11, 2025

"gpt-realtime" model is not usable for the RealtimeRunner #1708

Closed

seratch added 2 commits September 11, 2025 14:53

fix

361d88d

tweak

8f2a4fb

seratch commented Sep 11, 2025

View reviewed changes

seratch marked this pull request as ready for review September 11, 2025 06:47

chatgpt-codex-connector bot reviewed Sep 11, 2025

View reviewed changes

src/agents/realtime/openai_realtime.py Outdated Show resolved Hide resolved

seratch added 2 commits September 11, 2025 16:34

review feedback

e410f66

Fix for python 3.9

91273c4

rm-openai pushed a commit that referenced this pull request Sep 11, 2025

Update Realtime Agent documents (ref #1646) (#1695)

f3339f0

This pull request updates the realtime agent documents for the GA model. We can merge this change after releasing a new version including #1646

Merge branch 'main' into realtime-ga

c13fb3e

rm-openai merged commit 9168348 into main Sep 11, 2025
5 checks passed

rm-openai deleted the realtime-ga branch September 11, 2025 17:56

KelSolaar reviewed Sep 24, 2025

View reviewed changes

Ali-Olliek mentioned this pull request Oct 13, 2025

gpt-realtime-mini Support #1887

Open

feat: #1614 gpt-realtime migration (Realtime API GA) #1646

feat: #1614 gpt-realtime migration (Realtime API GA) #1646

Uh oh!

Conversation

seratch commented Sep 3, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

KelSolaar commented Sep 7, 2025

Uh oh!

na-proyectran commented Sep 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

adinin commented Sep 8, 2025

Uh oh!

Uh oh!

Uh oh!

KelSolaar commented Sep 9, 2025

Uh oh!

seratch commented Sep 9, 2025

Uh oh!

KelSolaar commented Sep 9, 2025

Uh oh!

na-proyectran commented Sep 9, 2025

Uh oh!

seratch commented Sep 9, 2025

Uh oh!

na-proyectran commented Sep 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

WesselBosscher commented Sep 9, 2025

Uh oh!

KelSolaar commented Sep 9, 2025

Uh oh!

KelSolaar commented Sep 9, 2025

Uh oh!

seratch commented Sep 9, 2025

Uh oh!

KelSolaar commented Sep 9, 2025

Uh oh!

na-proyectran commented Sep 10, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

na-proyectran commented Sep 8, 2025 •

edited

Loading

na-proyectran commented Sep 9, 2025 •

edited

Loading