A real-time voice conversation agent powered by OpenAI's GPT-5, featuring speech-to-text, text-to-speech, and video stream analysis capabilities.
Original Source: This project is based on the GPT-5 voice agent single-file example by @kwindla.
Original Post: X/Twitter announcement by @kwindla.
That single-file agent was released when GPT-5 became publicly available; its announcement highlighted the simplicity and power of GPT-5 for voice AI applications.
```bash
# Set your OpenAI API key
export OPENAI_API_KEY=sk-proj-your-api-key-here

# Run the voice agent
uv run gpt-5-voice-agent.py
```
Note: First-time setup takes about 30 seconds to install dependencies and begin processing audio/video.
For optimal voice AI performance, use these parameter settings:

```python
service_tier: "priority"      # Doubles cost but reduces latency
reasoning_effort: "minimal"   # Faster responses for conversation
verbosity: "low"              # Concise responses for voice
```
The "priority" service tier is recommended for latency-sensitive conversational applications, though it doubles the cost per token.
The original implementation uses a three-model approach:
- GPT-5: Main conversation model
- OpenAI Whisper: Speech-to-text transcription
- OpenAI TTS: Text-to-speech generation
Further reading:
- Pipecat.ai Guide: Comprehensive starter kit covering both the three-model and Realtime API approaches
- Voice AI & Voice Agents Primer: Technical deep dive into building production voice agents
- Original Code Gist: The original single-file implementation
OpenAI also released a new natively voice-to-voice Realtime model and API. For more information about using the Realtime API alongside the three-model approach, see the Pipecat.ai documentation above.
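For orientation, here is a minimal sketch of how the three-model approach composes as a Pipecat pipeline. The import paths and constructor arguments are assumptions based on recent pipecat-ai releases and may differ from the version pinned here; the actual single-file agent also wires in a WebRTC transport, VAD, and context aggregation, all omitted below:

```python
# Sketch only: STT -> LLM -> TTS as a Pipecat pipeline (module paths assumed).
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.openai import (
    OpenAILLMService,  # GPT-5: main conversation model
    OpenAISTTService,  # Whisper-backed speech-to-text
    OpenAITTSService,  # text-to-speech
)

stt = OpenAISTTService()               # transcribes incoming audio frames
llm = OpenAILLMService(model="gpt-5")  # generates the reply text
tts = OpenAITTSService(voice="alloy")  # speaks the reply ("alloy" is a stock voice)

# In the real agent, transport.input()/output() from the WebRTC transport
# would bracket these processors; the bare chain conveys the data flow.
pipeline = Pipeline([stt, llm, tts])
```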
- Real-time Voice Conversations: Natural voice interaction with GPT-5
- Speech-to-Text: Automatic transcription using OpenAI's Whisper
- Text-to-Speech: Natural voice responses using OpenAI's TTS
- Video Stream Analysis: Visual understanding of camera feed
- Background Process Management: Easy start/stop/status control
- PID-based Process Control: Reliable process management
```mermaid
graph TB
    subgraph "User Interface"
        A[Browser Client] --> B[Pipecat Playground]
        B --> C[WebRTC Connection]
    end

    subgraph "Voice Processing"
        C --> D[Microphone Input]
        C --> E[Camera Input]
        D --> F[Speech-to-Text<br/>OpenAI Whisper]
        E --> G[Video Analysis<br/>GPT-5 Vision]
    end

    subgraph "AI Processing"
        F --> H[Text Input]
        G --> I[Visual Context]
        H --> J[GPT-5 Processing]
        I --> J
        J --> K[Text Response]
    end

    subgraph "Output Generation"
        K --> L[Text-to-Speech<br/>OpenAI TTS]
        L --> M[Audio Output]
        M --> C
    end

    subgraph "Management"
        N[start.sh] --> O[Background Process]
        P[stop.sh] --> O
        Q[status.sh] --> O
        O --> R[app.pid]
        O --> S[app.log]
    end

    style A fill:#e1f5fe
    style B fill:#f3e5f5
    style J fill:#fff3e0
    style L fill:#e8f5e8
    style N fill:#ffebee
    style P fill:#ffebee
    style Q fill:#ffebee
```
- Python 3.12+
- uv package manager
- OpenAI API key
Option 1: Automated Installation (Recommended)

```bash
# Clone the repository
git clone https://github.com/abdshomad/gpt-5-voice-agent
cd gpt-5-voice-agent

# Run the automated installer
./install.sh
```
Option 2: Manual Installation

```bash
# Clone the repository
git clone https://github.com/abdshomad/gpt-5-voice-agent
cd gpt-5-voice-agent

# Install dependencies
uv sync

# Set up environment
cp .env.example .env
nano .env  # Add your OpenAI API key
```
Update the API key in `.env`:

```bash
OPENAI_API_KEY=sk-proj-your-actual-api-key-here
```
```bash
# Start the application
./start.sh

# Access the app
# Open your browser and go to: http://localhost:7860/client

# Check status
./status.sh

# Stop the application
./stop.sh
```
The Pipecat Playground interface showing real-time voice conversation with GPT-5, including audio visualization, video stream, and conversation logs.
`./install.sh`
- Automated installation with dependency checking
- Verifies uv and Python installation
- Installs all dependencies using `uv sync`
- Sets up the environment file from the template
- Makes scripts executable
- Provides next-steps guidance
`./start.sh`
- Starts the voice agent in the background using `uv run`
- Saves the PID to `app.pid`
- Logs output to `app.log`

`./stop.sh`
- Gracefully stops the running app
- Removes the PID file
- Frees port 7860

`./status.sh`
- Shows whether the app is running
- Displays recent logs
- Shows port status
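The PID-file pattern behind these scripts is small enough to sketch. Below is a hypothetical Python rendering of the same start/stop/status logic; the real scripts are plain shell, but the `app.pid` and `app.log` file names match:

```python
# Hypothetical sketch of the scripts' PID-file pattern (not the actual scripts).
import os
import pathlib
import signal
import subprocess

PID_FILE = pathlib.Path("app.pid")
LOG_FILE = pathlib.Path("app.log")

def start() -> None:
    with LOG_FILE.open("ab") as log:
        proc = subprocess.Popen(
            ["uv", "run", "gpt-5-voice-agent.py"],
            stdout=log,
            stderr=subprocess.STDOUT,
        )
    PID_FILE.write_text(str(proc.pid))  # remember which process to stop later

def stop() -> None:
    pid = int(PID_FILE.read_text())
    os.kill(pid, signal.SIGTERM)  # graceful shutdown; frees port 7860
    PID_FILE.unlink()

def status() -> str:
    if not PID_FILE.exists():
        return "not running"
    pid = int(PID_FILE.read_text())
    try:
        os.kill(pid, 0)  # signal 0 only tests whether the process exists
        return f"running (pid {pid})"
    except ProcessLookupError:
        return "stale app.pid (process has exited)"
```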
| Variable | Description | Required | Default |
|---|---|---|---|
| `OPENAI_API_KEY` | Your OpenAI API key | ✅ Yes | - |
| `DEBUG` | Enable debug logging | ❌ No | `false` |
| `PORT` | Custom port | ❌ No | `7860` |
The project includes a `.env.example` file as a template:

```bash
# Copy the example file
cp .env.example .env

# Edit with your actual values
nano .env
```
Required Setup:
1. Get your OpenAI API key from the OpenAI Platform
2. Replace `sk-proj-your-openai-api-key-here` with your actual API key
3. Save the file
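Since `python-dotenv` is already among the dependencies, a short hypothetical check can confirm the key is actually loaded from `.env`:

```python
# Hypothetical sanity check that .env is read correctly (uses python-dotenv).
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory
key = os.environ.get("OPENAI_API_KEY", "")
if key.startswith("sk-"):
    print("OPENAI_API_KEY loaded")
else:
    print("OPENAI_API_KEY missing or malformed; check your .env file")
```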
The app uses `pyproject.toml` for dependency management with the following packages:

```
numba==0.61.2
openai==1.99.1
python-dotenv
fastapi[all]
uvicorn
pipecat-ai[silero,webrtc,openai]
pipecat-ai-small-webrtc-prebuilt
```
Development Dependencies (optional):
- `pytest`: testing framework
- `black`: code formatting
- `flake8`: linting
- `mypy`: type checking
- Start the app: `./start.sh`
- Open browser: Navigate to http://localhost:7860/client
- Allow camera/microphone: Grant permissions when prompted
- Start talking: Begin your voice conversation with GPT-5
- Ask about video: Say "what can you see?" to analyze the camera feed
```bash
# Check that dependencies are installed
uv run python -c "import openai, pipecat, fastapi; print('✅ Dependencies ready!')"

# Test the application
uv run python gpt-5-voice-agent.py --help

# Check running status
./status.sh

# Run the automated installer (if not already run)
./install.sh
```
The agent can analyze your camera feed in real-time:
- Visual Questions: Ask "what do you see?" or "describe the video"
- Object Recognition: Identify objects in the camera view
- Scene Analysis: Understand the context of your environment
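Conceptually, each sampled camera frame reaches GPT-5 as an image input. The standalone sketch below shows the idea using the plain OpenAI SDK and a hypothetical saved frame; the agent itself routes frames through Pipecat rather than calling the API this way:

```python
# Sketch: ask GPT-5 about a single captured frame ("frame.jpg" is hypothetical).
import base64

from openai import OpenAI

client = OpenAI()

with open("frame.jpg", "rb") as f:
    frame_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What do you see in this frame?"},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"},
            },
        ],
    }],
)
print(response.choices[0].message.content)
```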
```bash
# Check what's using port 7860
lsof -i :7860

# Kill the process if needed
kill <PID>
```
- Check logs: `tail -f app.log`
- Verify the API key in `.env`
- Ensure all dependencies are installed: `uv sync`
- Check microphone permissions in browser
- Ensure microphone is not muted
- Try refreshing the browser page
- Check camera permissions in browser
- Ensure camera is not in use by other applications
- Try refreshing the browser page
- Ensure the `.env` file exists and has the correct API key
- Check that `.env.example` was copied correctly
- Verify the API key format starts with `sk-proj-`
```
gpt-5-voice-agent-2025/
├── gpt-5-voice-agent.py    # Main application
├── pyproject.toml          # Project configuration and dependencies
├── install.sh              # Automated installation script
├── start.sh                # Start script (uses uv run)
├── stop.sh                 # Stop script
├── status.sh               # Status script
├── .env                    # Environment variables (create from .env.example)
├── .env.example            # Environment template
├── .gitignore              # Git ignore rules
├── INSTALL.md              # Detailed installation guide
└── README.md               # This file
```
The project uses `pyproject.toml` for modern Python packaging:
- Dependencies: All required packages are specified in `pyproject.toml`
- Development Tools: Includes configuration for testing, linting, and formatting
- Build System: Uses `hatchling` for building and packaging
- Installation: Can be installed with `pip install -e .` or `uv sync`
The project has been tested with the following dependencies:
- 94 packages installed successfully
- Core dependencies: openai, pipecat-ai, fastapi, uvicorn
- Audio processing: numba, av, aiortc, pyloudnorm
- Video processing: opencv-python, pillow
- Development tools: pytest, black, flake8, mypy (optional)
The `install.sh` script provides automated setup:
- Dependency checking: Verifies `uv` and Python installation
- Automatic installation: Uses `uv sync` for reliable dependency management
- Environment setup: Creates `.env` from the template
- Verification: Tests core dependencies and the application
- User guidance: Provides clear next steps
```bash
# Run directly with uv (recommended)
uv run gpt-5-voice-agent.py

# Or use the start script (also uses uv run)
./start.sh

# For development with auto-reload
uv run uvicorn gpt-5-voice-agent:app --reload --host 0.0.0.0 --port 7860
```
```bash
# Real-time logs
tail -f app.log

# Recent logs
tail -20 app.log
```
```bash
# If running directly: press Ctrl+C

# If running with the start script
./stop.sh
```
- API Key: Never commit your `.env` file to version control
- Environment Template: Use `.env.example` as a safe template
- Permissions: The app requires camera and microphone access
- Network: Runs locally on localhost:7860
This project is for educational and personal use. Please ensure you comply with OpenAI's usage policies.
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
If you encounter issues:
- Check the logs: `tail -f app.log`
- Verify your OpenAI API key in `.env`
- Ensure all dependencies are installed
- Check browser permissions for camera/microphone
- Verify `.env` was created from `.env.example`
Happy Voice Chatting! 🎤✨