Breeze Automatic is a dual-component server for conversational AI. It comprises a Gemini Live Proxy Server for direct, low-level integration with Google's Gemini API and a Pipecat-based Voice Agent for building real-time voice assistants.
Gemini Live Proxy Server: A FastAPI-based server that proxies the Gemini Live API directly. It handles WebSocket connections, pre-fetches and caches analytics data from Juspay and Breeze, and adds that data to the Gemini model's context so that real-time conversations are better informed.
Pipecat Voice Agent: A standalone voice agent built on the Pipecat framework. It is launched as a subprocess by the main FastAPI server and handles the end-to-end voice conversation flow, including:
- Speech-to-Text (STT)
- Language Model (LLM) interaction with dynamic tool use
- Text-to-Speech (TTS)

- Dual-Mode Operation: Can run in live mode with real-time data fetching or test mode using dummy data.
- Dynamic Tool Loading: The voice agent dynamically loads tools based on the operating mode and provided credentials (e.g., Juspay and Breeze tools are only loaded in live mode with valid tokens).
- Multi-Provider Analytics: Integrates with both Juspay and Breeze APIs to fetch a wide range of analytics data, including sales, orders, marketing, and checkout metrics.
- Personalized Prompts: The agent's system prompt can be personalized with the user's name for a more engaging experience.
- Real-time Audio Streaming: Bidirectional audio streaming via WebSockets.
- Environment-Driven Configuration: All sensitive keys and settings are managed via environment variables.
- Modular & Scalable Architecture: The project is structured for clarity, maintainability, and easy extension with new tools or providers.
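Two of the features above, mode-gated tool loading and prompt personalization, can be illustrated with a minimal sketch. This is not the project's code: the function names `select_toolsets` and `personalize_prompt`, the toolset labels, the assumption that the Euler token gates the Juspay tools, and the wording appended to the prompt are all hypothetical.

```python
from typing import List, Optional


def select_toolsets(mode: str,
                    euler_token: Optional[str] = None,
                    breeze_token: Optional[str] = None) -> List[str]:
    """Return the names of the toolsets to load for one session.

    Hypothetical helper: the labels mirror the tools/ subdirectories
    (system, dummy, juspay, breeze) but are not taken from the real code.
    """
    if mode not in ("live", "test"):
        raise ValueError(f"unknown mode: {mode!r}")

    toolsets = ["system"]  # system tools are always loaded
    if mode == "test":
        toolsets.append("dummy")  # test mode uses dummy data only
    else:
        # Live mode: load a provider's tools only when its token is present.
        # Assumption: the Euler token is the credential for Juspay tools.
        if euler_token:
            toolsets.append("juspay")
        if breeze_token:
            toolsets.append("breeze")
    return toolsets


def personalize_prompt(base_prompt: str, user_name: Optional[str] = None) -> str:
    """Append the user's name to the system prompt when one is provided."""
    if user_name and user_name.strip():
        # The appended sentence is illustrative wording, not the real prompt.
        return f"{base_prompt}\nThe user's name is {user_name.strip()}."
    return base_prompt
```

In this sketch a live session without any tokens falls back to system tools only, and tokens passed in test mode are ignored; the actual agent may handle those cases differently.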
The project is organized into two main parts: the FastAPI server (app/) and the Pipecat voice agent (app/agents/voice/automatic/).
.
├── app/
│   ├── main.py                      # FastAPI app, agent endpoint, and subprocess management
│   ├── ws/live_session.py           # WebSocket session handling for Gemini Live Proxy
│   ├── services/gemini_service.py   # Gemini API interaction logic for proxy
│   ├── api/                         # API clients for Juspay, Breeze, etc.
│   │   ├── juspay_metrics.py
│   │   └── breeze_metrics.py
│   ├── tools/                       # Tool definitions for the Gemini Live Proxy
│   │   └── ...
│   └── agents/voice/automatic/      # Pipecat Voice Agent
│       ├── __init__.py              # Main agent logic, pipeline definition
│       ├── prompts.py               # System prompts for the agent
│       └── tools/                   # Tool definitions for the agent
│           ├── __init__.py          # Dynamic tool initializer
│           ├── system/              # System tools (e.g., get_current_time)
│           ├── dummy/               # Dummy tools for test mode
│           ├── juspay/              # Real-time Juspay analytics tools
│           └── breeze/              # Real-time Breeze analytics tools
├── static/
│   └── client.html                  # HTML client for testing
├── requirements.txt
└── run.py                           # Script to run the server
- Python 3.8+
- Access to Google Gemini, Azure OpenAI, and Daily.co APIs with valid keys.
- Clone the repository.
- Create and activate a virtual environment.
- Install dependencies:
pip install -r requirements.txt
- Set up Environment Variables: Create a .env file in the project root with the following variables:
  - DAILY_API_KEY: Required.
  - AZURE_OPENAI_API_KEY: Required.
  - AZURE_OPENAI_ENDPOINT: Required.
  - GOOGLE_CREDENTIALS_JSON: Required. Path to your Google Cloud credentials JSON file.
  - GEMINI_API_KEY: Required for the Gemini Live Proxy.
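As a quick sanity check before starting the server, the variables listed above can be verified with a small helper. This is a sketch under assumptions: `missing_env_vars` is a hypothetical function, it treats all five variables as required (including GEMINI_API_KEY, which is only needed for the Gemini Live Proxy), and it assumes the .env file has already been loaded into the process environment by whatever mechanism the project uses.

```python
import os
from typing import List, Mapping

# Variable names taken from the list above.
REQUIRED_VARS = (
    "DAILY_API_KEY",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "GOOGLE_CREDENTIALS_JSON",
    "GEMINI_API_KEY",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or blank in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_env_vars()` with no arguments inspects the current process environment; an empty list means every variable is set to a non-blank value. It does not check that the keys are valid or that the credentials file exists.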
Execute the run.py script:
python run.py
The server will start on http://0.0.0.0:8000 by default.
- A client sends a POST request to the /agent/voice/automatic endpoint on the FastAPI server.
- The payload includes the mode (live or test) and various tokens/IDs (eulerToken, breezeToken, shopId, etc.).
- The server creates a new Daily.co video room for the voice session.
- It then launches the Pipecat voice agent as a new subprocess, passing the mode, tokens, and shop details as command-line arguments.
- Inside the agent's __init__.py, the initialize_tools function is called.
- This function checks the mode and the presence of tokens to decide which toolsets to load:
  - System tools are always loaded.
  - In test mode, dummy tools are loaded.
  - In live mode, if tokens are present, the corresponding real-time Juspay and Breeze tools are loaded.
- The agent's system prompt is personalized with the user's name if provided.
- The agent connects to the Daily room and begins the conversation, now equipped with the appropriate set of tools for the session.
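The hand-off from the request payload to the agent subprocess in the steps above could look roughly like the following. This is an illustrative sketch, not the server's code: `build_agent_command`, the flag names (`--url`, `--mode`, `--euler-token`, `--breeze-token`, `--shop-id`), the default of test mode when no mode is given, and the assumption that the agent package can be started with `python -m app.agents.voice.automatic` are all hypothetical.

```python
import sys
from typing import Dict, List, Optional

# Hypothetical mapping from payload fields to command-line flags.
# The payload field names come from the description above; the flags do not.
FLAG_FOR_FIELD = {
    "eulerToken": "--euler-token",
    "breezeToken": "--breeze-token",
    "shopId": "--shop-id",
}


def build_agent_command(room_url: str,
                        payload: Dict[str, Optional[str]]) -> List[str]:
    """Build the argument list used to launch the voice agent subprocess."""
    mode = payload.get("mode") or "test"  # assumption: default to test mode
    if mode not in ("live", "test"):
        raise ValueError(f"unknown mode: {mode!r}")

    cmd = [
        sys.executable, "-m", "app.agents.voice.automatic",  # assumed entry point
        "--url", room_url,
        "--mode", mode,
    ]
    # Forward only the fields the client actually supplied.
    for field, flag in FLAG_FOR_FIELD.items():
        value = payload.get(field)
        if value:
            cmd += [flag, str(value)]
    return cmd
```

The returned list can be passed directly to `subprocess.Popen`. Building the command as a list rather than a single shell string avoids shell quoting issues with token values.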
This architecture separates the FastAPI server, which handles requests and room creation, from the voice agent process, which runs the conversation, so each session is equipped only with the tools its mode and credentials allow.