A comprehensive AI assistant platform built with Azure Functions and FastAPI, featuring multiple AI agents, document processing, real-time speech recognition, and a modern web interface.
- Multi-Agent AI System: Single Agent, Multi-Agent (Triage), and Hands-Off Agent
- Document Upload & Processing: PDF, DOCX, TXT with AI Search integration
- Real-Time Speech Recognition: WebRTC-based voice input with WebSocket streaming
- Modern Web Interface: Responsive chatbot UI with separate chat rooms
- Dual Deployment: Both Azure Functions and FastAPI implementations
- Chat History Management: Export/import with compression support
- Agent State Management: Persistent state across sessions
- Backend: Python 3.12, FastAPI, Azure Functions
- AI Framework: Microsoft Semantic Kernel
- Speech Services: Azure Speech Services with WebRTC
- Document Processing: Azure Document Intelligence + AI Search
- Frontend: HTML5, CSS3, JavaScript (WebRTC, WebSocket)
- Database: Azure AI Search for document indexing
- Python 3.12+
- Azure Account with the following services:
- Azure OpenAI Service
- Azure Speech Services
- Azure Document Intelligence
- Azure AI Search
- Node.js (for Azure Functions development)
git clone <repository-url>
cd azure-function-semantic-kernel
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
Copy .env.example to .env and configure your Azure services:
# Container environment flag ("0" or "1"; set to "1" to deactivate some features)
CONTAINER_ENV=0
# Partner Name for Footer
PARTNER_NAME=Your Company Name
# Azure OpenAI
OPENAI_KEY=your_openai_key
OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
# Azure Speech Services
SPEECH_KEY=your_speech_key
SPEECH_ENDPOINT=https://your-region.api.cognitive.microsoft.com/
# Azure Document Intelligence
DOCUMENT_INTELLIGENCE_KEY=your_doc_intelligence_key
DOCUMENT_INTELLIGENCE_ENDPOINT=https://your-resource.cognitiveservices.azure.com/
# Azure AI Search
AI_SEARCH_KEY=your_search_key
AI_SEARCH_ENDPOINT=https://your-search-service.search.windows.net
AI_SEARCH_INDEX=documents
# Azure Blob Storage
BLOB_STORAGE_CONNECTION_STRING=your_blob_storage_connection_string
BLOB_CONTAINER_NAME=your_container_name
# Azure AI Foundry (if applicable)
FOUNDRY_ENDPOINT=https://your-foundry-endpoint
FOUNDRY_AGENT_ID=registered_agent_id
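How these values are consumed is up to the application code; as a rough sketch (assuming python-dotenv is installed, variable names match the .env keys above), they could be read at startup like this:

```python
# Sketch: load .env values at startup (assumes python-dotenv is installed).
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory

OPENAI_ENDPOINT = os.environ["OPENAI_ENDPOINT"]
OPENAI_KEY = os.environ["OPENAI_KEY"]
SPEECH_KEY = os.getenv("SPEECH_KEY")             # optional if speech is not used
CONTAINER_ENV = os.getenv("CONTAINER_ENV", "0")  # "1" deactivates some features
```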
# Start FastAPI server
uvicorn fastapi_app:app --host 0.0.0.0 --port 8000 --reload
Access Points:
- Web Interface: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- Speech Test UI: http://localhost:8000/speech/test-webrtc-ui
# Start Azure Functions runtime
func start
Access Points:
- API Base: http://localhost:7071/api
- Function endpoints: http://localhost:7071/api/{endpoint}
- Single Agent: General-purpose assistant with tool integration
- Multi-Agent (Triage): Intelligent request routing to specialized agents
- Hands-Off Agent: Autonomous workflow execution
- Separate Chat Rooms: Each agent maintains its own conversation history
- Message Count Indicators: Visual badges showing activity per agent
- Real-time Messaging: Instant responses with typing indicators
- Chat History: Persistent conversation storage per agent
- Drag & Drop: Intuitive file upload interface
- Supported Formats: PDF, DOCX, TXT files
- Progress Tracking: Real-time upload status
- AI Search Integration: Automatic document indexing for agent queries
- WebRTC Speech Recognition: High-quality audio capture
- Multi-language Support: English, Indonesian, French, German, Spanish
- Real-time Transcription: Live speech-to-text conversion
- Audio Visualization: Visual feedback during recording
- WebSocket Streaming: Low-latency audio processing
# Agent Types Available
single_agent/ # General-purpose AI assistant
├── agent.py                   # Main agent implementation
├── prompt.py                  # System prompts and instructions
└── plugins/                   # Tool integrations (lights control, etc.)
multi_agent/                   # Intelligent request triage system
├── agent.py                   # Orchestrator with specialized sub-agents
├── agents/                    # Specialized agent implementations
│   ├── document_agent/        # PDF/document processing specialist
│   ├── light_agent/           # IoT/smart home automation
│   └── orchestrator_agent/    # Request routing and coordination
└── common.py                  # Shared utilities and configurations
hands_off_agent/               # Autonomous workflow execution
├── agent.py                   # Advanced orchestration with minimal human input
├── agents/                    # Same specialized agents as multi-agent
└── common.py                  # Enhanced error handling and recovery
document_upload_cli/
├── utils.py                   # Document parsing and chunking
└── __init__.py                # CLI interface for bulk uploads
# Document Flow:
# File Upload → Document Intelligence → Text Extraction →
# Chunking → Embedding → Azure AI Search Index → Agent Queries
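To illustrate the chunking step only (this is not the project's actual implementation, which lives in document_upload_cli/utils.py), a naive fixed-size chunker with overlap looks like this:

```python
# Illustrative only: naive fixed-size chunking with overlap.
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split extracted text into overlapping chunks for embedding and indexing."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap so context is not cut mid-sentence
    return chunks
```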
utils/fastapi/
└── azure_speech_streaming.py  # WebRTC + Azure Speech integration
# Speech Flow:
# Microphone → WebRTC Capture → WebSocket → Azure Speech →
# Real-time Transcription → Agent Processing
Core Endpoints:
GET / # Main chatbot interface
GET /chatbot # Alternative chatbot access
GET /docs # Interactive API documentation
Agent Interactions:
POST /chat/single # Single agent conversation
POST /chat/multi # Multi-agent triage system
POST /chat/hands-off # Hands-off agent workflow
# Example Request:
{
"message": "What documents do we have about machine learning?",
"session_id": "user123",
"context": {}
}
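The same request can be sent from Python; a minimal sketch (assuming the `requests` package is installed and the endpoint returns JSON):

```python
# Sketch: call the single-agent endpoint with the example payload above.
import requests

payload = {
    "message": "What documents do we have about machine learning?",
    "session_id": "user123",
    "context": {},
}
resp = requests.post("http://localhost:8000/chat/single", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())  # response shape depends on the agent implementation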
Document Management:
POST /upload/document # Upload and process documents
GET /documents/search # Search processed documents
POST /documents/batch-upload # Bulk document processing
# Upload Example:
curl -X POST "http://localhost:8000/upload/document" \
-H "Content-Type: multipart/form-data" \
-F "[email protected]"
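An equivalent upload from Python might look like the sketch below (the form field name "file" and the placeholder filename report.pdf are assumptions):

```python
# Sketch: upload a PDF for processing (form field name "file" is assumed).
import requests

with open("report.pdf", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/upload/document",
        files={"file": ("report.pdf", f, "application/pdf")},
        timeout=120,
    )
print(resp.status_code, resp.text)
```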
Speech Recognition:
GET /speech/test-webrtc-ui # Speech recognition test interface
WS /speech/websocket # WebSocket for real-time speech
# WebSocket Connection:
const ws = new WebSocket('ws://localhost:8000/speech/websocket');
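A Python client can stream audio over the same WebSocket. The sketch below uses the third-party `websockets` package; the message framing (raw PCM chunks in, JSON transcription events out) is an assumption, so check the server implementation in utils/fastapi/azure_speech_streaming.py:

```python
# Sketch: streaming client for /speech/websocket (message framing is assumed).
import asyncio
import json

import websockets  # third-party: pip install websockets


async def stream_audio(pcm_chunks):
    async with websockets.connect("ws://localhost:8000/speech/websocket") as ws:
        for chunk in pcm_chunks:       # chunk: bytes of raw PCM audio
            await ws.send(chunk)
            reply = await ws.recv()    # server pushes transcription events
            data = json.loads(reply)
            if data.get("type") == "transcription":
                print("heard:", data.get("text"))

# asyncio.run(stream_audio(my_pcm_chunks))
```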
Utility Endpoints:
GET /health # System health check
GET /status # Detailed system status
POST /chat/history/export # Export chat history
POST /chat/history/import # Import chat history
The application implements automatic connection health monitoring:
// Client-side ping management
function startPingInterval() {
pingInterval = setInterval(() => {
if (ws && ws.readyState === WebSocket.OPEN) {
ws.send(JSON.stringify({ type: 'ping' }));
}
}, 2000); // Ping every 2 seconds
}
// Server automatically responds with pong
Features:
- Automatic Reconnection: Client automatically reconnects on connection loss
- Health Monitoring: Visual indicators for connection status
- Timeout Management: Configurable timeout periods for robust operation
- Background Keepalive: Maintains connections during idle periods
// WebSocket message handling
ws.onmessage = function(event) {
const data = JSON.parse(event.data);
switch(data.type) {
case 'chat_response':
displayMessage(data.message, 'ai');
break;
case 'transcription':
updateTranscription(data.text);
break;
case 'pong':
// Connection health confirmed
break;
}
};
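On the server side, a minimal FastAPI handler for this ping/pong keepalive might look like the sketch below (illustrative only, not the project's actual handler; the route path is an assumption):

```python
# Sketch: FastAPI WebSocket endpoint that answers client pings (route path assumed).
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()


@app.websocket("/ws/chat")
async def chat_ws(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            data = await websocket.receive_json()
            if data.get("type") == "ping":
                await websocket.send_json({"type": "pong"})  # confirms connection health
            # ... other message types (chat, transcription) handled here ...
    except WebSocketDisconnect:
        pass  # client disconnected; its auto-reconnect logic takes over
```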
- Neutral Color Palette: Professional gray tones for reduced eye strain
- Responsive Layout: Mobile-first design with desktop optimization
- Accessibility: WCAG 2.1 compliant with keyboard navigation
- Modern UX: Smooth animations and intuitive interactions
:root {
--bg-color: #1a1a1a; /* Deep charcoal background */
--card-bg: #2a2a2a; /* Card/panel backgrounds */
--accent-color: #4a90e2; /* Primary accent blue */
--text-primary: #e0e0e0; /* Primary text color */
--text-secondary: #b0b0b0; /* Secondary text color */
--border-color: #404040; /* Subtle borders */
--success-color: #4caf50; /* Success/positive actions */
--warning-color: #ff9800; /* Warnings/attention */
}
- Chat Bubbles: Distinct styling for user/AI messages
- Agent Switcher: Tabbed interface with activity indicators
- File Upload: Drag-and-drop with progress visualization
- Voice Controls: Recording status with audio visualization
- Status Indicators: Connection health and agent status
The system implements comprehensive error handling across all components:
# Agent Error Recovery Example
async def chat_loop(self, message: str, session_id: str):
    try:
        # Main agent processing
        response = await self.process_message(message)
        return response
    except ValidationError as e:
        logger.error(f"Validation error in agent: {e}")
        # Automatic agent restart
        await self.restart_agent()
        return {"error": "Agent restarted due to validation error"}
    except Exception as e:
        logger.error(f"Unexpected error: {e}")
        return {"error": "An unexpected error occurred"}
Error Recovery Features:
- Automatic Agent Restart: Agents restart on critical errors
- Graceful Degradation: Fallback responses when services are unavailable
- Connection Recovery: WebSocket auto-reconnection with exponential backoff
- Timeout Management: Configurable timeouts prevent hanging operations
- Comprehensive Logging: Detailed error tracking for debugging
The following endpoints are maintained for backward compatibility with existing integrations.
Chat with LLM:
Integrated tools available (refer to single_agent/plugins.py):
- Lamp Management: Turn on/off lamps, get lamp IDs, search lamp details by name
- Chat History: Retrieve conversation history
$ curl --get --data-urlencode "chat=<your message>" http://localhost:7071/api/single/chat
$ curl http://localhost:7071/api/single/history
$ curl http://localhost:7071/api/single/history/export
$ curl -X POST -d '{"data":"<your base64 data>"}' http://localhost:7071/api/single/history/import
$ curl http://localhost:7071/api/single/history/export/compress
$ curl -X POST -d '{"data":"<your base64 data>"}' http://localhost:7071/api/single/history/import/compress
This starts a session with the multi-agent system.
$ curl -X POST -d '{"chat":"<your message>"}' http://localhost:7071/api/multi/chat/start
The request will block while waiting for a response; it will not return until you finish the agent session. Open a new terminal to continue contributing to the chat.
This sends a message to the multi-agent system.
$ curl -X POST -d '{"chat":"<your message>"}' http://localhost:7071/api/multi/chat
$ curl http://localhost:7071/api/multi/history
$ curl http://localhost:7071/api/multi/history/export
$ curl -X POST -d '{"data":"<your base64 data>"}' http://localhost:7071/api/multi/history/import
$ curl http://localhost:7071/api/multi/history/export/compress
$ curl -X POST -d '{"data":"<your base64 data>"}' http://localhost:7071/api/multi/history/import/compress
$ curl http://localhost:7071/api/multi/state/export
$ curl -X POST --header "Content-Type: application/json" -d '<your state data>' http://localhost:7071/api/multi/state/import
$ curl http://localhost:7071/api/multi/state/export/compress
$ curl -X POST -d '{"data":"<your base64 data>"}' http://localhost:7071/api/multi/state/import/compress
- Clone → Install Dependencies → Configure Environment → Run FastAPI
- Open Browser → http://localhost:8000 → Start Chatting
- Upload Documents → Try Voice Input → Switch Between Agents
- Hot Reload: FastAPI automatically reloads on code changes
- Debug Mode: Set DEBUG=True in the environment for detailed logging
- API Testing: Use http://localhost:8000/docs for interactive API testing
- WebSocket Testing: Use browser developer tools to monitor WebSocket traffic
- Microsoft Semantic Kernel Documentation
- Azure OpenAI Service
- FastAPI Documentation
- WebRTC Specification
- Issues: GitHub repository issues page
- Discord: Community discussion channels
- Stack Overflow: Tag questions with semantic-kernel and fastapi
- Azure Support: For Azure service-specific issues
This project is licensed under the MIT License - see the LICENSE file for details.
- Microsoft Semantic Kernel: AI orchestration framework
- Azure AI Services: Cognitive services integration
- FastAPI: Modern Python web framework
- WebRTC: Real-time communication standards
- Open Source Community: Libraries and tools that make this possible
Ready to explore AI-powered conversations with multiple agents, document intelligence, and real-time speech recognition!
Built with ❤️ using Microsoft Semantic Kernel, Azure AI Services, and modern web technologies.