kaenova/azure-function-semantic-kernel

Azure Function Semantic Kernel - Multi-Agent System

A comprehensive AI assistant platform built with Azure Functions and FastAPI, featuring multiple AI agents, document processing, real-time speech recognition, and a modern web interface.

🚀 Features

  • 🤖 Multi-Agent AI System: Single Agent, Multi-Agent (Triage), and Hands-Off Agent
  • 📄 Document Upload & Processing: PDF, DOCX, TXT with AI Search integration
  • 🎤 Real-Time Speech Recognition: WebRTC-based voice input with WebSocket streaming
  • 💬 Modern Web Interface: Responsive chatbot UI with separate chat rooms
  • 🔄 Dual Deployment: Both Azure Functions and FastAPI implementations
  • 📊 Chat History Management: Export/import with compression support
  • 🔧 Agent State Management: Persistent state across sessions

🛠️ Technology Stack

  • Backend: Python 3.12, FastAPI, Azure Functions
  • AI Framework: Microsoft Semantic Kernel
  • Speech Services: Azure Speech Services with WebRTC
  • Document Processing: Azure Document Intelligence + AI Search
  • Frontend: HTML5, CSS3, JavaScript (WebRTC, WebSocket)
  • Database: Azure AI Search for document indexing

📋 Prerequisites

  • Python 3.12+
  • Azure Account with the following services:
    • Azure OpenAI Service
    • Azure Speech Services
    • Azure Document Intelligence
    • Azure AI Search
  • Node.js (for Azure Functions development)

⚙️ Setup & Installation

1. Clone the Repository

git clone <repository-url>
cd azure-function-semantic-kernel

2. Create Virtual Environment

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

3. Install Dependencies

pip install -r requirements.txt

4. Environment Configuration

Copy .env.example to .env and configure your Azure services:

# Container Environment (to deactivate some features)
CONTAINER_ENV=0 # "0"/"1" value

# Partner Name for Footer
PARTNER_NAME=Your Company Name

# Azure OpenAI
OPENAI_KEY=your_openai_key
OPENAI_ENDPOINT=https://your-resource.openai.azure.com/

# Azure Speech Services
SPEECH_KEY=your_speech_key
SPEECH_ENDPOINT=https://your-region.api.cognitive.microsoft.com/

# Azure Document Intelligence
DOCUMENT_INTELLIGENCE_KEY=your_doc_intelligence_key
DOCUMENT_INTELLIGENCE_ENDPOINT=https://your-resource.cognitiveservices.azure.com/

# Azure AI Search
AI_SEARCH_KEY=your_search_key
AI_SEARCH_ENDPOINT=https://your-search-service.search.windows.net
AI_SEARCH_INDEX=documents

# Azure Blob Storage
BLOB_STORAGE_CONNECTION_STRING=your_blob_storage_connection_string
BLOB_CONTAINER_NAME=your_container_name

# Azure AI Foundry (if applicable)
FOUNDRY_ENDPOINT=https://your-foundry-endpoint
FOUNDRY_AGENT_ID=registered_agent_id
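
Before starting the server, it can help to check that the settings above are actually set. The following is a minimal sketch, not part of the repository: the variable names come from the listing above, but the helper missing_settings and the choice to treat these particular variables as required are assumptions.

```python
import os
from typing import Mapping

# Names taken from the .env listing above. Treating all of them as required
# (and the Blob Storage / Foundry values as optional) is an assumption.
REQUIRED_SETTINGS = (
    "OPENAI_KEY",
    "OPENAI_ENDPOINT",
    "SPEECH_KEY",
    "SPEECH_ENDPOINT",
    "DOCUMENT_INTELLIGENCE_KEY",
    "DOCUMENT_INTELLIGENCE_ENDPOINT",
    "AI_SEARCH_KEY",
    "AI_SEARCH_ENDPOINT",
    "AI_SEARCH_INDEX",
)


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required settings that are unset or blank."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]
```

Calling missing_settings() with no arguments inspects the current process environment; an empty list means every listed variable has a non-blank value.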

🚀 Running the Application

Option 1: FastAPI Development Server (Recommended)

# Start FastAPI server
uvicorn fastapi_app:app --host 0.0.0.0 --port 8000 --reload

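Access Points:

  • Chatbot interface: http://localhost:8000
  • Interactive API documentation: http://localhost:8000/docs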

Option 2: Azure Functions Local Development

# Start Azure Functions runtime
func start

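Access Points:

  • API base URL: http://localhost:7071/api (individual routes are listed in the Azure Functions API Reference section)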

🖥️ Web Interface Features

🤖 Multi-Agent Chat System

  • 📚 Single Agent: General-purpose assistant with tool integration
  • 🎯 Multi-Agent (Triage): Intelligent request routing to specialized agents
  • 🚀 Hands-Off Agent: Autonomous workflow execution

💬 Chat Interface

  • Separate Chat Rooms: Each agent maintains its own conversation history
  • Message Count Indicators: Visual badges showing activity per agent
  • Real-time Messaging: Instant responses with typing indicators
  • Chat History: Persistent conversation storage per agent

📄 Document Upload

  • Drag & Drop: Intuitive file upload interface
  • Supported Formats: PDF, DOCX, TXT files
  • Progress Tracking: Real-time upload status
  • AI Search Integration: Automatic document indexing for agent queries

🎤 Voice Input

  • WebRTC Speech Recognition: High-quality audio capture
  • Multi-language Support: English, Indonesian, French, German, Spanish
  • Real-time Transcription: Live speech-to-text conversion
  • Audio Visualization: Visual feedback during recording
  • WebSocket Streaming: Low-latency audio processing

🔧 Technical Architecture

🏗️ System Components

1. Agent System

# Agent Types Available
single_agent/         # General-purpose AI assistant
├── agent.py         # Main agent implementation
├── prompt.py        # System prompts and instructions
└── plugins/         # Tool integrations (lights control, etc.)

multi_agent/         # Intelligent request triage system
├── agent.py         # Orchestrator with specialized sub-agents
├── agents/          # Specialized agent implementations
│   ├── document_agent/    # PDF/document processing specialist
│   ├── light_agent/       # IoT/smart home automation
│   └── orchestrator_agent/ # Request routing and coordination
└── common.py        # Shared utilities and configurations

hands_off_agent/     # Autonomous workflow execution
├── agent.py         # Advanced orchestration with minimal human input
├── agents/          # Same specialized agents as multi-agent
└── common.py        # Enhanced error handling and recovery

2. Document Processing Pipeline

document_upload_cli/
├── utils.py         # Document parsing and chunking
└── __init__.py      # CLI interface for bulk uploads

# Document Flow:
# File Upload → Document Intelligence → Text Extraction →
# Chunking → Embedding → Azure AI Search Index → Agent Queries
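
The chunking step in the flow above can be illustrated with a simple fixed-size splitter. This is a sketch only: the function name chunk_text, the character-based sizes, and the overlap strategy are assumptions, and the repository's utils.py may chunk documents differently.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character chunks that share `overlap` characters.

    Overlap keeps sentences that straddle a boundary visible in both
    neighbouring chunks, which tends to help retrieval quality.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each returned chunk would then be embedded and written to the search index; those steps depend on the Azure SDKs and are left out here.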

3. Speech Recognition System

utils/fastapi/
└── azure_speech_streaming.py  # WebRTC + Azure Speech integration

# Speech Flow:
# Microphone → WebRTC Capture → WebSocket → Azure Speech →
# Real-time Transcription → Agent Processing

🌐 API Endpoints

FastAPI Endpoints

Core Endpoints:

GET  /                          # Main chatbot interface
GET  /chatbot                   # Alternative chatbot access
GET  /docs                      # Interactive API documentation

Agent Interactions:

POST /chat/single               # Single agent conversation
POST /chat/multi                # Multi-agent triage system
POST /chat/hands-off            # Hands-off agent workflow

# Example Request:
{
    "message": "What documents do we have about machine learning?",
    "session_id": "user123",
    "context": {}
}
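
For reference, the request above could be assembled from Python with the standard library alone. This is a hedged sketch: the helper build_chat_request is hypothetical, the base URL assumes the local uvicorn server from the run instructions, and the response format is not documented here, so the sketch stops at building the request.

```python
import json
import urllib.request

# Route suffixes listed under "Agent Interactions" above.
AGENT_ROUTES = ("single", "multi", "hands-off")


def build_chat_request(
    message: str,
    session_id: str,
    agent: str = "single",
    base_url: str = "http://localhost:8000",  # assumed local dev server
) -> urllib.request.Request:
    """Build a POST request for one of the /chat/<agent> endpoints."""
    if agent not in AGENT_ROUTES:
        raise ValueError(f"unknown agent route: {agent!r}")
    payload = {"message": message, "session_id": session_id, "context": {}}
    return urllib.request.Request(
        url=f"{base_url}/chat/{agent}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending it requires the server to be running:
# with urllib.request.urlopen(build_chat_request("Hello", "user123")) as resp:
#     print(resp.read().decode("utf-8"))
```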

Document Management:

POST /upload/document           # Upload and process documents
GET  /documents/search          # Search processed documents
POST /documents/batch-upload    # Bulk document processing

# Upload Example:
curl -X POST "http://localhost:8000/upload/document" \
     -H "Content-Type: multipart/form-data" \
     -F "file=@<your-document.pdf>"

Speech Recognition:

GET  /speech/test-webrtc-ui     # Speech recognition test interface
WS   /speech/websocket          # WebSocket for real-time speech

# WebSocket Connection:
const ws = new WebSocket('ws://localhost:8000/speech/websocket');

Utility Endpoints:

GET  /health                    # System health check
GET  /status                    # Detailed system status
POST /chat/history/export       # Export chat history
POST /chat/history/import       # Import chat history

Azure Functions Legacy Endpoints

The Azure Functions endpoints are maintained for backward compatibility. They are documented in the Azure Functions API Reference (Legacy) section.

🔄 WebSocket Features

💓 Ping/Pong Keepalive System

The application implements automatic connection health monitoring:

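// Client-side ping management
let pingInterval = null; // Timer handle, kept so the interval can be cleared later

function startPingInterval() {
    // `ws` is the WebSocket opened for the session
    pingInterval = setInterval(() => {
        if (ws && ws.readyState === WebSocket.OPEN) {
            ws.send(JSON.stringify({ type: 'ping' }));
        }
    }, 2000); // Ping every 2 seconds
}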

// Server automatically responds with pong

Features:

  • Automatic Reconnection: Client automatically reconnects on connection loss
  • Health Monitoring: Visual indicators for connection status
  • Timeout Management: Configurable timeout periods for robust operation
  • Background Keepalive: Maintains connections during idle periods
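
The client snippet above sends {"type": "ping"} and expects a pong in return; the server side is not shown in this README. A minimal sketch of that reply logic, assuming JSON text frames and a hypothetical helper named handle_control_message, might look like this:

```python
import json
from typing import Optional


def handle_control_message(raw: str) -> Optional[str]:
    """Return a JSON pong for a ping frame, or None for anything else.

    Non-JSON input is ignored rather than raised so that a malformed
    frame cannot take down the connection handler.
    """
    try:
        message = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if isinstance(message, dict) and message.get("type") == "ping":
        return json.dumps({"type": "pong"})
    return None
```

Inside a WebSocket handler, the return value would be sent back to the client when it is not None, and other message types would be passed on to the chat or speech logic.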

📡 Real-time Communication

// WebSocket message handling
ws.onmessage = function(event) {
    const data = JSON.parse(event.data);
    
    switch(data.type) {
        case 'chat_response':
            displayMessage(data.message, 'ai');
            break;
        case 'transcription':
            updateTranscription(data.text);
            break;
        case 'pong':
            // Connection health confirmed
            break;
    }
};

🎨 Modern UI Design

🎯 Design Philosophy

  • Neutral Color Palette: Professional gray tones for reduced eye strain
  • Responsive Layout: Mobile-first design with desktop optimization
  • Accessibility: WCAG 2.1 compliant with keyboard navigation
  • Modern UX: Smooth animations and intuitive interactions

🎨 Color Scheme

:root {
    --bg-color: #1a1a1a;           /* Deep charcoal background */
    --card-bg: #2a2a2a;           /* Card/panel backgrounds */
    --accent-color: #4a90e2;       /* Primary accent blue */
    --text-primary: #e0e0e0;       /* Primary text color */
    --text-secondary: #b0b0b0;     /* Secondary text color */
    --border-color: #404040;       /* Subtle borders */
    --success-color: #4caf50;      /* Success/positive actions */
    --warning-color: #ff9800;      /* Warnings/attention */
}

🧩 UI Components

  • Chat Bubbles: Distinct styling for user/AI messages
  • Agent Switcher: Tabbed interface with activity indicators
  • File Upload: Drag-and-drop with progress visualization
  • Voice Controls: Recording status with audio visualization
  • Status Indicators: Connection health and agent status

🚦 Error Handling & Recovery

🛡️ Robust Error Management

The system implements comprehensive error handling across all components:

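# Agent Error Recovery Example
import logging

from pydantic import ValidationError  # Assumption: the error type comes from pydantic

logger = logging.getLogger(__name__)

# Method of the agent class; process_message and restart_agent are defined there
async def chat_loop(self, message: str, session_id: str):
    try:
        # Main agent processing
        response = await self.process_message(message)
        return response
    except ValidationError as e:
        logger.error(f"Validation error in agent (session {session_id}): {e}")
        # Automatic agent restart
        await self.restart_agent()
        return {"error": "Agent restarted due to validation error"}
    except Exception as e:
        logger.error(f"Unexpected error (session {session_id}): {e}")
        return {"error": "An unexpected error occurred"}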

Error Recovery Features:

  • Automatic Agent Restart: Agents restart on critical errors
  • Graceful Degradation: Fallback responses when services are unavailable
  • Connection Recovery: WebSocket auto-reconnection with exponential backoff
  • Timeout Management: Configurable timeouts prevent hanging operations
  • Comprehensive Logging: Detailed error tracking for debugging
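
The exponential backoff mentioned for connection recovery can be sketched as a delay schedule. The numbers and the function name backoff_delays below are illustrative assumptions, not the values the application uses.

```python
import random
from typing import Optional


def backoff_delays(
    base: float = 0.5,
    factor: float = 2.0,
    max_delay: float = 30.0,
    attempts: int = 6,
    jitter: float = 0.0,
    rng: Optional[random.Random] = None,
) -> list[float]:
    """Return the wait time in seconds before each reconnection attempt.

    The delay grows geometrically and is capped at `max_delay`. With
    jitter > 0, up to `jitter * delay` extra seconds are added at random
    so that many clients do not reconnect in lockstep.
    """
    if rng is None:
        rng = random.Random()
    delays: list[float] = []
    for attempt in range(attempts):
        delay = min(max_delay, base * factor ** attempt)
        if jitter > 0:
            delay += rng.uniform(0, jitter * delay)
        delays.append(delay)
    return delays
```

A reconnect loop would sleep for each delay in turn and stop at the first successful connection.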

📊 Azure Functions API Reference (Legacy)

The following endpoints are maintained for backward compatibility with existing integrations.

🔧 Single Agent (Legacy Azure Functions)

Chat with LLM: Integrated tools available (refer to single_agent/plugins.py):

  • Lamp Management: Turn on/off lamps, get lamp IDs, search lamp details by name
  • Chat History: Retrieve conversation history

$ curl --get --data-urlencode "chat=<your message>" http://localhost:7071/api/single/chat

Check history

$ curl http://localhost:7071/api/single/history

Export chat

$ curl http://localhost:7071/api/single/history/export

Import chat

$ curl -X POST -d '{"data":"<your base64 data>"}' http://localhost:7071/api/single/history/import

Export chat (compressed)

$ curl http://localhost:7071/api/single/history/export/compress

Import chat (compressed)

$ curl -X POST -d '{"data":"<your base64 data>"}' http://localhost:7071/api/single/history/import/compress
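
The import endpoints above take a base64 string in the "data" field. The exact serialization is not documented in this README, so the following is only a sketch of how such a payload could be produced and read back, assuming the history is a JSON list and that the compressed variant uses zlib. Both are assumptions; in practice the safest payload is whatever the matching export endpoint returned.

```python
import base64
import json
import zlib


def pack_history(history: list[dict], compress: bool = False) -> str:
    """Serialize chat history to base64 text, optionally zlib-compressed."""
    raw = json.dumps(history).encode("utf-8")
    if compress:
        raw = zlib.compress(raw)  # assumed codec, not confirmed by the source
    return base64.b64encode(raw).decode("ascii")


def unpack_history(data: str, compressed: bool = False) -> list[dict]:
    """Reverse pack_history."""
    raw = base64.b64decode(data)
    if compressed:
        raw = zlib.decompress(raw)
    return json.loads(raw.decode("utf-8"))
```

The result of pack_history would be placed in the request body as {"data": "<result>"}.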

🎯 Multi-Agent System (Legacy Azure Functions)

Start the session

This starts a session with the multi-agent system.

$ curl -X POST -d '{"chat":"<your message>"}' http://localhost:7071/api/multi/chat/start

This request blocks: it does not return a response until the agent session finishes. To contribute to the chat while it is running, open a new terminal and send messages from there.

Send a message to the session

This will send a message to the multi-agent system.

$ curl -X POST -d '{"chat":"<your message>"}' http://localhost:7071/api/multi/chat

Check history

$ curl http://localhost:7071/api/multi/history

Export chat

$ curl http://localhost:7071/api/multi/history/export

Import chat

$ curl -X POST -d '{"data":"<your base64 data>"}' http://localhost:7071/api/multi/history/import

Export chat (compressed)

$ curl http://localhost:7071/api/multi/history/export/compress

Import chat (compressed)

$ curl -X POST -d '{"data":"<your base64 data>"}' http://localhost:7071/api/multi/history/import/compress

Export state

$ curl http://localhost:7071/api/multi/state/export

Import state

$ curl -X POST --header "Content-Type: application/json" -d '<your state data>' http://localhost:7071/api/multi/state/import

Export state (compressed)

$ curl http://localhost:7071/api/multi/state/export/compress

Import state (compressed)

$ curl -X POST -d '{"data":"<your base64 data>"}' http://localhost:7071/api/multi/state/import/compress

📦 Additional Resources

🚀 Quick Start Guide

  1. Clone → Install Dependencies → Configure Environment → Run FastAPI
  2. Open Browser → http://localhost:8000 → Start Chatting
  3. Upload Documents → Try Voice Input → Switch Between Agents

🔧 Development Tips

  • Hot Reload: uvicorn reloads the FastAPI app on code changes when started with --reload
  • Debug Mode: Set DEBUG=True in environment for detailed logging
  • API Testing: Use http://localhost:8000/docs for interactive API testing
  • WebSocket Testing: Use browser developer tools to monitor WebSocket traffic

🆘 Getting Help

  • Issues: GitHub repository issues page
  • Discord: Community discussion channels
  • Stack Overflow: Tag questions with semantic-kernel and fastapi
  • Azure Support: For Azure service-specific issues

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Microsoft Semantic Kernel: AI orchestration framework
  • Azure AI Services: Cognitive services integration
  • FastAPI: Modern Python web framework
  • WebRTC: Real-time communication standards
  • Open Source Community: Libraries and tools that make this possible

🚀 Ready to explore AI-powered conversations with multiple agents, document intelligence, and real-time speech recognition!

Built with ❤️ using Microsoft Semantic Kernel, Azure AI Services, and modern web technologies.
