Conversational AI agent that collects user preferences and provides personalized movie recommendations using Retrieval-Augmented Generation (RAG).
This movie recommendation system orchestrates multiple specialized AI agents that work together to understand user preferences and deliver personalized movie suggestions.
At its core, a Conversation Orchestrator coordinates four distinct agents: the ExtractorAgent analyzes user input to extract structured preferences (genres, keywords, sentiment) using Pydantic schemas; the RequesterAgent generates contextual follow-up questions when information is incomplete; the RecommenderAgent performs hybrid RAG search combining semantic similarity with genre filtering to retrieve relevant movies; and the SummarizerAgent summarizes the conversation.
When the ExtractorAgent detects negative sentiment (frustration, impatience), the orchestrator immediately skips further questioning and proceeds directly to recommendations, ensuring a responsive user experience.
The system maintains conversation state across multiple turns, accumulating preferences incrementally while providing graceful fallbacks at every layer, from LLM failures to database errors.
It's powered by local Ollama models; gpt-oss was chosen for most of the agents, while llama3.1 and llama3.2 were also tested (with worse results).
- What This Project Does
- Architecture
- System Flow
- Setup Instructions
- Testing
- Key Design Decisions
- Potential Improvements
Modular agents that handle specific conversation tasks with standardized interfaces:
- `base.py` - Abstract agent protocol and shared infrastructure (see the sketch after this list)
  - `Agent`: Abstract base class with `execute()` method
  - `AgentResponse`: Standardized response wrapper with error handling
  - `AgentErrorType`: Enum for error categorization
- `extractor.py` - ExtractorAgent
  - Extracts structured data using Instructor + Pydantic
  - Returns: `ExtractedInfo` (genres, preferences, sentiment)
  - Chosen model: `gpt-oss:20b` @ temp 0.0 - note: focuses on accurate results to avoid extra turns
  - Uses MD_JSON mode for Ollama compatibility
  - Fallback: Empty `ExtractedInfo` model
- `requester.py` - RequesterAgent
  - Generates contextual follow-up questions
  - Analyzes conversation history and missing information
  - Chosen model: `llama3.2:3b` @ temp 0.5 - note: does the job well at a better speed
  - Fallback: Generic question if LLM fails
- `recommender.py` - RecommenderAgent
  - Retrieves movies via RAG semantic search
  - Formats recommendations with natural language
  - Chosen model: `gpt-oss:20b` @ temp 0.5 - note: gives the user a good, personalised recommendation
  - Fallback: Plain text list if LLM fails
- `summarizer.py` - SummarizerAgent
  - Generates conversation summaries for storage
  - Runs once at end of conversation
  - Chosen model: `gpt-oss:20b` @ temp 0.5 - note: speed matters less here since this step could be handled asynchronously
  - Returns: String summary or None on error
- `orchestrator.py` - ConversationOrchestrator
  - Coordinates agent execution flow
  - Manages conversation loop (max 20 turns by default)
  - Handles voice/text input switching
  - Saves conversation state to JSON
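As a rough illustration of the shared protocol in `base.py`, a minimal sketch is shown below; the exact `AgentResponse` fields, error categories, and the `execute()` signature are assumptions, not the project's actual code:

```python
# Minimal sketch of base.py - field names and the execute() signature are assumptions;
# only Agent, AgentResponse and AgentErrorType come from the module description above.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class AgentErrorType(str, Enum):
    """Categories used to classify agent failures."""
    LLM_ERROR = "llm_error"
    DATABASE_ERROR = "database_error"
    VALIDATION_ERROR = "validation_error"


@dataclass
class AgentResponse:
    """Standardized wrapper returned by every agent."""
    success: bool
    data: Optional[Any] = None
    error_type: Optional[AgentErrorType] = None
    error_message: Optional[str] = None


class Agent(ABC):
    """Abstract base class that all agents implement."""

    @abstractmethod
    def execute(self, state: "State") -> AgentResponse:
        """Run the agent against the current conversation state."""
        ...
```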
Foundational data structures and configuration:
- `models.py` - Pydantic data models with validation (see the sketch after this list)
  - `Message`: Conversation history entries (role, content)
  - `ExtractedInfo`: User preferences (genres, preferences, sentiment)
  - `Movie`: Movie metadata (title, year, rating, genres, overview)
  - `Genre`: Enum of 19 TMDB genres
  - `Sentiment`: Enum (positive, neutral, negative)
  - `Role`: Enum (user, assistant, system)
- `state.py` - Conversation state management
  - `State`: Maintains conversation history and extracted info
  - Accumulates data across multiple turns
  - Provides JSON serialization for persistence
- `config.py` - Centralized configuration with environment variables
  - Model selection (EXTRACTION_MODEL, REQUESTER_MODEL, etc.)
  - Temperature settings per agent
  - Ollama API configuration
  - ChromaDB settings
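A rough sketch of the models described above, assuming Pydantic; the actual field names, defaults, and the full 19-genre enum live in `models.py`:

```python
# Illustrative sketch of models.py - field names and defaults are assumptions,
# and the Genre enum is truncated (the real one lists all 19 TMDB genres).
from enum import Enum
from typing import List, Optional
from pydantic import BaseModel


class Genre(str, Enum):
    ACTION = "Action"
    COMEDY = "Comedy"
    DRAMA = "Drama"
    # ... 16 more TMDB genres


class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"


class Role(str, Enum):
    USER = "user"
    ASSISTANT = "assistant"
    SYSTEM = "system"


class Message(BaseModel):
    role: Role
    content: str


class ExtractedInfo(BaseModel):
    genres: List[Genre] = []
    preferences: List[str] = []
    sentiment: Sentiment = Sentiment.NEUTRAL


class Movie(BaseModel):
    title: str
    year: Optional[int] = None
    rating: Optional[float] = None
    genres: List[Genre] = []
    overview: str = ""
```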
External service integrations and client management:
- `llm_client.py` - LLM client singleton management
- `database.py` - ChromaDB persistent client
- `listener.py` - Voice input using Whisper and webrtcvad for VAD
- `speaker.py` - Voice output using pyttsx3 TTS
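For illustration, `database.py` presumably wraps something like the snippet below; the path and collection name match the defaults mentioned elsewhere in this README, but the caching approach is an assumption:

```python
# Sketch of a ChromaDB persistent client singleton (database.py) - the
# lru_cache-based caching is an assumption, not the project's actual code.
from functools import lru_cache

import chromadb


@lru_cache(maxsize=1)
def get_chroma_collection(path: str = "./chroma_db", name: str = "movies"):
    """Return a cached handle to the persistent 'movies' collection."""
    client = chromadb.PersistentClient(path=path)
    return client.get_or_create_collection(name=name)
```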
Semantic search and vector database operations:
- `retriever.py` - Hybrid semantic search
  - `retrieve_movies()`: Main retrieval function
  - Generates query embeddings using `embeddinggemma`
  - Applies genre metadata filtering
  - Returns top-N results sorted by similarity
- `indexer.py` - Dataset indexing pipeline (see the sketch after this list)
  - `index_movies()`: Batch embedding generation
  - Reads from `data/movies.csv`
  - Creates boolean genre fields for filtering (ChromaDB metadata doesn't support lists)
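Because ChromaDB metadata values can't be lists, genres are flattened into boolean fields at indexing time. A simplified sketch of that part of `index_movies()`; the CSV column names and genre separator are assumptions:

```python
# Simplified sketch of the indexing step with boolean genre fields.
# CSV column names and the "|" genre separator are assumptions.
import csv

import chromadb
import ollama


def index_movies(csv_path: str = "data/movies.csv") -> None:
    collection = chromadb.PersistentClient(path="./chroma_db").get_or_create_collection("movies")
    with open(csv_path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.DictReader(f)):
            # Embed the movie overview for semantic search
            embedding = ollama.embeddings(
                model="embeddinggemma", prompt=row["overview"]
            )["embedding"]
            # Flatten genres into boolean fields, e.g. genre_Action=True,
            # so they can be used as ChromaDB metadata filters later
            metadata = {"title": row["title"], "year": int(row["year"])}
            for genre in row["genres"].split("|"):
                metadata[f"genre_{genre}"] = True
            collection.add(
                ids=[str(i)],
                embeddings=[embedding],
                documents=[row["overview"]],
                metadatas=[metadata],
            )
```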
Specialized system prompts for each agent:
- `extractor.py` - Structured extraction prompts
- `requester.py` - Question generation prompts
- `recommender.py` - Movie presentation prompts
- `summarizer.py` - Conversation summary prompts
```mermaid
flowchart TD
Start([User Input]) --> Extract[1. Extractor Agent<br/>Extract genres<br/>Extract preferences<br/>Detect sentiment]
Extract --> Update[2. Update State<br/>Merge new info<br/>Maintain history]
Update --> Check{3. Check Completeness<br/>Have genres OR prefs?<br/>Negative sentiment?}
Check -->|YES| Recommend[4a. Recommender<br/>RAG Search<br/>Format Reply]
Recommend --> Show[5a. Show Movies]
Show --> Summarize[6. Summarizer<br/>Generate Summary]
Summarize --> Save[7. Save State<br/>JSON + Summary<br/>END]
Save --> End([End])
Check -->|NO| Question[4b. Requester<br/>Ask Question]
Question --> Loop[5b. Loop Back<br/>Get User Input]
    Loop --> Start
```
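The pseudo-Python below mirrors this flow; the helper names (`get_user_input`, `show`, `state.merge`, `state.save`) are illustrative assumptions based on the diagram, not the actual `orchestrator.py` API:

```python
# Illustrative outline of the conversation loop - helper names are assumptions.
def run_conversation(state, agents, get_user_input, show, max_turns: int = 20):
    extractor, requester, recommender, summarizer = agents
    for _ in range(max_turns):
        state.add_user_message(get_user_input())        # voice or text input
        extracted = extractor.execute(state)            # 1. extract genres/prefs/sentiment
        state.merge(extracted.data)                     # 2. update state, keep history

        # 3. completeness check: enough info, or is the user getting frustrated?
        has_info = bool(state.info.genres or state.info.preferences)
        impatient = state.info.sentiment == "negative"
        if has_info or impatient:
            show(recommender.execute(state).data)       # 4a/5a. RAG search, show movies
            summary = summarizer.execute(state)         # 6. summarize the conversation
            state.save(summary.data)                    # 7. persist JSON + summary, end
            return
        show(requester.execute(state).data)             # 4b/5b. ask a follow-up, loop back
```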
```mermaid
flowchart TD
Prefs[User Preferences<br/>Genres: Action<br/>Prefs: fast-paced] --> Query[Build Search Query<br/>fast-paced]
Query --> Embed[Generate Query Embedding<br/>Ollama embeddinggemma]
Embed --> Filter[Build Genre Filter<br/>genre_Action: true]
Filter --> ChromaDB[ChromaDB Query<br/>Semantic + Filter]
    ChromaDB --> Results[Top 5 Movies<br/>Sorted by similarity]
```
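A minimal sketch of this retrieval flow using the ChromaDB and Ollama Python clients; the query construction and where-filter shape in the real `retriever.py` may differ:

```python
# Sketch of the hybrid retrieval step - query text and filter shape are assumptions;
# only the embedding model and the top-5 cutoff come from this README.
import chromadb
import ollama


def retrieve_movies(preferences: list[str], genres: list[str], n_results: int = 5):
    collection = chromadb.PersistentClient(path="./chroma_db").get_collection("movies")
    # Build the search query from free-text preferences (e.g. "fast-paced")
    query_text = ", ".join(preferences) or "movies"  # basic default query today
    embedding = ollama.embeddings(model="embeddinggemma", prompt=query_text)["embedding"]
    # Genre metadata filter on the boolean fields written at indexing time
    where = {f"genre_{genres[0]}": True} if genres else None
    results = collection.query(
        query_embeddings=[embedding],
        n_results=n_results,
        where=where,
    )
    return results["metadatas"][0]  # top-N movie metadata, sorted by similarity
```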
Before you begin, ensure you have the following installed:
- Python 3.10+

  ```bash
  python --version  # Should be 3.10 or higher
  ```
- Ollama - Local LLM inference engine

  ```bash
  # Install from https://ollama.ai/
  # Or via Homebrew on macOS:
  brew install ollama

  # Verify installation
  ollama --version
  ```
- System Dependencies (for voice mode - optional)

  ```bash
  # macOS
  brew install portaudio ffmpeg

  # Ubuntu/Debian
  sudo apt-get install portaudio19-dev ffmpeg
  ```
```bash
# Clone the repository
git clone <repository-url>
cd movie-recommender

# Create virtual environment
python -m venv .venv

# Activate virtual environment
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

```bash
# Install all required packages
pip install -r requirements.txt
```

```bash
# Pull the recommended models (one-time setup)
# This one can be used for every agent
ollama pull gpt-oss:20b
# (Optional) Can be used for the RequesterAgent to speed things up a bit
ollama pull llama3.2:3b
# Used as the embedding model for semantic search
ollama pull embeddinggemma
# Verify models are installed
ollama list
```

```bash
# Copy the example configuration and edit it to customize models and temperatures per agent
cp .env.example .env
```

Default Configuration (already optimized):

```env
# Models
EXTRACTION_MODEL=gpt-oss:20b
REQUESTER_MODEL=llama3.2:3b
RECOMMENDER_MODEL=gpt-oss:20b
SUMMARIZER_MODEL=gpt-oss:20b
EMBEDDING_MODEL=embeddinggemma
# Temperatures
EXTRACTION_TEMPERATURE=0.0
REQUESTER_TEMPERATURE=0.5
RECOMMENDER_TEMPERATURE=0.5
SUMMARIZER_TEMPERATURE=0.5
```

```bash
# Index the dataset (1,000 movies from data/movies.csv)
python -m scripts.index_dataset
# Expected output:
# Processing movies... ████████████████████████████████████████ 100% 0:01:23
# ✅ Successfully indexed 1000 movies to ChromaDB collection 'movies'
```

What this does:
- Generates embeddings for each movie's overview
- Stores vectors + metadata in ChromaDB (`./chroma_db/`)
- Takes around a minute
Pro Tip: You can interrupt with Ctrl+C to test with a smaller subset first.
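If you want to double-check what ended up in the store, an optional sanity check with the ChromaDB Python client (not part of the project's scripts):

```python
# Optional sanity check after indexing (run from the project root)
import chromadb

collection = chromadb.PersistentClient(path="./chroma_db").get_collection("movies")
print(collection.count())               # expect 1000 if the full dataset was indexed
print(collection.peek(1)["metadatas"])  # inspect one record's metadata and genre flags
```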
Run a quick test to ensure everything is working:
```bash
python -m scripts.run
```

To use voice mode:

```bash
python -m scripts.run --voice
```

- 🎤 Speak your responses instead of typing
- 🔊 Hear the agent's messages
- First run: Whisper downloads ~140MB model automatically
For verbose output:

```bash
python -m scripts.run --verbose
```

Run the test suite:

```bash
pytest tests
```

At the end of each conversation, a transcript is stored along with a summary in the `./conversations` folder:

```
./conversations/conversation_YYYYMMDD_HHMMSS.json
```

A few examples are included in the `./conversations/examples` folder:
- Complete information provided upfront: `1-information-upfront.json`
- Information provided after a follow-up: `2-information-follow-up.json`
- Impatience detected and a recommendation provided without collected information: `3-without-information-due-to-impatience.json`
- Some off-topic questions: `4-off-topic.json`
- One agent per task: Modular agent architecture allows fine-tuning models, temperatures and prompts for every specific use case.
- Protocol-based design: All agents implement a common `Agent` protocol with a standardized `AgentResponse`, enabling easy swapping and testing.
- Model selection strategy: Different models for different tasks (gpt-oss for accuracy, llama3.2 for speed), optimized through experimentation.
- Structured outputs with Pydantic: Using Instructor library with Pydantic models ensures type-safe, validated extractions.
- Voice-aware prompts: Dynamically adjusted formatting based on output mode (text vs. speech) for better UX.
- Sentiment detection: Detects user frustration to skip additional questions and provide immediate recommendations.
- Flexible extraction logic: Accepts either genres OR preference descriptions, lowering the barrier for users to get results.
- Hybrid search: Combines semantic similarity (embeddings) with metadata filtering (genre tags) for more accurate retrieval.
- Embedding model choice: `embeddinggemma` appeared to be a strong open-source embedding model in our investigations and produced good results.
- Graceful degradation: Fallback messages for every agent ensure the user always receives a response, no matter what fails.
- Multi-turn conversation flow: The maximum number of turns is configurable (20 by default).
- JSON persistence: Conversations and their summaries are stored as JSON at the end for simplicity.
- Edge case handling: Manages empty inputs, off-topic responses, and varying levels of user detail.
- Enhanced test coverage: More comprehensive e2e flow testing and edge case scenarios.
- LLM evaluation framework: Implement automated evaluation of agent responses for quality and accuracy.
- Performance benchmarking: Track response times, accuracy metrics, and user satisfaction scores.
- Integration tests with real LLMs: Currently mocked in tests; real model integration tests would catch prompt regressions.
- Model experimentation: Only tried 3 models; could explore more specialized models for each task.
- Prompt versioning: Track prompt changes and A/B test different phrasings for better results.
- Temperature tuning: More granular temperature optimization per use case.
- Adding more filters: Rating could be used to weight results.
- User feedback loop: Learn from user reactions to improve future recommendations.
- Improve default search query: Right now it returns arbitrary movies; we could build a default query based on rating.
- Caching: Store frequent queries and embeddings to reduce latency and API calls.
- Asynchronous processing: Make summarization and non-critical operations async to improve response times.
- Real-time conversation updates: Stream conversation state to storage after each turn, not just at the end.
- Rate limiting & quotas: Protect against abuse and manage token usage per user/session.
- Containerization: Docker setup for consistent deployment across environments.
- Input sanitization: More robust validation and sanitization of user inputs.
- Content filtering: Detect and handle inappropriate or off-topic requests more strictly.
- PII detection: Identify and redact personally identifiable information.
- Streaming responses: Stream LLM responses token-by-token for better perceived performance.
- Progress indicators: Show when the system is thinking/searching for better transparency.
- Multi-language support: Extend beyond English.