Movie Recommender - Conversational AI with RAG

Conversational AI agent that collects user preferences and provides personalized movie recommendations using Retrieval-Augmented Generation (RAG).

🎬 What This Project Does

This movie recommendation system orchestrates multiple specialized AI agents that work together to understand user preferences and deliver personalized movie suggestions.

At its core, a Conversation Orchestrator coordinates four distinct agents: the ExtractorAgent analyzes user input to extract structured preferences (genres, keywords, sentiment) using Pydantic schemas; the RequesterAgent generates contextual follow-up questions when information is incomplete; the RecommenderAgent performs hybrid RAG search combining semantic similarity with genre filtering to retrieve relevant movies; and the SummarizerAgent generates a summary of the conversation for storage.

When the ExtractorAgent detects negative sentiment (frustration, impatience), the orchestrator immediately skips further questioning and proceeds directly to recommendations, ensuring a responsive user experience.

The system maintains conversation state across multiple turns, accumulating preferences incrementally while providing graceful fallbacks at every layer, from LLM failures to database errors. The sketch below shows how these pieces fit together.
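To make the control flow concrete, here is a minimal, runnable sketch of the orchestration loop. It is illustrative only: `extract`, `recommend`, and `ask_followup` are hypothetical stand-ins for the real agents (each of which calls an LLM in the actual code).

```python
from dataclasses import dataclass, field

@dataclass
class State:
    genres: list[str] = field(default_factory=list)
    preferences: list[str] = field(default_factory=list)
    sentiment: str = "neutral"

# Hypothetical stand-ins for the real agents; each would call an LLM.
def extract(text: str) -> dict:
    prefs = [text] if len(text.split()) > 2 else []
    return {"genres": [], "preferences": prefs, "sentiment": "neutral"}

def recommend(state: State) -> str:
    return f"Based on {state.preferences}, you might like: ..."

def ask_followup(state: State) -> str:
    return "What genres are you in the mood for?"

def conversation_loop(max_turns: int = 20) -> None:
    state = State()
    for _ in range(max_turns):
        info = extract(input("You: "))
        state.genres += info["genres"]            # accumulate across turns
        state.preferences += info["preferences"]
        state.sentiment = info["sentiment"]
        # Negative sentiment short-circuits further questioning.
        if state.genres or state.preferences or state.sentiment == "negative":
            print(recommend(state))
            return
        print(ask_followup(state))
```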

It's powered by local Ollama models; gpt-oss was chosen for most of the agents, while llama3.1 and llama3.2 have also been tested (with worse results).

πŸ—οΈ Architecture

Component Breakdown

1. Agent Layer (src/agents/)

Modular agents that handle specific conversation tasks with standardized interfaces:

  • base.py - Abstract agent protocol and shared infrastructure (sketched after this list)

    • Agent: Abstract base class with execute() method
    • AgentResponse: Standardized response wrapper with error handling
    • AgentErrorType: Enum for error categorization
  • extractor.py - ExtractorAgent

    • Extracts structured data using Instructor + Pydantic
    • Returns: ExtractedInfo (genres, preferences, sentiment)
    • Chosen model: gpt-oss:20b @ temp 0.0 - prioritizes accurate extraction to avoid extra clarification turns
    • Uses MD_JSON mode for Ollama compatibility
    • Fallback: Empty model
  • requester.py - RequesterAgent

    • Generates contextual follow-up questions
    • Analyzes conversation history and missing information
    • Chosen model: llama3.2:3b @ temp 0.5 - does the job well at a much better speed
    • Fallback: Generic question if LLM fails
  • recommender.py - RecommenderAgent

    • Retrieves movies via RAG semantic search
    • Formats recommendations with natural language
    • Chosen model: gpt-oss:20b @ temp 0.5 - gives the user a good, personalised recommendation
    • Fallback: Plain text list if LLM fails
  • summarizer.py - SummarizerAgent

    • Generates conversation summaries for storage
    • Runs once at end of conversation
    • Chosen model: gpt-oss:20b @ temp 0.5 - speed matters less here because this step could be handled asynchronously
    • Returns: String summary or None on error
  • orchestrator.py - ConversationOrchestrator

    • Coordinates agent execution flow
    • Manages conversation loop (max 20 turns by default)
    • Handles voice/text input switching
    • Saves conversation state to JSON
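A minimal sketch of what the `Agent` protocol and `AgentResponse` wrapper described above might look like (the exact fields and error categories in the repo may differ):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional

class AgentErrorType(Enum):
    LLM_FAILURE = "llm_failure"
    VALIDATION = "validation"
    DATABASE = "database"

@dataclass
class AgentResponse:
    """Standardized wrapper: either a payload or a categorized error."""
    data: Optional[Any] = None
    error: Optional[AgentErrorType] = None

    @property
    def ok(self) -> bool:
        return self.error is None

class Agent(ABC):
    @abstractmethod
    def execute(self, *args: Any, **kwargs: Any) -> AgentResponse:
        """Run the agent; should catch failures and return a fallback response."""
```

This shape is what enables the graceful-degradation behaviour: callers branch on `response.ok` instead of wrapping every agent call in try/except.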

2. Core Layer (src/core/)

Foundational data structures and configuration:

  • models.py - Pydantic data models with validation (see the sketch after this list)

    • Message: Conversation history entries (role, content)
    • ExtractedInfo: User preferences (genres, preferences, sentiment)
    • Movie: Movie metadata (title, year, rating, genres, overview)
    • Genre: Enum of 19 TMDB genres
    • Sentiment: Enum (positive, neutral, negative)
    • Role: Enum (user, assistant, system)
  • state.py - Conversation state management

    • State: Maintains conversation history and extracted info
    • Accumulates data across multiple turns
    • Provides JSON serialization for persistence
  • config.py - Centralized configuration with environment variables

    • Model selection (EXTRACTION_MODEL, REQUESTER_MODEL, etc.)
    • Temperature settings per agent
    • Ollama API configuration
    • ChromaDB settings
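The core models pair naturally with the ExtractorAgent's structured output. Below is a sketch of how `ExtractedInfo` might be defined and populated via Instructor's MD_JSON mode against Ollama's OpenAI-compatible endpoint; the exact field definitions are assumptions based on the descriptions above:

```python
from enum import Enum

import instructor
from openai import OpenAI
from pydantic import BaseModel, Field

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"

class ExtractedInfo(BaseModel):
    genres: list[str] = Field(default_factory=list)       # TMDB genre names
    preferences: list[str] = Field(default_factory=list)  # free-text keywords
    sentiment: Sentiment = Sentiment.NEUTRAL

# Ollama exposes an OpenAI-compatible API; MD_JSON mode tolerates models
# that wrap their JSON output in markdown fences.
client = instructor.from_openai(
    OpenAI(base_url="http://localhost:11434/v1", api_key="ollama"),
    mode=instructor.Mode.MD_JSON,
)

info = client.chat.completions.create(
    model="gpt-oss:20b",
    temperature=0.0,
    response_model=ExtractedInfo,
    messages=[{"role": "user", "content": "Something fast-paced, maybe action"}],
)
```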

3. Service Layer (src/services/)

External service integrations and client management:

  • llm_client.py - LLM client singleton management (see the sketch after this list)

  • database.py - ChromaDB persistent client

  • listener.py - Voice input using Whisper and webrtcvad for VAD

  • speaker.py - Voice output using pyttsx3 TTS
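A common way to implement the client singletons above (a sketch, not the repo's actual code):

```python
from functools import lru_cache

import chromadb
from openai import OpenAI

@lru_cache(maxsize=1)
def get_llm_client() -> OpenAI:
    # One shared client for all agents, pointed at the local Ollama server.
    return OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

@lru_cache(maxsize=1)
def get_chroma_client():
    # Persistent client so indexed embeddings survive restarts.
    return chromadb.PersistentClient(path="./chroma_db")
```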


4. RAG Layer (src/rag/)

Semantic search and vector database operations:

  • retriever.py - Hybrid semantic search (sketched after this list)

    • retrieve_movies(): Main retrieval function
    • Generates query embeddings using embeddinggemma
    • Applies genre metadata filtering
    • Returns top-N results sorted by similarity
  • indexer.py - Dataset indexing pipeline

    • index_movies(): Batch embedding generation
    • Reads from data/movies.csv
    • Creates boolean genre fields for filtering (ChromaDB metadata values cannot be lists)
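A sketch of the hybrid retrieval step, assuming the collection was indexed with boolean `genre_*` metadata fields as described above (the `retrieve_movies` signature here is an assumption):

```python
import chromadb
import ollama

def retrieve_movies(query: str, genres: list[str], n_results: int = 5):
    collection = chromadb.PersistentClient(path="./chroma_db").get_collection("movies")

    # Semantic half: embed the free-text preferences.
    embedding = ollama.embeddings(model="embeddinggemma", prompt=query)["embedding"]

    # Metadata half: boolean genre flags, e.g. {"genre_Action": True}.
    where = {f"genre_{genres[0]}": True} if genres else None

    return collection.query(
        query_embeddings=[embedding],
        n_results=n_results,
        where=where,
    )
```

For multiple genres, ChromaDB's `$and`/`$or` operators can combine several flags in the `where` clause.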

5. Prompt Layer (src/prompts/)

Specialized system prompts for each agent:

  • extractor.py - Structured extraction prompts
  • requester.py - Question generation prompts
  • recommender.py - Movie presentation prompts
  • summarizer.py - Conversation summary prompts

🔄 System Flow

Complete Conversation Flow

```mermaid
flowchart TD
    Start([User Input]) --> Extract[1. Extractor Agent<br/>Extract genres<br/>Extract preferences<br/>Detect sentiment]
    Extract --> Update[2. Update State<br/>Merge new info<br/>Maintain history]
    Update --> Check{3. Check Completeness<br/>Have genres OR prefs?<br/>Negative sentiment?}

    Check -->|YES| Recommend[4a. Recommender<br/>RAG Search<br/>Format Reply]
    Recommend --> Show[5a. Show Movies]
    Show --> Summarize[6. Summarizer<br/>Generate Summary]
    Summarize --> Save[7. Save State<br/>JSON + Summary<br/>END]
    Save --> End([End])

    Check -->|NO| Question[4b. Requester<br/>Ask Question]
    Question --> Loop[5b. Loop Back<br/>Get User Input]
    Loop --> Start
```

RAG Retrieval Flow

```mermaid
flowchart TD
    Prefs[User Preferences<br/>Genres: Action<br/>Prefs: fast-paced] --> Query[Build Search Query<br/>fast-paced]
    Query --> Embed[Generate Query Embedding<br/>Ollama embeddinggemma]
    Embed --> Filter[Build Genre Filter<br/>genre_Action: true]
    Filter --> ChromaDB[ChromaDB Query<br/>Semantic + Filter]
    ChromaDB --> Results[Top 5 Movies<br/>Sorted by similarity]
```

🚀 Setup Instructions

Prerequisites

Before you begin, ensure you have the following installed:

  1. Python 3.10+

    python --version  # Should be 3.10 or higher
  2. Ollama - Local LLM inference engine

    # Install from https://ollama.ai/
    # Or via Homebrew on macOS:
    brew install ollama
    
    # Verify installation
    ollama --version
  3. System Dependencies (for voice mode - optional)

    # macOS
    brew install portaudio ffmpeg
    
    # Ubuntu/Debian
    sudo apt-get install portaudio19-dev ffmpeg

Installation Steps

1. Clone Repository & Create Environment

# Clone the repository
git clone <repository-url>
cd movie-recommender

# Create virtual environment
python -m venv .venv

# Activate virtual environment
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

2. Install Python Dependencies

# Install all required packages
pip install -r requirements.txt

3. Install Ollama Models

# Pull the recommended models (one-time setup)

# This one can be used for every agent
ollama pull gpt-oss:20b

# (Optional) It can be used for the RequesterAgent to speed up things a bit
ollama pull llama3.2:3b

# Used as the embedding model for semantic search
ollama pull embeddinggemma

# Verify models are installed
ollama list

4. Configure Environment (Optional)

# Copy example configuration and edit to customize models and temperatures per agent.
cp .env.example .env

Default Configuration (already optimized):

# Models
EXTRACTION_MODEL=gpt-oss:20b
REQUESTER_MODEL=llama3.2:3b
RECOMMENDER_MODEL=gpt-oss:20b
SUMMARIZER_MODEL=gpt-oss:20b
EMBEDDING_MODEL=embeddinggemma

# Temperatures
EXTRACTION_TEMPERATURE=0.0
REQUESTER_TEMPERATURE=0.5
RECOMMENDER_TEMPERATURE=0.5
SUMMARIZER_TEMPERATURE=0.5
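A sketch of how config.py might surface these variables to the agents (the variable names are the real ones above; the loading code itself is an assumption):

```python
import os

# Fall back to the defaults above when a variable is absent from .env.
EXTRACTION_MODEL = os.getenv("EXTRACTION_MODEL", "gpt-oss:20b")
REQUESTER_MODEL = os.getenv("REQUESTER_MODEL", "llama3.2:3b")
EXTRACTION_TEMPERATURE = float(os.getenv("EXTRACTION_TEMPERATURE", "0.0"))
REQUESTER_TEMPERATURE = float(os.getenv("REQUESTER_TEMPERATURE", "0.5"))
```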

5. Index the Movie Database

# Index the dataset (1,000 movies from data/movies.csv)
python -m scripts.index_dataset

# Expected output:
# Processing movies... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:01:23
# ✅ Successfully indexed 1000 movies to ChromaDB collection 'movies'

What this does (sketched below):

  • Generates embeddings for each movie's overview
  • Stores vectors + metadata in ChromaDB (./chroma_db/)
  • Takes around a minute
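A sketch of the indexing pipeline under the assumptions above (CSV column names and the genre delimiter are guesses):

```python
import csv

import chromadb
import ollama

def index_movies(csv_path: str = "data/movies.csv") -> None:
    client = chromadb.PersistentClient(path="./chroma_db")
    collection = client.get_or_create_collection("movies")

    with open(csv_path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.DictReader(f)):
            # Embed each movie's overview for semantic search.
            emb = ollama.embeddings(model="embeddinggemma", prompt=row["overview"])
            # Flatten genres into boolean fields (Chroma metadata can't hold lists).
            metadata = {"title": row["title"]}
            for genre in row["genres"].split("|"):
                metadata[f"genre_{genre}"] = True
            collection.add(
                ids=[str(i)],
                embeddings=[emb["embedding"]],
                documents=[row["overview"]],
                metadatas=[metadata],
            )
```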

Pro Tip: You can interrupt with Ctrl+C to test with a smaller subset first.


Start a conversation

Run a quick test to ensure everything is working:

python -m scripts.run

Voice Mode

python -m scripts.run --voice
  • 🎤 Speak your responses instead of typing
  • 🔊 Hear the agent's messages
  • First run: Whisper downloads a ~140MB model automatically
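Under the hood, the listener and speaker services presumably reduce to something like this sketch (using the openai-whisper and pyttsx3 APIs; the audio-capture step via webrtcvad is omitted, and the file name is an assumption):

```python
import pyttsx3
import whisper

stt = whisper.load_model("base")   # ~140MB download on first run
tts = pyttsx3.init()

def speak(text: str) -> None:
    tts.say(text)
    tts.runAndWait()

def transcribe(wav_path: str) -> str:
    # In the real service, webrtcvad decides when the user has stopped
    # speaking before the recorded audio reaches Whisper.
    return stt.transcribe(wav_path)["text"].strip()
```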

Verbose Logging

python -m scripts.run --verbose

🧪 Testing

Run All Tests

pytest tests

Conversation Transcripts

At the end of each conversation, a transcript is saved, along with a summary, to the ./conversations folder:

./conversations/conversation_YYYYMMDD_HHMMSS.json

A few examples are included in the ./conversations/examples folder.


🎯 Key Design Decisions

  • One agent per task: Modular agent architecture allows fine-tuning models, temperatures and prompts for every specific use case.
  • Protocol-based design: All agents implement a common Agent protocol with standardized AgentResponse, enabling easy swapping and testing.
  • Model selection strategy: Different models for different tasks (gpt-oss for accuracy, llama3.2 for speed), optimized through experimentation.
  • Structured outputs with Pydantic: Using Instructor library with Pydantic models ensures type-safe, validated extractions.
  • Voice-aware prompts: Dynamically adjusted formatting based on output mode (text vs. speech) for better UX.
  • Sentiment detection: Detects user frustration to skip additional questions and provide immediate recommendations.
  • Flexible extraction logic: Accepts either genres OR preference descriptions, lowering the barrier for users to get results.
  • Hybrid search: Combines semantic similarity (embeddings) with metadata filtering (genre tags) for more accurate retrieval.
  • Embedding model choice: embeddinggemma was chosen because investigation suggested it is a strong open-source embedding model, and it produced good results here.
  • Graceful degradation: Fallback messages for every agent ensure a message is always sent to the user no matter what happens.
  • Multi-turn conversation flow: The turn limit is configurable (20 by default).
  • JSON persistence: Conversations and their summaries are stored as JSON at the end of each session for simplicity.
  • Edge case handling: Manages empty inputs, off-topic responses, and varying levels of user detail.

⏩ Potential Improvements

Testing & Quality Assurance

  • Enhanced test coverage: More comprehensive e2e flow testing and edge case scenarios.
  • LLM evaluation framework: Implement automated evaluation of agent responses for quality and accuracy.
  • Performance benchmarking: Track response times, accuracy metrics, and user satisfaction scores.
  • Integration tests with real LLMs: Currently mocked in tests; real model integration tests would catch prompt regressions.

Model & Prompt Optimization

  • Model experimentation: Only tried 3 models; could explore more specialized models for each task.
  • Prompt versioning: Track prompt changes and A/B test different phrasings for better results.
  • Temperature tuning: More granular temperature optimization per use case.

RAG & Retrieval Enhancements

  • Adding more filters: Rating could be used to weight results.
  • User feedback loop: Learn from user reactions to improve future recommendations.
  • Improve default search query: Right now it returns arbitrary movies when the user gives no usable preferences; a default query could rank by rating instead.
  • Caching: Store frequent queries and embeddings to reduce latency and API calls.

Scalability & Production Readiness

  • Asynchronous processing: Make summarization and non-critical operations async to improve response times.
  • Real-time conversation updates: Stream conversation state to storage after each turn, not just at the end.
  • Rate limiting & quotas: Protect against abuse and manage token usage per user/session.
  • Containerization: Docker setup for consistent deployment across environments.

Security & Guardrails

  • Input sanitization: More robust validation and sanitization of user inputs.
  • Content filtering: Detect and handle inappropriate or off-topic requests more strictly.
  • PII detection: Identify and redact personally identifiable information.

UX & Accessibility

  • Streaming responses: Stream LLM responses token-by-token for better perceived performance.
  • Progress indicators: Show when the system is thinking/searching for better transparency.
  • Multi-language support: Extend beyond English.
