RAGOPS - Production-Ready RAG Pipeline

Docker · FastAPI · Meilisearch · LiteLLM

A production-ready Retrieval-Augmented Generation (RAG) pipeline built with modern technologies, designed for CPU deployment with enterprise-grade features.

πŸ—οΈ Architecture Overview

┌──────────────────────────────────────────────────────────────────────┐
│                          RAGOPS ARCHITECTURE                         │
└──────────────────────────────────────────────────────────────────────┘

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Client    │────│   Nginx     │────│  FastAPI    │────│ Meilisearch │
│ Application │    │  (Optional) │    │  Backend    │    │   Search    │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
                                             │                  │
                                             │             ┌──────────┐
                                             │             │ Document │
                                             │             │ & Chunks │
                                             │             │ Indexes  │
                                             │             └──────────┘
                                             │
                    ┌─────────────┐          │           ┌─────────────┐
                    │   Redis     │──────────┼───────────│  LiteLLM    │
                    │  Caching    │          │           │   Proxy     │
                    └─────────────┘          │           └─────────────┘
                                             │                  │
                                             │             ┌──────────┐
                                             │             │   Groq   │
                                             └─────────────│   LLM    │
                                                           │ Provider │
                    ┌─────────────┐                        └──────────┘
                    │     TEI     │                              │
                    │ Embeddings  │─────────────────────────────┘
                    │  Service    │
                    └─────────────┘

Flow:
1. Documents → Ingestion → Chunking → Embeddings → Meilisearch
2. Query → FastAPI → Meilisearch (Hybrid Search) → Context → LLM → Response
3. Redis caches embeddings and responses for performance
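
A minimal end-to-end sketch of this flow, assuming the stack is running and the backend is published on localhost:18000 (endpoints and payload shapes follow the API Reference below):

import requests

BASE = "http://localhost:18000"

# Ingestion: the backend chunks the text, embeds it via TEI, and indexes it
docs = [{
    "id": "doc1",
    "text": "Vector embeddings map text into a dense numeric space...",
    "metadata": {"title": "Embeddings 101", "category": "docs"},
}]
print(requests.post(f"{BASE}/ingest", json=docs, timeout=60).json())

# Query: hybrid search retrieves context, then the LLM answers over it
resp = requests.post(
    f"{BASE}/search",
    json={"query": "What are vector embeddings?", "k": 5},
    timeout=60,
)
print(resp.json())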

🚀 Key Features

Core Capabilities

  • πŸ” Hybrid Search: Combines vector similarity and BM25 text search
  • πŸ“„ Document Processing: Supports multiple document types with intelligent chunking
  • 🧠 LLM Integration: Groq LLMs via LiteLLM proxy with fallback support
  • ⚑ High Performance: Redis caching with 5-50x speed improvements
  • 🎯 Semantic Retrieval: TEI embeddings for semantic understanding
  • πŸ”§ Production Ready: Docker Compose orchestration with health checks

Technical Features

  • CPU Optimized: Runs efficiently on CPU-only infrastructure
  • Scalable Architecture: Microservices design with independent scaling
  • Enterprise Security: Authentication, authorization, and secure communication
  • Monitoring & Logging: Comprehensive observability stack
  • API Documentation: Auto-generated OpenAPI/Swagger documentation

📋 Prerequisites

  • Docker & Docker Compose: Latest versions
  • 4GB+ RAM: Recommended for optimal performance
  • API Keys: Groq API key for LLM access
  • Storage: 2GB+ free disk space for models and indexes

⚡ Quick Start

1. Clone and Configure

git clone <repository-url>
cd RAGOPS

# Copy and configure environment
cp .env.example .env
# Edit .env with your API keys (see Configuration section)

2. Start the Stack

# Start all services
docker-compose up -d

# Check service health
docker-compose ps

# View logs
docker-compose logs -f

3. Verify health

# Check API health
curl http://localhost:18000/health

🔧 Configuration

Environment Variables (.env)

# Meilisearch Configuration
MEILI_KEY=your_secure_master_key_here
MEILI_INDEX=documents
EMBED_DIM=384

# LLM Provider Configuration  
LITELLM_KEY=your_proxy_key_here
GROQ_API_KEY=your_groq_api_key_here

# Optional: Additional LLM providers
OPENAI_API_KEY=your_openai_key_here
HUGGINGFACE_API_KEY=your_hf_key_here

# Service URLs (Docker internal)
MEILI_URL=http://meilisearch:7700
PROXY_URL=http://litellm:4000
REDIS_URL=redis://redis:6379

LiteLLM Model Configuration

Edit litellm/config.yaml to customize:

model_list:
  # Primary chat/completions model on Groq
  - model_name: groq-llama3
    litellm_params:
      model: groq/llama-3.1-8b-instant
      api_key: os.environ/GROQ_API_KEY

  # Local embeddings served by TEI (OpenAI-compatible embeddings API)
  - model_name: local-embeddings
    litellm_params:
      model: openai/text-embedding-ada-002 
      api_key: os.environ/GROQ_API_KEY
      api_base: "http://tei-embeddings:80"
      custom_llm_provider: openai
      timeout: 60

# Global LiteLLM settings
litellm_settings:
  cache: true
  cache_params:
    type: "redis"
    url: "redis://redis:6379"
    ttl: 1800
    supported_call_types: ["completion", "chat_completion", "embedding", "acompletion", "aembedding"]

# Prompt Injection basic guards
prompt_injection_params:
  heuristics_check: true
  similarity_check: false
  vector_db_check: false

# Routing / fallbacks
router_settings:
  fallbacks:
    - "groq-llama3": []
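
Since the proxy exposes an OpenAI-compatible API, the stock openai client can talk to both model aliases defined above. A sketch, assuming port 4000 is published to the host and LITELLM_KEY holds your proxy key:

import os
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key=os.environ["LITELLM_KEY"])

# Chat completion, routed to Groq through the groq-llama3 alias
chat = client.chat.completions.create(
    model="groq-llama3",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(chat.choices[0].message.content)

# Embeddings, served locally by TEI through the local-embeddings alias
emb = client.embeddings.create(model="local-embeddings", input=["hello world"])
print(len(emb.data[0].embedding))  # should match EMBED_DIM (384)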

📊 Service Architecture

Core Services

Service           Port    Description                Health Check
FastAPI Backend   18000   Main API server            GET /health
Meilisearch       7700    Search & vector database   GET /health
LiteLLM Proxy     4000    LLM routing proxy          GET /health
TEI Embeddings    80      Text embeddings service    GET /health
Redis             6379    Caching layer              TCP check

Data Flow

  1. Document Ingestion:

    Documents → FastAPI → Processing → Embeddings (TEI) → Meilisearch

  2. Query Processing:

    Query → FastAPI → Embeddings (TEI) → Search (Meilisearch) → Context → LLM (Groq) → Response

  3. Caching Layer (pattern sketched below):

    Redis caches: Embeddings (1h TTL) | LLM Responses (10min TTL)
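
The embedding-cache pattern, as an illustrative sketch (cf. backend/app/utils/cache.py and hashing.py in the project tree below, which may differ): keys are derived from a hash of the input text, so repeated ingestion or queries skip the TEI round-trip.

import hashlib
import json

import redis

r = redis.Redis.from_url("redis://localhost:6379")

def cached_embedding(text: str, embed_fn, ttl: int = 3600) -> list[float]:
    key = "emb:" + hashlib.sha256(text.encode()).hexdigest()
    hit = r.get(key)
    if hit is not None:
        return json.loads(hit)             # cache hit: no embedding call
    vector = embed_fn(text)                # cache miss: compute...
    r.setex(key, ttl, json.dumps(vector))  # ...and store with a TTL
    return vector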
    

🧪 Testing & Validation

Health Monitoring

# Check all services
docker-compose ps

# View service logs
docker-compose logs [service-name]

# Monitor resource usage
docker stats

# Test individual components
curl http://localhost:7700/health    # Meilisearch
curl http://localhost:18000/health   # FastAPI Backend

🔧 Development & Customization

Project Structure

.
├── Makefile
├── README.md
├── backend
│   ├── Dockerfile
│   ├── app
│   │   ├── __init__.py
│   │   ├── api
│   │   │   ├── __init__.py
│   │   │   ├── chat.py
│   │   │   ├── embeddings.py
│   │   │   ├── health.py
│   │   │   ├── ingest.py
│   │   │   ├── pdf.py
│   │   │   ├── search.py
│   │   │   └── stats.py
│   │   ├── core
│   │   │   ├── clients.py
│   │   │   ├── config.py
│   │   │   └── logging.py
│   │   ├── main.py
│   │   ├── models
│   │   │   ├── __init__.py
│   │   │   ├── chat.py
│   │   │   ├── documents.py
│   │   │   ├── health.py
│   │   │   ├── responses.py
│   │   │   └── search.py
│   │   ├── services
│   │   │   ├── __init__.py
│   │   │   ├── chunking.py
│   │   │   ├── embeddings.py
│   │   │   ├── ingestion.py
│   │   │   ├── llm_service.py
│   │   │   ├── pdf_processor.py
│   │   │   ├── rag_service.py
│   │   │   └── search_service.py
│   │   └── utils
│   │       ├── __init__.py
│   │       ├── cache.py
│   │       └── hashing.py
│   ├── requirements.txt
│   └── seed_data.py
├── docker-compose.yml
├── litellm
│   └── config.yaml
├── pdf_files
│   ├── autoencoders.pdf
│   ├── linear_algebra.pdf
│   └── linear_factor_models.pdf
├── scripts
│   └── meili-init.sh
└── tests
    ├── chunking_validation.py
    ├── debug_vector_search.py
    ├── demo_phase2.py
    ├── demo_working_features.py
    ├── final_rag_test_report.py
    ├── test_all_features.py
    ├── test_direct_ingest.py
    └── test_phase2_comprehensive.py

📈 Production Deployment

Scaling Considerations

  1. Horizontal Scaling:

    # In docker-compose.yml
    backend:
      deploy:
        replicas: 3
    
    redis:
      deploy:
        replicas: 1  # Redis should remain single instance
  2. Resource Allocation:

    services:
      backend:
        deploy:
          resources:
            limits:
              memory: 2G
              cpus: '1.0'
  3. Data Persistence:

    volumes:
      meili_data:
        driver: local
        driver_opts:
          type: none
          o: bind
          device: /data/meilisearch

Next Evolution: Advanced Document Processing and Search Enhancement

🎯 Overview

Phase 3 will extend RAGOPS with advanced document processing capabilities and search result reranking to create a comprehensive enterprise-grade RAG system.

Current Status

  • βœ… Phase 1: Text-based chunking and ingestion
  • βœ… Phase 2: Embeddings integration with semantic search
  • 🎯 Phase 3: PDF processing + reranking (this document)

πŸ—οΈ Architecture Enhancement

New Components

┌─────────────────┐    ┌──────────────┐    ┌────────────────┐
│   PDF Upload    │───▶│   LangChain  │───▶│   Existing     │
│   Interface     │    │   Processor  │    │   Pipeline     │
└─────────────────┘    └──────────────┘    └────────────────┘
                              │
                              ▼
                       ┌──────────────┐
                       │  Cross-      │
                       │  Encoder     │
                       │  Reranker    │
                       └──────────────┘

Enhanced Data Flow

  1. PDF Upload → LangChain PyPDFLoader → Page extraction
  2. Text Processing → RecursiveCharacterTextSplitter → Smart chunking
  3. Metadata Enrichment → Page numbers, file info, structure
  4. Existing Pipeline → Embeddings → Meilisearch storage
  5. Enhanced Search → Initial retrieval → Cross-encoder reranking → Final results (sketched below)
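
A sketch of steps 1, 2, and 5 using the libraries named above; the reranker model and parameters here are illustrative placeholders, not settled Phase 3 decisions:

# pip install langchain-community langchain-text-splitters pypdf sentence-transformers
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from sentence_transformers import CrossEncoder

# Steps 1-2: load pages (page numbers land in metadata), then split
pages = PyPDFLoader("pdf_files/autoencoders.pdf").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(pages)  # each chunk keeps its page metadata

# Step 5: rerank initial hits with a CPU-friendly cross-encoder
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
query = "What is an undercomplete autoencoder?"
candidates = [c.page_content for c in chunks[:20]]  # stand-in for search hits
scores = reranker.predict([(query, text) for text in candidates])
ranked = sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)
for score, text in ranked[:5]:
    print(f"{score:.3f}  {text[:60]}...")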

🎯 Quick Start

1. Start the System

make up

This builds images, starts all services, and waits for readiness

2. Validate Installation

make test

Runs comprehensive Phase 2 validation suite

3. Try a Demo

make demo

Interactive demonstration of key features

4. Check All Features

make validate

Complete system validation and feature testing


πŸ› οΈ Available Commands

Run make help to see all available commands:

make help

Core Operations

  • make up - Start all RAGOPS services
  • make down - Stop all services
  • make restart - Restart all services
  • make logs - Show backend service logs
  • make clean - Clean up Docker resources

Testing & Validation

  • make test - Run Phase 2 comprehensive tests
  • make demo - Run feature demonstrations
  • make validate - Validate all system features

Development

  • make dev-logs - Follow all service logs
  • make dev-rebuild - Rebuild and restart backend only
  • make dev-reset - Complete system reset with fresh data

πŸ—οΈ System Architecture

┌─────────────────┐    ┌──────────────┐    ┌────────────────┐
│   Frontend      │───▶│   Backend    │───▶│  Meilisearch   │
│   (Future)      │    │   FastAPI    │    │   + Vector     │
└─────────────────┘    └──────────────┘    └────────────────┘
                              │                     │
                              ▼                     ▼
                       ┌──────────────┐    ┌────────────────┐
                       │   LiteLLM    │    │     Redis      │
                       │   Proxy      │    │    Cache       │
                       └──────────────┘    └────────────────┘
                              │
                              ▼
                       ┌──────────────┐    ┌────────────────┐
                       │     TEI      │    │     Groq       │
                       │ Embeddings   │    │     LLM        │
                       └──────────────┘    └────────────────┘

Services Overview

  • Backend (Port 18000): FastAPI with Phase 2 embeddings
  • Meilisearch (Port 7700): Vector-enabled search engine
  • TEI-Embeddings (Port 8080): Text embeddings inference
  • LiteLLM (Port 4000): Multi-provider LLM proxy
  • Redis (Port 6379): Embedding cache layer
  • Meili-Init: Automated index configuration

📡 API Reference

Core Endpoints

Health Check

curl -s http://localhost:18000/health | jq .

Response:

{
  "status": "healthy",
  "embeddings_available": true,
  "embedding_dimensions": 384
}

Document Ingestion

curl -X POST "http://localhost:18000/ingest" \
  -H "Content-Type: application/json" \
  -d '[{
    "id": "doc1",
    "text": "Your document content here...",
    "metadata": {"title": "Document Title", "category": "docs"}
  }]' | jq .

Semantic Search & RAG

curl -X POST "http://localhost:18000/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "What are vector embeddings?", "k": 5}' | jq .

Test Embeddings

curl -X POST "http://localhost:18000/test-embeddings" \
  -H "Content-Type: application/json" \
  -d '["text to embed", "another text"]' | jq .

Initialize Indexes

curl -X POST "http://localhost:18000/init-index"

🔧 Configuration

Environment Setup

Create a .env file with your configuration:

# Required: Groq API Key
GROQ_API_KEY=your_groq_api_key_here

# Meilisearch Configuration  
MEILI_KEY=change_me_master_key

# Performance Settings (optional)
REDIS_CACHE_TTL=3600
MAX_CHUNK_SIZE=500
CHUNK_OVERLAP=50
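
What MAX_CHUNK_SIZE and CHUNK_OVERLAP control, as a minimal sketch (the repo's real chunker is backend/app/services/chunking.py and may differ):

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    step = size - overlap  # each window starts 450 characters after the last
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

print([len(c) for c in chunk("x" * 1200)])  # [500, 500, 300]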

Service Configuration

All service URLs are automatically configured for Docker Compose:

  • PROXY_URL=http://litellm:4000
  • MEILISEARCH_URL=http://meilisearch:7700
  • REDIS_URL=redis://redis:6379
  • TEI_URL=http://tei-embeddings:8080

🧪 Testing & Validation

Using Makefile Commands

Quick Validation

make test

Runs comprehensive Phase 2 test suite

Full System Validation

make validate  

Tests all features and generates detailed reports

Test Files Organization

All tests are in the tests/ directory:

  • tests/test_phase2_comprehensive.py - Complete Phase 2 validation
  • tests/test_all_features.py - Comprehensive feature testing
  • tests/chunking_validation.py - Text chunking validation

Check System Status

# After make up, check services
docker compose ps

# Check resource usage
docker stats

Maintenance Tasks

Clean Docker Resources

make clean

Removes containers, volumes, and prunes system

Manual Cache Clear

docker compose exec redis redis-cli FLUSHALL

Backup Data

# Backup documents (the GET documents endpoint paginates; without ?limit it returns only the first 20)
curl -H "Authorization: Bearer $MEILI_KEY" \
  "http://localhost:7700/indexes/documents/documents?limit=100000" > backup_documents.json

# Backup chunks
curl -H "Authorization: Bearer $MEILI_KEY" \
  "http://localhost:7700/indexes/chunks/documents?limit=100000" > backup_chunks.json

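Restoring is the reverse trip: POST the saved array back into the index. A sketch, assuming a recent Meilisearch that wraps exported documents in a "results" field:

import json
import os

import requests

with open("backup_documents.json") as f:
    docs = json.load(f)["results"]  # older Meilisearch versions return a bare array

resp = requests.post(
    "http://localhost:7700/indexes/documents/documents",
    headers={"Authorization": f"Bearer {os.environ['MEILI_KEY']}"},
    json=docs,
)
print(resp.json())  # indexing runs as an async task; poll /tasks to confirm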