A production-ready Retrieval-Augmented Generation (RAG) pipeline built with modern technologies, designed for CPU deployment with enterprise-grade features.
```
┌──────────────────────────────────────────────────────────────────────┐
│                         RAGOPS ARCHITECTURE                          │
└──────────────────────────────────────────────────────────────────────┘

┌──────────────┐   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│    Client    │──▶│    Nginx     │──▶│   FastAPI    │──▶│ Meilisearch  │
│ Application  │   │  (Optional)  │   │   Backend    │   │    Search    │
└──────────────┘   └──────────────┘   └──────────────┘   └──────────────┘
                                             │                   │
                                             │             ┌───────────┐
                                             │             │ Document  │
                                             │             │ & Chunks  │
                                             │             │  Indexes  │
                                             │             └───────────┘
                     ┌───────────────────────┤
                     │                       │
              ┌──────────────┐        ┌──────────────┐     ┌──────────────┐
              │    Redis     │        │   LiteLLM    │────▶│     Groq     │
              │   Caching    │◀───────│    Proxy     │     │ LLM Provider │
              └──────────────┘        └──────────────┘     └──────────────┘
                                             │
                                             ▼
                                      ┌──────────────┐
                                      │     TEI      │
                                      │  Embeddings  │
                                      │   Service    │
                                      └──────────────┘
```

Flow:
1. Documents → Ingestion → Chunking → Embeddings → Meilisearch
2. Query → FastAPI → Meilisearch (Hybrid Search) → Context → LLM → Response
3. Redis caches embeddings and responses for performance

Features:

- Hybrid Search: Combines vector similarity and BM25 text search (see the fusion sketch after this list)
- Document Processing: Supports multiple document types with intelligent chunking
- LLM Integration: Groq LLMs via LiteLLM proxy with fallback support
- High Performance: Redis caching with 5-50x speed improvements
- Semantic Retrieval: TEI embeddings for semantic understanding
- Production Ready: Docker Compose orchestration with health checks
- CPU Optimized: Runs efficiently on CPU-only infrastructure
- Scalable Architecture: Microservices design with independent scaling
- Enterprise Security: Authentication, authorization, and secure communication
- Monitoring & Logging: Comprehensive observability stack
- API Documentation: Auto-generated OpenAPI/Swagger documentation
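
Meilisearch performs the hybrid fusion internally when a query carries both text and a vector, so the backend issues a single request. Purely as an illustration of the idea (not Meilisearch's actual algorithm), here is a minimal reciprocal rank fusion sketch, one common way to merge a vector ranking with a BM25 ranking:

```python
# Illustrative only: reciprocal rank fusion (RRF) merges two ranked lists.
# Meilisearch fuses vector and BM25 results internally; this sketch just
# shows the general idea behind combining the two rankings.

def reciprocal_rank_fusion(vector_hits: list[str], bm25_hits: list[str],
                           k: int = 60) -> list[str]:
    """Merge two ranked lists of document IDs into a single ranking."""
    scores: dict[str, float] = {}
    for ranking in (vector_hits, bm25_hits):
        for rank, doc_id in enumerate(ranking):
            # Documents ranked highly in either list accumulate more score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Example: "doc2" ranks well in both lists, so it comes out on top.
print(reciprocal_rank_fusion(["doc2", "doc1", "doc3"], ["doc4", "doc2"]))
```
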
Prerequisites:

- Docker & Docker Compose: Latest versions
- 4GB+ RAM: Recommended for optimal performance
- API Keys: Groq API key for LLM access
- Storage: 2GB+ free disk space for models and indexes

Quick start:

```bash
git clone <repository-url>
cd RAGOPS

# Copy and configure environment
cp .env.example .env
# Edit .env with your API keys (see Configuration section)

# Start all services
docker-compose up -d

# Check service health
docker-compose ps

# View logs
docker-compose logs -f

# Check API health
curl http://localhost:18000/health
```

Configuration (set the following in .env):

```env
# Meilisearch Configuration
MEILI_KEY=your_secure_master_key_here
MEILI_INDEX=documents
EMBED_DIM=384
# LLM Provider Configuration
LITELLM_KEY=your_proxy_key_here
GROQ_API_KEY=your_groq_api_key_here
# Optional: Additional LLM providers
OPENAI_API_KEY=your_openai_key_here
HUGGINGFACE_API_KEY=your_hf_key_here
# Service URLs (Docker internal)
MEILI_URL=http://meilisearch:7700
PROXY_URL=http://litellm:4000
REDIS_URL=redis://redis:6379
```

Edit litellm/config.yaml to customize:

```yaml
model_list:
  # Primary chat/completions model on Groq
  - model_name: groq-llama3
    litellm_params:
      model: groq/llama-3.1-8b-instant
      api_key: os.environ/GROQ_API_KEY

  # Local embeddings served by TEI (OpenAI-compatible embeddings API)
  - model_name: local-embeddings
    litellm_params:
      model: openai/text-embedding-ada-002
      api_key: os.environ/GROQ_API_KEY
      api_base: "http://tei-embeddings:80"
      custom_llm_provider: openai
      timeout: 60

# Global LiteLLM settings
litellm_settings:
  cache: true
  cache_params:
    type: "redis"
    url: "redis://redis:6379"
    ttl: 1800
    supported_call_types: ["completion", "chat_completion", "embedding", "acompletion", "aembedding"]
  # Prompt injection basic guards
  prompt_injection_params:
    heuristics_check: true
    similarity_check: false
    vector_db_check: false

# Routing / fallbacks
router_settings:
  fallbacks:
    - "groq-llama3": []
```

Services:

| Service | Port | Description | Health Check |
|---|---|---|---|
| FastAPI Backend | 18000 | Main API server | GET /health |
| Meilisearch | 7700 | Search & vector database | GET /health |
| LiteLLM Proxy | 4000 | LLM routing proxy | GET /health |
| TEI Embeddings | 80 | Text embeddings service | GET /health |
| Redis | 6379 | Caching layer | TCP check |
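
Because the LiteLLM proxy exposes an OpenAI-compatible API, any OpenAI client can target it. A minimal sketch with the openai Python SDK, assuming the stack is running locally and LITELLM_KEY matches your .env:

```python
# Minimal sketch: call the LiteLLM proxy through its OpenAI-compatible API.
# Assumes the stack is up locally and LITELLM_KEY is set as in .env.
import os

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",           # LiteLLM proxy port from the table above
    api_key=os.environ.get("LITELLM_KEY", ""),  # proxy key from .env
)

# Chat completion routed to Groq via the "groq-llama3" alias in config.yaml.
chat = client.chat.completions.create(
    model="groq-llama3",
    messages=[{"role": "user", "content": "Summarize what RAG is in one sentence."}],
)
print(chat.choices[0].message.content)

# Embeddings routed to the local TEI service via the "local-embeddings" alias.
emb = client.embeddings.create(model="local-embeddings", input=["hello world"])
print(len(emb.data[0].embedding))  # expect EMBED_DIM, e.g. 384
```
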

Data flow:

1. Document Ingestion:
   Documents → FastAPI → Processing → Embeddings (TEI) → Meilisearch
2. Query Processing:
   Query → FastAPI → Embeddings (TEI) → Search (Meilisearch) → Context → LLM (Groq) → Response
3. Caching Layer:
   Redis caches: Embeddings (1h TTL) | LLM Responses (10min TTL) (see the sketch below)
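
The speedups quoted above come from skipping recomputation on repeated inputs. The real logic lives in backend/app/utils/cache.py and hashing.py; this is only a minimal sketch of the pattern with redis-py, using hypothetical helper names:

```python
# Sketch of the embedding-cache pattern: hash the text, check Redis, compute
# on a miss, then store with a TTL. Helper names here are hypothetical; the
# actual implementation is in backend/app/utils/cache.py and hashing.py.
import hashlib
import json
import os

import redis

r = redis.Redis.from_url(os.environ.get("REDIS_URL", "redis://localhost:6379"))
EMBED_TTL = 3600  # 1 hour, matching the embedding TTL noted above


def cached_embedding(text: str, embed_fn) -> list[float]:
    """Return the embedding for `text`, using Redis as a write-through cache."""
    key = "emb:" + hashlib.sha256(text.encode("utf-8")).hexdigest()
    hit = r.get(key)
    if hit is not None:
        return json.loads(hit)                 # cache hit: skip the TEI call
    vector = embed_fn(text)                    # cache miss: compute via TEI
    r.setex(key, EMBED_TTL, json.dumps(vector))
    return vector
```
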

Monitoring:

```bash
# Check all services
docker-compose ps

# View service logs
docker-compose logs [service-name]

# Monitor resource usage
docker stats

# Test individual components
curl http://localhost:7700/health    # Meilisearch
curl http://localhost:18000/health   # FastAPI Backend
```

Project structure:

```
.
├── Makefile
├── README.md
├── backend
│   ├── Dockerfile
│   ├── app
│   │   ├── __init__.py
│   │   ├── api
│   │   │   ├── __init__.py
│   │   │   ├── chat.py
│   │   │   ├── embeddings.py
│   │   │   ├── health.py
│   │   │   ├── ingest.py
│   │   │   ├── pdf.py
│   │   │   ├── search.py
│   │   │   └── stats.py
│   │   ├── core
│   │   │   ├── clients.py
│   │   │   ├── config.py
│   │   │   └── logging.py
│   │   ├── main.py
│   │   ├── models
│   │   │   ├── __init__.py
│   │   │   ├── chat.py
│   │   │   ├── documents.py
│   │   │   ├── health.py
│   │   │   ├── responses.py
│   │   │   └── search.py
│   │   ├── services
│   │   │   ├── __init__.py
│   │   │   ├── chunking.py
│   │   │   ├── embeddings.py
│   │   │   ├── ingestion.py
│   │   │   ├── llm_service.py
│   │   │   ├── pdf_processor.py
│   │   │   ├── rag_service.py
│   │   │   └── search_service.py
│   │   └── utils
│   │       ├── __init__.py
│   │       ├── cache.py
│   │       └── hashing.py
│   ├── requirements.txt
│   └── seed_data.py
├── docker-compose.yml
├── litellm
│   └── config.yaml
├── pdf_files
│   ├── autoencoders.pdf
│   ├── linear_algebra.pdf
│   └── linear_factor_models.pdf
├── scripts
│   └── meili-init.sh
└── tests
    ├── chunking_validation.py
    ├── debug_vector_search.py
    ├── demo_phase2.py
    ├── demo_working_features.py
    ├── final_rag_test_report.py
    ├── test_all_features.py
    ├── test_direct_ingest.py
    └── test_phase2_comprehensive.py
```

Scaling considerations:

1. Horizontal Scaling:

   ```yaml
   # In docker-compose.yml
   backend:
     deploy:
       replicas: 3
   redis:
     deploy:
       replicas: 1  # Redis should remain a single instance
   ```

2. Resource Allocation:

   ```yaml
   services:
     backend:
       deploy:
         resources:
           limits:
             memory: 2G
             cpus: '1.0'
   ```

3. Data Persistence:

   ```yaml
   volumes:
     meili_data:
       driver: local
       driver_opts:
         type: none
         o: bind
         device: /data/meilisearch
   ```

Phase 3 will extend RAGOPS with advanced document processing capabilities and search result reranking to create a comprehensive enterprise-grade RAG system.
- Phase 1 (complete): Text-based chunking and ingestion
- Phase 2 (complete): Embeddings integration with semantic search
- Phase 3 (this document): PDF processing + reranking

```
┌─────────────────┐    ┌──────────────┐    ┌─────────────────┐
│   PDF Upload    │───▶│  LangChain   │───▶│    Existing     │
│   Interface     │    │  Processor   │    │    Pipeline     │
└─────────────────┘    └──────────────┘    └─────────────────┘
                                                    │
                                                    ▼
                                            ┌──────────────┐
                                            │    Cross-    │
                                            │   Encoder    │
                                            │   Reranker   │
                                            └──────────────┘
```

- PDF Upload → LangChain PyPDFLoader → Page extraction
- Text Processing → RecursiveCharacterTextSplitter → Smart chunking
- Metadata Enrichment → Page numbers, file info, structure
- Existing Pipeline → Embeddings → Meilisearch storage
- Enhanced Search → Initial retrieval → Cross-encoder reranking → Final results (see the sketch after this list)
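
A minimal sketch of this pipeline, assuming LangChain and sentence-transformers are installed. Import paths differ between LangChain releases, and the cross-encoder model here is only an example; the production logic lives in backend/app/services/pdf_processor.py and search_service.py:

```python
# Sketch of the Phase 3 flow: load a PDF per page, chunk it, then rerank
# retrieved chunks with a cross-encoder. Import paths vary across LangChain
# releases; the model choice is illustrative.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from sentence_transformers import CrossEncoder

# 1. PDF Upload -> PyPDFLoader -> one Document per page, with page metadata.
pages = PyPDFLoader("pdf_files/autoencoders.pdf").load()

# 2. RecursiveCharacterTextSplitter -> smart chunking (sizes mirror the
#    MAX_CHUNK_SIZE / CHUNK_OVERLAP settings shown later in this README).
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(pages)  # page/source metadata carries over

# 3. After initial retrieval from Meilisearch, rerank with a cross-encoder.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
query = "What is an autoencoder?"
candidates = [c.page_content for c in chunks[:20]]  # stand-in for search hits
scores = reranker.predict([(query, text) for text in candidates])
reranked = [text for _, text in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0][:200])
```
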

Make targets:

- `make up`: Builds images, starts all services, and waits for readiness
- `make test`: Runs the comprehensive Phase 2 validation suite
- `make demo`: Interactive demonstration of key features
- `make validate`: Complete system validation and feature testing

Run `make help` to see all available commands:

- `make up`: Start all RAGOPS services
- `make down`: Stop all services
- `make restart`: Restart all services
- `make logs`: Show backend service logs
- `make clean`: Clean up Docker resources
- `make test`: Run Phase 2 comprehensive tests
- `make demo`: Run feature demonstrations
- `make validate`: Validate all system features
- `make dev-logs`: Follow all service logs
- `make dev-rebuild`: Rebuild and restart backend only
- `make dev-reset`: Complete system reset with fresh data

Architecture:

```
┌─────────────────┐    ┌──────────────┐    ┌─────────────────┐
│    Frontend     │───▶│   Backend    │───▶│   Meilisearch   │
│    (Future)     │    │   FastAPI    │    │    + Vector     │
└─────────────────┘    └──────────────┘    └─────────────────┘
                              │                     │
                              ▼                     ▼
                       ┌──────────────┐    ┌─────────────────┐
                       │   LiteLLM    │    │      Redis      │
                       │    Proxy     │    │      Cache      │
                       └──────────────┘    └─────────────────┘
                              │
                              ▼
                       ┌──────────────┐    ┌─────────────────┐
                       │     TEI      │    │      Groq       │
                       │  Embeddings  │    │       LLM       │
                       └──────────────┘    └─────────────────┘
```

Components:

- Backend (Port 18000): FastAPI with Phase 2 embeddings
- Meilisearch (Port 7700): Vector-enabled search engine
- TEI-Embeddings (Port 8080): Text embeddings inference
- LiteLLM (Port 4000): Multi-provider LLM proxy
- Redis (Port 6379): Embedding cache layer
- Meili-Init: Automated index configuration
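
Putting those components together, here is a hedged sketch of the query path. The endpoint payloads and response fields are assumptions for illustration; the real orchestration lives in backend/app/services/rag_service.py:

```python
# Hedged sketch of the query path through the components above. The /search
# response shape ("results" with a "text" field) is an assumption; the real
# logic is in backend/app/services/rag_service.py.
import os

import requests

BACKEND_URL = "http://localhost:18000"   # FastAPI backend
PROXY_URL = "http://localhost:4000"      # LiteLLM proxy


def answer(query: str, k: int = 5) -> str:
    # 1. Retrieve the top-k chunks via the backend's /search endpoint.
    hits = requests.post(f"{BACKEND_URL}/search",
                         json={"query": query, "k": k}).json()
    context = "\n\n".join(hit["text"] for hit in hits.get("results", []))

    # 2. Ask the LLM through the proxy, grounding it in the retrieved context.
    resp = requests.post(
        f"{PROXY_URL}/chat/completions",
        headers={"Authorization": f"Bearer {os.environ.get('LITELLM_KEY', '')}"},
        json={
            "model": "groq-llama3",
            "messages": [
                {"role": "system", "content": f"Answer using this context:\n{context}"},
                {"role": "user", "content": query},
            ],
        },
    )
    return resp.json()["choices"][0]["message"]["content"]
```
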

Health check:

```bash
curl -s http://localhost:18000/health | jq .
```

Response:

```json
{
  "status": "healthy",
  "embeddings_available": true,
  "embedding_dimensions": 384
}
```

Ingest documents:

```bash
curl -X POST "http://localhost:18000/ingest" \
-H "Content-Type: application/json" \
-d '[{
"id": "doc1",
"text": "Your document content here...",
"metadata": {"title": "Document Title", "category": "docs"}
}]' | jq .curl -X POST "http://localhost:18000/search" \
-H "Content-Type: application/json" \
-d '{"query": "What are vector embeddings?", "k": 5}' | jq .curl -X POST "http://localhost:18000/test-embeddings" \
-H "Content-Type: application/json" \
-d '["text to embed", "another text"]' | jq .curl -X POST "http://localhost:18000/init-index"Create a .env file with your configuration:
# Required: Groq API Key
GROQ_API_KEY=your_groq_api_key_here
# Meilisearch Configuration
MEILI_KEY=change_me_master_key
# Performance Settings (optional)
REDIS_CACHE_TTL=3600
MAX_CHUNK_SIZE=500
CHUNK_OVERLAP=50
```

All service URLs are automatically configured for Docker Compose:

```env
PROXY_URL=http://litellm:4000
MEILISEARCH_URL=http://meilisearch:7700
REDIS_URL=redis://redis:6379
TEI_URL=http://tei-embeddings:8080
```
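
The backend reads these values from the environment at startup. For reference, a minimal sketch of how such settings could be loaded with pydantic-settings; the field names and defaults are assumptions, not the actual contents of backend/app/core/config.py:

```python
# Sketch of environment-driven settings, in the spirit of
# backend/app/core/config.py. Field names and defaults are assumptions.
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # Service URLs default to the Docker Compose internal hostnames.
    proxy_url: str = "http://litellm:4000"
    meilisearch_url: str = "http://meilisearch:7700"
    redis_url: str = "redis://redis:6379"
    tei_url: str = "http://tei-embeddings:8080"

    # Secrets and tunables come from .env / the container environment.
    groq_api_key: str = ""
    meili_key: str = ""
    redis_cache_ttl: int = 3600
    max_chunk_size: int = 500
    chunk_overlap: int = 50


settings = Settings()  # env vars like MEILI_KEY override the defaults
```
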

Testing:

- `make test`: Runs the comprehensive Phase 2 test suite
- `make validate`: Tests all features and generates detailed reports

All tests are in the tests/ directory:

- `tests/test_phase2_comprehensive.py`: Complete Phase 2 validation
- `tests/test_all_features.py`: Comprehensive feature testing
- `tests/chunking_validation.py`: Text chunking validation

Troubleshooting:

```bash
# After make up, check services
docker compose ps

# Check resource usage
docker stats
```

- `make clean`: Removes containers, volumes, and prunes the system
To clear the Redis cache:

```bash
docker compose exec redis redis-cli FLUSHALL
```

Backup:

```bash
# Backup documents
curl -H "Authorization: Bearer $MEILI_KEY" \
"http://localhost:7700/indexes/documents/documents" > backup_documents.json
# Backup chunks
curl -H "Authorization: Bearer $MEILI_KEY" \
"http://localhost:7700/indexes/chunks/documents" > backup_chunks.json