A production-ready Retrieval-Augmented Generation (RAG) pipeline built with modern technologies, designed for CPU deployment with enterprise-grade features.
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β                           RAGOPS ARCHITECTURE                        β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββ    βββββββββββββββ    βββββββββββββββ    βββββββββββββββ
β   Client    ββββββ   Nginx     ββββββ  FastAPI    ββββββ Meilisearch β
β Application β    β  (Optional) β    β  Backend    β    β   Search    β
βββββββββββββββ    βββββββββββββββ    βββββββββββββββ    βββββββββββββββ
                                              β                    β
                                              β              ββββββββββββ
                                              β              β Document β
                                              β              β & Chunks β
                                              β              β Indexes  β
                                              β              ββββββββββββ
                                              β
                    βββββββββββββββ          β          βββββββββββββββ
                    β   Redis     ββββββββββββΌβββββββββββ  LiteLLM    β
                    β  Caching    β          β          β   Proxy     β
                    βββββββββββββββ          β          βββββββββββββββ
                                              β                    β
                                              β              ββββββββββββ
                                              β              β   Groq   β
                                              ββββββββββββββββ   LLM    β
                                                            β Provider β
                    βββββββββββββββ                         ββββββββββββ
                    β     TEI     β                               β
                    β Embeddings  βββββββββββββββββββββββββββββββββ
                    β  Service    β          
                    βββββββββββββββ          
Flow:
1. Documents β Ingestion β Chunking β Embeddings β Meilisearch
2. Query β FastAPI β Meilisearch (Hybrid Search) β Context β LLM β Response
3. Redis caches embeddings and responses for performance- π Hybrid Search: Combines vector similarity and BM25 text search
- π Document Processing: Supports multiple document types with intelligent chunking
- π§ LLM Integration: Groq LLMs via LiteLLM proxy with fallback support
- β‘ High Performance: Redis caching with 5-50x speed improvements
- π― Semantic Retrieval: TEI embeddings for semantic understanding
- π§ Production Ready: Docker Compose orchestration with health checks
- CPU Optimized: Runs efficiently on CPU-only infrastructure
- Scalable Architecture: Microservices design with independent scaling
- Enterprise Security: Authentication, authorization, and secure communication
- Monitoring & Logging: Comprehensive observability stack
- API Documentation: Auto-generated OpenAPI/Swagger documentation
- Docker & Docker Compose: Latest versions
- 4GB+ RAM: Recommended for optimal performance
- API Keys: Groq API key for LLM access
- Storage: 2GB+ free disk space for models and indexes
git clone <repository-url>
cd RAGOPS
# Copy and configure environment
cp .env.example .env
# Edit .env with your API keys (see Configuration section)# Start all services
docker compose up -d
# Check service health
docker compose ps
# View logs
docker compose logs -f# Check API health
curl http://localhost:18000/health
# Access API documentation
open http://localhost:18000/docs
# Run system validation
docker compose exec backend python final_rag_test_report.py# Ingest sample documents for testing
docker compose exec backend python ingest.py
# Or ingest your own documents via API
curl -X POST "http://localhost:18000/ingest" \
  -H "Content-Type: application/json" \
  -d '[{"id": "doc1", "text": "Your document content", "metadata": {"source": "file.pdf"}}]'# Test search and generation
curl -X POST "http://localhost:18000/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is this document about?", "k": 3}'
# Test direct chat
curl -X POST "http://localhost:18000/chat" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'# Meilisearch Configuration
MEILI_KEY=your_secure_master_key_here
MEILI_INDEX=documents
EMBED_DIM=384
# LLM Provider Configuration  
LITELLM_KEY=your_proxy_key_here
GROQ_API_KEY=your_groq_api_key_here
# Optional: Additional LLM providers
OPENAI_API_KEY=your_openai_key_here
HUGGINGFACE_API_KEY=your_hf_key_here
# Service URLs (Docker internal)
MEILI_URL=http://meilisearch:7700
PROXY_URL=http://litellm:4000
REDIS_URL=redis://redis:6379Edit litellm/config.yaml to customize:
model_list:
  # Primary chat model
  - model_name: groq-llama3
    litellm_params:
      model: groq/llama3-8b-8192
      api_key: os.environ/GROQ_API_KEY
  # Local embeddings
  - model_name: local-embeddings
    litellm_params:
      model: openai/text-embedding-ada-002
      api_base: "http://tei-embeddings:80"
      api_key: "dummy-key"
# Global settings
litellm_settings:
  cache: true
  cache_params:
    type: "redis"
    url: "redis://redis:6379"
    ttl: 1800| Service | Port | Description | Health Check | 
|---|---|---|---|
| FastAPI Backend | 18000 | Main API server | GET /health | 
| Meilisearch | 7700 | Search & vector database | GET /health | 
| LiteLLM Proxy | 4000 | LLM routing proxy | GET /health | 
| TEI Embeddings | 80 | Text embeddings service | GET /health | 
| Redis | 6379 | Caching layer | TCP check | 
| Nginx | 8443 | Reverse proxy (optional) | HTTP check | 
- 
Document Ingestion: Documents β FastAPI β Processing β Embeddings (TEI) β Meilisearch
- 
Query Processing: Query β FastAPI β Embeddings (TEI) β Search (Meilisearch) β Context β LLM (Groq) β Response
- 
Caching Layer: Redis caches: Embeddings (1h TTL) | LLM Responses (10min TTL)
POST /ingest
Content-Type: application/json
[
  {
    "id": "doc-1",
    "text": "Document content here",
    "metadata": {"source": "file.pdf", "author": "John Doe"}
  }
]POST /search
Content-Type: application/json
{
  "query": "What is machine learning?",
  "k": 5
}
Response:
{
  "answer": "Machine learning is...",
  "chunks": [...],
  "total_chunks_found": 10,
  "cached": false
}POST /chat
Content-Type: application/json
{
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ],
  "temperature": 0.3,
  "model": "groq-llama3"
}GET /health           # API health
POST /init-index      # Initialize search indexes- Swagger UI: http://localhost:18000/docs
- ReDoc: http://localhost:18000/redoc
# Run comprehensive system validation
docker compose exec backend python final_rag_test_report.py
# Demo working features
docker compose exec backend python demo_working_features.py
# Manual ingestion test
docker compose exec backend python ingest.pyThe system has been validated with:
- API Response Time: 3-9ms average
- Cache Performance: 5-51x speedup with Redis
- Document Processing: Supports documents from 50 to 5000+ words
- Concurrent Requests: Handles multiple simultaneous queries
- Search Accuracy: Hybrid search with relevance scoring
# Check all services
docker compose ps
# View service logs
docker compose logs [service-name]
# Monitor resource usage
docker stats
# Test individual components
curl http://localhost:7700/health    # Meilisearch
curl http://localhost:18000/health   # FastAPI BackendRAGOPS/
βββ docker-compose.yml          # Service orchestration
βββ .env                       # Environment configuration
βββ backend/                   # FastAPI application
β   βββ app/
β   β   βββ main.py           # Main API application
β   βββ Dockerfile            # Backend container
β   βββ requirements.txt      # Python dependencies
β   βββ ingest.py             # Sample data ingestion
β   βββ demo_working_features.py  # Feature demonstration
β   βββ final_rag_test_report.py  # System validation
βββ litellm/
β   βββ config.yaml           # LLM proxy configuration
βββ nginx/                    # Optional reverse proxy
    βββ nginx.conf# Via API
import httpx
documents = [
    {
        "id": "custom-doc-1",
        "text": "Your document content here...",
        "metadata": {
            "title": "Document Title",
            "author": "Author Name",
            "category": "technical",
            "tags": ["ai", "machine-learning"]
        }
    }
]
async with httpx.AsyncClient() as client:
    response = await client.post(
        "http://localhost:18000/ingest",
        json=documents
    )
    print(response.json())Add new providers in litellm/config.yaml:
model_list:
  # OpenAI GPT-4
  - model_name: openai-gpt4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY
  # Anthropic Claude
  - model_name: claude-3
    litellm_params:
      model: anthropic/claude-3-sonnet
      api_key: os.environ/ANTHROPIC_API_KEY- 
Horizontal Scaling: # In docker-compose.yml backend: deploy: replicas: 3 redis: deploy: replicas: 1 # Redis should remain single instance 
- 
Resource Allocation: services: backend: deploy: resources: limits: memory: 2G cpus: '1.0' 
- 
Data Persistence: volumes: meili_data: driver: local driver_opts: type: none o: bind device: /data/meilisearch 
- 
Environment Security: # Use strong, unique keys MEILI_KEY=$(openssl rand -hex 32) LITELLM_KEY=$(openssl rand -hex 32) # Restrict network access # Configure firewall rules # Use TLS certificates 
- 
API Security: - Enable authentication in LiteLLM config
- Configure rate limiting
- Set up request validation
- Monitor API access logs
 
# Add to docker-compose.yml
services:
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin- 
Service Won't Start: # Check logs docker compose logs [service-name] # Verify environment docker compose config # Restart services docker compose restart [service-name] 
- 
Search Not Working: # Check Meilisearch indexes curl -H "Authorization: Bearer $MEILI_KEY" \ http://localhost:7700/indexes # Reinitialize indexes curl -X POST http://localhost:18000/init-index 
- 
LLM Errors: # Verify API keys docker compose exec backend env | grep -E "(GROQ|OPENAI)_API_KEY" # Test LiteLLM directly docker compose logs litellm 
- 
Performance Issues: # Check resource usage docker stats # Monitor cache hit rates docker compose exec backend python demo_working_features.py # Clear Redis cache docker compose exec redis redis-cli FLUSHALL 
# Access service containers
docker compose exec backend bash
docker compose exec meilisearch sh
# Check network connectivity
docker compose exec backend ping meilisearch
docker compose exec backend ping litellm
# View detailed logs
docker compose logs -f --tail=100
# Restart problematic services
docker compose restart backend litellmThis project is licensed under the MIT License - see the LICENSE file for details.
We welcome contributions! Please see our Contributing Guidelines for details.
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request