A comprehensive PDF processing system with Ollama integration for document Q&A, built with FastAPI, SQLModel, PostgreSQL, and Docker.
- PDF Upload & Processing: Upload PDF files and extract text content automatically
- Document Management: Store, retrieve, and manage PDF documents with metadata
- Semantic Search: Search through document content using intelligent text matching
- Document Q&A: Chat with your documents using Ollama language models
- Background Processing: Asynchronous PDF processing with status tracking
- REST API: Complete RESTful API with OpenAPI documentation
- Docker Support: Containerized deployment with PostgreSQL and pgAdmin
- FastAPI: Modern Python web framework for building APIs
- SQLModel: Type-safe database models with Pydantic integration
- PostgreSQL: Robust relational database for document storage
- Ollama: Local LLM integration for document Q&A
- Docker: Containerized deployment and development
- Python 3.11+
- Docker & Docker Compose
- Ollama (running locally)
First, install and start Ollama on your system:
```bash
# Install Ollama (visit https://ollama.ai for platform-specific instructions)

# Pull a model (e.g., llama2)
ollama pull llama2
```
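Before starting the app, you can confirm Ollama is reachable from Python. This sketch queries Ollama's `GET /api/tags` endpoint, which lists locally installed models (stdlib only; the base URL matches `OLLAMA_BASE_URL` from the config below):

```python
import json
import urllib.request


def list_ollama_models(base_url: str = "http://localhost:11434") -> list:
    """Return the names of models installed in a local Ollama instance."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]


if __name__ == "__main__":
    print(list_ollama_models())
```

If this raises a connection error, Ollama is not running; start it with `ollama serve`.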
```bash
# Clone the repository
git clone <your-repo-url>
cd ollama-pdf-processor

# Create uploads directory
mkdir uploads
```
Copy and modify the `.env` file if needed. The default configuration should work for most setups; change `OLLAMA_MODEL` if you want to use a different model.

Sample config:
```env
# Database Configuration
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/postgres
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DB=postgres

# pgAdmin Configuration
PGADMIN_DEFAULT_EMAIL=[email protected]
PGADMIN_DEFAULT_PASSWORD=admin123

# Application Configuration
APP_NAME=Ollama PDF Processor
APP_VERSION=1.0.0
DEBUG=True
HOST=0.0.0.0
PORT=8000

# Ollama Configuration
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.2-vision

# File Upload Configuration
MAX_FILE_SIZE_MB=50
UPLOAD_DIR=./uploads
ALLOWED_FILE_TYPES=pdf
```
```bash
# Start all services
docker-compose up -d

# Check service status
docker-compose ps
```
This will start:
- PostgreSQL on port 5432
- pgAdmin on port 5050 ([email protected] / admin123)
- FastAPI on port 8000
Visit these URLs to verify everything is working:
- API Documentation: http://localhost:8000/docs
- Health Check: http://localhost:8000/health
- pgAdmin: http://localhost:5050
📋 For detailed PDF upload instructions, see PDF_UPLOAD_GUIDE.md
Method 1: Web Interface (Easiest)
- Visit http://localhost:8000/docs
- Find the `POST /documents/upload` endpoint
- Click "Try it out" and choose your PDF
- Click "Execute"
Method 2: Command Line
```bash
curl -X POST "http://localhost:8000/documents/upload" \
  -H "accept: application/json" \
  -F "file=@document.pdf"
```

(Don't set the `Content-Type` header manually; `curl -F` sets `multipart/form-data` with the correct boundary itself.)
Method 3: PowerShell (Windows)
```powershell
$uri = "http://localhost:8000/documents/upload"
$filePath = "C:\path\to\your\document.pdf"
Invoke-RestMethod -Uri $uri -Method Post -Form @{ file = Get-Item -Path $filePath }
```
Method 4: Upload Scripts (Windows)
For convenience, we've included upload scripts:
```bat
REM Simple upload
upload-pdf.bat "path\to\your\document.pdf"

REM Upload and wait for processing
upload-pdf.bat "path\to\your\document.pdf" -wait
```
Or use the PowerShell script directly:
```powershell
.\upload-pdf.ps1 -PdfPath "path\to\your\document.pdf" -WaitForProcessing
```
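The upload can also be scripted in Python with only the standard library. This sketch builds the `multipart/form-data` body by hand (the helper name and structure are illustrative, not part of the project):

```python
import os
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes) -> tuple:
    """Build a multipart/form-data body and its Content-Type header value."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: application/pdf\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload_pdf(path: str, base_url: str = "http://localhost:8000") -> bytes:
    """POST a PDF to the /documents/upload endpoint and return the raw response."""
    with open(path, "rb") as f:
        body, ctype = build_multipart("file", os.path.basename(path), f.read())
    req = urllib.request.Request(
        f"{base_url}/documents/upload",
        data=body,
        headers={"Content-Type": ctype},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Usage: `upload_pdf("document.pdf")`. With the `requests` package installed, the same call collapses to `requests.post(url, files={"file": open(path, "rb")})`.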
```bash
curl -X GET "http://localhost:8000/documents"
```
curl -X POST "http://localhost:8000/search" \\
-H "Content-Type: application/json" \\
-d '{
"query": "your search query",
"limit": 10
}'
curl -X POST "http://localhost:8000/chat" \\
-H "Content-Type: application/json" \\
-d '{
"message": "What is this document about?",
"context_limit": 5
}'
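The same requests can be made from Python. In this stdlib sketch, the payload fields mirror the curl examples above; the helper names and the response shapes are assumptions:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def search_payload(query: str, limit: int = 10) -> dict:
    """Body for POST /search, matching the curl example."""
    return {"query": query, "limit": limit}


def chat_payload(message: str, context_limit: int = 5) -> dict:
    """Body for POST /chat, matching the curl example."""
    return {"message": message, "context_limit": context_limit}


def post_json(path: str, payload: dict) -> dict:
    """POST a JSON payload to the API and return the decoded response."""
    req = urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(post_json("/search", search_payload("your search query")))
    print(post_json("/chat", chat_payload("What is this document about?")))
```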
- `POST /documents/upload` - Upload a PDF file
- `GET /documents` - List all documents
- `GET /documents/{id}` - Get a specific document
- `DELETE /documents/{id}` - Delete a document
- `GET /documents/{id}/chunks` - Get document chunks
- `POST /search` - Semantic search across documents
- `POST /chat` - Chat with documents using Ollama
- `GET /health` - Health check and system status
- `GET /ollama/models` - List available Ollama models
- `POST /ollama/pull/{model}` - Pull a new Ollama model
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up PostgreSQL locally or use Docker:

  ```bash
  docker run --name postgres -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:15
  ```

- Run the application:

  ```bash
  uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
  ```
The application uses SQLModel with PostgreSQL. Database tables are created automatically on startup.
To access the database:
- pgAdmin: http://localhost:5050
- Direct connection: postgresql://postgres:postgres@localhost:5432/postgres
Application logs are available in the container:
```bash
docker-compose logs api
```
Key environment variables in `.env`:

- `DATABASE_URL`: PostgreSQL connection string
- `OLLAMA_BASE_URL`: Ollama service URL
- `OLLAMA_MODEL`: Default model for document Q&A
- `MAX_FILE_SIZE_MB`: Maximum PDF file size
- `CHUNK_SIZE`: Text chunk size for processing
- `CHUNK_OVERLAP`: Overlap between text chunks
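These entries are plain `KEY=value` lines. As an illustration of how such a file is read (this is not the app's actual config loader, which presumably uses its settings class), a minimal parser:

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so URLs and other values survive intact.
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env
```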
- Ollama connection failed:
  - Ensure Ollama is running: `ollama serve`
  - Check that the model is available: `ollama list`
  - Pull the required model: `ollama pull llama2`
- Database connection error:
  - Verify PostgreSQL is running: `docker-compose ps`
  - Check the database logs: `docker-compose logs postgres`
- PDF processing fails:
  - Check the file size limit
  - Verify the PDF is not password-protected
  - Check the application logs: `docker-compose logs api`
Use the health endpoint to diagnose issues:
```bash
curl http://localhost:8000/health
```
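The health endpoint can also be polled from a script, for example to wait until the stack is up before uploading. A sketch (only assumes the endpoint returns HTTP 200 when healthy):

```python
import time
import urllib.request


def wait_for_healthy(url: str = "http://localhost:8000/health",
                     timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll the health endpoint until it returns 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # service not up yet; retry after the interval
        time.sleep(interval)
    return False
```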
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
For issues and questions:
- Check the troubleshooting section
- Review the API documentation at `/docs`
- Check application logs
- Open an issue on the repository