A simple tool to test different prompts for extracting tables from PDFs using GPT-4o and measure their repeatability.
- Install dependencies:
pip install -r requirements.txt
- Set your OpenAI API key (choose one method):
Option A: Create config.py file
# Copy the template and edit it
cp config.py.template config.py
# Then edit config.py and replace "your-openai-api-key-here" with your actual key
Option B: Environment variable
export OPENAI_API_KEY="your-api-key-here"
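The two options above, together with the `--api-key` flag, suggest a lookup order for the key. The following is a minimal sketch of one way such a lookup could work, not the tool's actual code: the function name `resolve_api_key`, the `OPENAI_API_KEY` attribute inside `config.py`, and the precedence (command line, then `config.py`, then environment) are all assumptions made for illustration.

```python
import importlib
import os


def resolve_api_key(cli_key=None, config_module="config"):
    """Return the first key found: CLI value, config module, then environment.

    Hypothetical helper; the precedence order is an assumption.
    """
    if cli_key:
        return cli_key
    try:
        config = importlib.import_module(config_module)
        # Assumed attribute name; check config.py.template for the real one.
        key = getattr(config, "OPENAI_API_KEY", None)
        # Ignore the untouched template placeholder.
        if key and key != "your-openai-api-key-here":
            return key
    except ImportError:
        pass  # No config.py present; fall through to the environment.
    return os.environ.get("OPENAI_API_KEY")
```

If no source provides a key, the helper returns `None`, so a caller can fail early with a clear message instead of sending an unauthenticated request.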
python pdf_table_extractor.py "pdfs/sample.pdf" "Extract all tables from this PDF and convert them to CSV format"
python pdf_table_extractor.py "pdfs/sample.pdf" "Extract all tables from this PDF and convert them to CSV format" --runs 10 --output results.json
python pdf_table_extractor.py "pdfs/sample.pdf" "Extract all tables from this PDF and convert them to CSV format" --runs 10 --async-mode
- pdf_path: Path to the PDF file to process
- prompt: The prompt to test for table extraction
- --runs: Number of test runs (default: 5)
- --output: Output file for detailed results (JSON format)
- --api-key: OpenAI API key (optional if set via environment variable)
- --async-mode: Run tests simultaneously for faster execution
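For readers who want to see how the documented arguments fit together, here is a rough `argparse` sketch that mirrors them. It is an illustration built only from the argument descriptions above; the function name `build_parser` is hypothetical, and the real script may define its parser differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Test table-extraction prompts for repeatability."
    )
    parser.add_argument("pdf_path", help="Path to the PDF file to process")
    parser.add_argument("prompt", help="The prompt to test for table extraction")
    parser.add_argument("--runs", type=int, default=5, help="Number of test runs")
    parser.add_argument("--output", help="Output file for detailed results (JSON format)")
    parser.add_argument("--api-key", help="OpenAI API key (optional if set via environment variable)")
    parser.add_argument("--async-mode", action="store_true",
                        help="Run tests simultaneously for faster execution")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["pdfs/sample.pdf", "Extract all tables", "--runs", "10", "--async-mode"]
)
print(args.runs, args.async_mode, args.output)  # prints: 10 True None
```

Note that `argparse` converts the dashes in `--async-mode` and `--api-key` to underscores, so the parsed values are read as `args.async_mode` and `args.api_key`.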
- Simple extraction:
"Extract all tables from this PDF and convert them to CSV format"
- Detailed extraction:
"Find all tabular data in this PDF. For each table, extract the data and format it as CSV. Include headers and preserve the structure. If there are multiple tables, separate them clearly."
- Structured extraction:
"Analyze this PDF and extract any tables or structured data. Convert each table to CSV format with proper headers. Maintain the original column order and data types."
The tool will show:
- Total number of test runs
- Number of unique results
- Agreement percentage (how many runs produced the same result)
- The most common result
- All individual results
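The metrics listed above can be sketched in a few lines of Python. This is a minimal illustration under two assumptions that the tool itself may not share: results are compared as exact strings, and the agreement percentage is the share of runs that match the most common result. The function name `summarize_runs` and the sample data are hypothetical.

```python
from collections import Counter


def summarize_runs(results):
    """Summarize how often repeated runs produced identical output."""
    if not results:
        raise ValueError("results must contain at least one run")
    counts = Counter(results)  # exact string match; no normalization applied
    most_common_result, most_common_count = counts.most_common(1)[0]
    return {
        "total_runs": len(results),
        "unique_results": len(counts),
        "agreement_percentage": 100.0 * most_common_count / len(results),
        "most_common_result": most_common_result,
    }


# Made-up outputs from five runs; one run disagrees on a single cell.
runs = ["a,b\n1,2", "a,b\n1,2", "a,b\n1,3", "a,b\n1,2", "a,b\n1,2"]
summary = summarize_runs(runs)
print(summary["unique_results"], summary["agreement_percentage"])  # prints: 2 80.0
```

Exact matching is strict: two CSV outputs that differ only in whitespace or quoting count as different results, so normalizing the text before counting may give a fairer picture of repeatability.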
For the best experience, use the modern web interface:
python start_ui.py
Then open your browser to: http://localhost:5000
- 🎨 Beautiful, modern interface designed specifically for testing C# NonStructuredDataReader prompts
- 📝 Pre-filled forms with the exact instructions and initial message from your C# code
- ⚡ Concurrent testing for faster results
- 📊 Real-time progress and visual results
- 📈 Agreement metrics and repeatability analysis
- 📚 Test history to compare different prompt variations
- 📱 Responsive design works on desktop and mobile
- app.py: Flask web application
- start_ui.py: Easy startup script for the web UI
- pdf_table_extractor.py: Main script (command line)
- requirements.txt: Python dependencies
- test_prompts.py: Example script for testing multiple prompts
- templates/: Web UI templates
- static/: Web UI assets (CSS, JS)
- pdfs/: Directory containing sample PDF files