A high-performance, distributed web scraping service built with:
- Python
- FastAPI
- Playwright
- Upstash Redis
- Fly.io Deployment

Features:
- Concurrent scraping of multiple URLs
- Distributed job queueing
- Scalable architecture
- Real-time job status tracking
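
As a rough illustration of concurrent scraping of multiple URLs, here is a minimal sketch using only `asyncio`. It is not the service's actual code: `scrape_all`, `fake_fetch`, and the `max_concurrency` parameter are hypothetical names, and `fake_fetch` stands in for whatever Playwright-based page load the real worker performs.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical signature: takes a URL, returns the page content as text.
Fetch = Callable[[str], Awaitable[str]]


async def scrape_all(
    urls: list[str], fetch: Fetch, max_concurrency: int = 5
) -> dict[str, str]:
    """Scrape every URL concurrently, with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def scrape_one(url: str) -> tuple[str, str]:
        async with semaphore:
            try:
                return url, await fetch(url)
            except Exception as exc:
                # Record the failure instead of aborting the whole batch.
                return url, f"error: {exc}"

    pairs = await asyncio.gather(*(scrape_one(u) for u in urls))
    return dict(pairs)


async def fake_fetch(url: str) -> str:
    """Assumed stand-in for a real browser fetch so the sketch runs offline."""
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


if __name__ == "__main__":
    pages = asyncio.run(
        scrape_all(["https://a.example", "https://b.example"], fake_fetch)
    )
    print(pages)
```

The semaphore caps how many fetches run at once, which is one common way to keep a single browser from being overloaded; the real limit would depend on the machine the worker runs on.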

Prerequisites:
- Python 3.11+
- Upstash Redis Account
- Fly.io Account

Setup:
- Clone the repository
- Install dependencies:

      pip install -r requirements.txt

- Set Upstash Redis Environment Variables:

      export UPSTASH_REDIS_REST_URL=your_redis_url
      export UPSTASH_REDIS_REST_TOKEN=your_redis_token

- Run API Server:

      uvicorn api:app --reload

- Run Worker:

      python worker.py

- Deploy to Fly.io:

      fly launch

- REDIS QUEUE
- operation/{operation_id}: Check job status
- Adjust concurrency (per browser, per machine) and worker settings in worker.py
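
To make the queue and status-tracking flow concrete, the sketch below models it in process memory. It is an assumption-laden illustration, not the project's implementation: the class `InMemoryJobQueue`, its methods, and the status strings (`queued`, `running`, `done`, `failed`, `not_found`) are all hypothetical, and a plain `deque` and `dict` stand in for the Redis queue that the real API server and worker share.

```python
import uuid
from collections import deque
from typing import Any, Callable


class InMemoryJobQueue:
    """Assumed stand-in for a Redis-backed queue; keeps all state in memory."""

    def __init__(self) -> None:
        self._pending: deque[str] = deque()
        self._jobs: dict[str, dict[str, Any]] = {}

    def enqueue(self, urls: list[str]) -> str:
        """Register a scrape job and return its operation ID."""
        operation_id = uuid.uuid4().hex
        self._jobs[operation_id] = {"status": "queued", "urls": urls, "result": None}
        self._pending.append(operation_id)
        return operation_id

    def status(self, operation_id: str) -> dict[str, Any]:
        """What a status lookup for an operation ID might return."""
        job = self._jobs.get(operation_id)
        if job is None:
            return {"status": "not_found"}
        return {"status": job["status"], "result": job["result"]}

    def work_once(self, handler: Callable[[list[str]], Any]) -> bool:
        """Process one pending job; return False if the queue is empty."""
        if not self._pending:
            return False
        operation_id = self._pending.popleft()
        job = self._jobs[operation_id]
        job["status"] = "running"
        try:
            job["result"] = handler(job["urls"])
            job["status"] = "done"
        except Exception as exc:
            job["result"] = str(exc)
            job["status"] = "failed"
        return True


if __name__ == "__main__":
    queue = InMemoryJobQueue()
    op_id = queue.enqueue(["https://a.example"])
    print(queue.status(op_id))           # job is waiting for a worker
    queue.work_once(lambda urls: len(urls))
    print(queue.status(op_id))           # job has been processed
```

In a distributed deployment, the API server would only call something like `enqueue` and `status`, while each worker process loops over `work_once`; moving the two containers into Redis is what lets those roles run on separate machines.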