Overview
The STL Data API project is a centralized, user-friendly platform designed to serve as a proxy for accessing and interacting with public data from various regional and municipal sources, with a focus on the St. Louis region. The project addresses challenges such as inconsistent data formats, lack of standardization, and repetitive efforts in compiling datasets by providing a RESTful API and a foundation for a future web portal. It is built using a CQRS (Command Query Responsibility Segregation) architecture with microservices, leveraging modern technologies for scalability and maintainability.
Technical Overview
Project Flow
- Data Ingestion: Fetch raw data (Excel, PDF, JSON, or web content) from public St. Louis data sources.
- Raw Data Processing: Clean and transform raw data in memory, then send to Kafka for queuing.
- Data Storage: Consume processed data from Kafka, store in PostgreSQL (snapshots, historic puts, aggregations), and delete raw data from memory.
- Event Processing: Optimize short-term reads via event processors in the query-side microservice.
- API Access: Expose RESTful endpoints (via Flask) for querying data, with Open API documentation.
- Future Features: Add user subscriptions, web portal, and advanced optimizations.
Architecture overview diagram: docs/architecture_overview.png
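The ingestion and queuing steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the cleaning rules and topic name are assumptions, and a stub object stands in for a real Kafka producer so the sketch runs without a broker.

```python
import json


def clean_record(raw: dict) -> dict:
    """Normalize one raw record in memory (hypothetical rules: trim and
    lowercase keys, trim string values)."""
    return {
        k.strip().lower(): (v.strip() if isinstance(v, str) else v)
        for k, v in raw.items()
    }


class StubProducer:
    """Stand-in for a Kafka producer so the sketch runs without a broker."""

    def __init__(self):
        self.sent = []

    def send(self, topic: str, value: bytes):
        self.sent.append((topic, value))


def ingest(raw_records, producer, topic="stl.raw-data"):  # topic name is hypothetical
    """Clean each raw record and queue it on Kafka; raw data lives only in memory."""
    for raw in raw_records:
        cleaned = clean_record(raw)
        producer.send(topic, json.dumps(cleaned).encode("utf-8"))


producer = StubProducer()
ingest([{" Ward ": "1", "Name": " Central "}], producer)
print(producer.sent[0][0])  # → stl.raw-data
```

In the real pipeline the stub would be replaced by an actual Kafka producer client, and the storage step on the consumer side would persist the cleaned records to PostgreSQL.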
Tech Stack
- Python 3.10+: Core language for data processing and API development.
- Flask-RESTful: Framework for building RESTful APIs in the CQRS microservices.
- Kafka: Message broker for scalable, write-optimized data queuing (containerized).
- PostgreSQL: Database for storing processed data (containerized).
- Docker: Containerization for Kafka, PostgreSQL, and microservices.
- Open API (Swagger): API documentation for endpoints.
- SQLAlchemy: ORM for PostgreSQL interactions.
Getting Started
Prerequisites
- Python 3.10+: Install via python.org or pyenv.
- Docker Desktop: Install from docker.com (includes Docker Compose).
- psql Client: For PostgreSQL interaction (e.g., brew install postgresql on Mac).
- Git: For cloning the repository.
- VS Code: Recommended IDE with extensions (Python, Docker).
Detailed setup is in setup.md. Summary:
- Clone the repo: `git clone https://github.com/oss-slu/stl_metro_dat_api && cd stl_metro_dat_api`
- Create and activate a virtual environment: `python -m venv venv && source venv/bin/activate` (Windows: `venv\Scripts\activate`)
- Install dependencies: `pip install -r requirements.txt`
- Copy `.env.example` to `.env` (`cp .env.example .env`) and update variables (e.g., `PG_PASSWORD`)
- Register a new server in PostgreSQL pgAdmin 4 with port number `5433`
- Start Kafka and PostgreSQL: `docker-compose -f docker/docker-compose.yml up -d`
- Verify setup: run `python tests/basic_test.py` to confirm Kafka/PostgreSQL connectivity
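The `.env` file can be read into connection settings along these lines. Only `PG_PASSWORD` is named in this README; the other variable names and the database name below are hypothetical stand-ins for whatever `.env.example` actually defines:

```python
import os

# Hypothetical variable names; only PG_PASSWORD is confirmed by the docs.
pg_password = os.environ.get("PG_PASSWORD", "")
pg_host = os.environ.get("PG_HOST", "localhost")
pg_port = os.environ.get("PG_PORT", "5433")  # pgAdmin server is registered on 5433

# Assemble a PostgreSQL DSN of the kind SQLAlchemy's create_engine() accepts.
dsn = f"postgresql://postgres:{pg_password}@{pg_host}:{pg_port}/postgres"
```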
Project Structure
stl_metro_dat_api/
├── src/ # Python source code
│ ├── write_service/ # CQRS Command side (data ingestion/processing)
│ └── read_service/ # CQRS Query side (event processors/API)
├── docker/ # Dockerfiles and Docker Compose configs
├── config/ # Kafka/PostgreSQL configurations
├── tests/ # Unit and integration tests
├── docs/ # Open API (Swagger) specifications
├── requirements.txt # Python dependencies
├── .env.example # Template for environment variables
├── setup.md # Detailed setup guide
└── README.md # This file
Running the Application
- Start containers: `docker-compose --env-file .env -f docker/docker-compose.yml up -d`
- The write-service app should start automatically with Docker. To run the write-side app without Docker, go to the project's root directory in your terminal and run `python -m src.write_service.app`
- Run the read-side microservice: `cd src/read_service && python app.py`
- View the write-service app at `http://localhost:5000/` in your web browser
- View the Open API docs: access Swagger UI at `http://localhost:5001/swagger`
- To run the secondary front-end (excellence project):
  - Go to the `frontend` folder in your terminal
  - Run `python -m http.server 9000`
  - Go to `http://localhost:9000` in your web browser
Important! If you make changes to your code, you must rebuild your Docker containers so Docker picks up the newest version of your code. To do this, run: `docker-compose -f docker/docker-compose.yml build`
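For orientation, `docker/docker-compose.yml` might look roughly like the sketch below. This is an assumption for illustration only; the image tags, service names, and environment variables are guesses, and the file in the repo is authoritative:

```yaml
# Hypothetical sketch of docker/docker-compose.yml; see the repo for the real file.
services:
  postgres:
    image: postgres:15
    ports:
      - "5433:5432"          # host 5433 -> container 5432, matching the pgAdmin setup above
    environment:
      POSTGRES_PASSWORD: ${PG_PASSWORD}
  kafka:
    image: bitnami/kafka:latest
    ports:
      - "9092:9092"
```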
Contributing
Thank you so much for wanting to contribute to our project! Please see the Contribution Guide on how you can help.
Testing
- Run unit tests: `pytest tests/`
- Check connectivity: `python tests/basic_test.py`
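A unit test in `tests/` follows standard pytest conventions: pytest discovers `test_*.py` files and runs `test_*` functions automatically. The helper and test below are hypothetical, shown only to illustrate the shape:

```python
# Hypothetical example in the style of a file under tests/.

def normalize_key(k: str) -> str:
    """Toy helper under test: trim whitespace and lowercase a column name."""
    return k.strip().lower()


def test_normalize_key():
    assert normalize_key(" Ward ") == "ward"


test_normalize_key()  # pytest would call this automatically; invoked directly here
```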
Architecture
The STL Metro Data API uses a CQRS architecture to separate data collection (write operations) from data serving (read operations).
Key Architectural Patterns:
- CQRS and Event Sourcing - Explains how we use Kafka to separate write and read operations
System Components:
- Write Service: Data ingestion and processing
- Kafka: Event streaming and message queue
- Read Service: PostgreSQL database and Flask API
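The read side's event processors can be pictured as folding Kafka events into read-optimized snapshots. The sketch below is an assumption for illustration: the event shape and names are hypothetical, and a plain dict stands in for the PostgreSQL snapshot table:

```python
from collections import defaultdict


def apply_event(snapshots: dict, event: dict) -> None:
    """Upsert the latest value per dataset/key pair (the "snapshot" view),
    as a CQRS query-side event processor would."""
    snapshots[event["dataset"]][event["key"]] = event["value"]


# Events as they might arrive from Kafka, oldest first (hypothetical shape).
events = [
    {"dataset": "crime", "key": "ward_1", "value": 10},
    {"dataset": "crime", "key": "ward_1", "value": 12},  # newer event wins
]

snapshots = defaultdict(dict)  # stands in for the PostgreSQL snapshot table
for e in events:
    apply_event(snapshots, e)

print(snapshots["crime"]["ward_1"])  # → 12
```

Because writes only append events to Kafka while reads hit the pre-folded snapshots, the two sides can be scaled and optimized independently, which is the point of the CQRS split.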
Contact
For questions, reach out to the Tech Lead via Slack or GitHub Issues. Report bugs or suggest features in the Issues tab.