hasancatalgol/iceflow-pipeline


🚀 Data Platform - Local Development Environment

Project Directory Structure

🧩 Services & Ports

| Service | Description | Host Port |
|---|---|---|
| Airflow Web UI | Airflow API/Web interface | `localhost:8080` |
| Flower | Celery monitoring UI for Airflow | `localhost:5555` |
| PostgreSQL (Airflow) | Backend DB for Airflow | Internal only |
| PostgreSQL (DWH) | Your data warehouse database | `localhost:5432` |
| pgAdmin | GUI for managing PostgreSQL | `localhost:5050` |
| Redis | Celery broker for Airflow | Internal only |
| Spark Master | Spark master node & UI | `localhost:7077` (RPC), `localhost:8081` (UI) |
| Spark Worker 1 | Spark executor (inactive) | Internal only |
| Spark Worker 2 | Spark executor (inactive) | Internal only |
| Hive Metastore Catalog | Hive Metastore | `localhost:8181` |
| MinIO | S3-compatible object storage | `localhost:9000` (API), `localhost:9001` (Console) |
| MinIO Client (mc) | Initializes MinIO bucket & policy | Internal only |
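For scripting against the stack, the host ports above can be captured in a small mapping. This is an illustrative sketch (the dictionary keys and the `url` helper are names invented here, not part of the project); adjust it if you change the port mappings in the compose file:

```python
# Host-port map for the exposed services listed above.
# Keys are illustrative; ports come from the table in this README.
SERVICE_PORTS = {
    "airflow": 8080,          # Airflow Web UI
    "flower": 5555,           # Celery monitoring UI
    "dwh_postgres": 5432,     # Data warehouse Postgres
    "pgadmin": 5050,          # pgAdmin
    "spark_master_rpc": 7077, # Spark master RPC
    "spark_master_ui": 8081,  # Spark master UI
    "hive_metastore": 8181,   # Hive Metastore catalog
    "minio_api": 9000,        # MinIO S3 API
    "minio_console": 9001,    # MinIO web console
}

def url(service: str, host: str = "localhost") -> str:
    """Build a base URL for one of the HTTP-exposed services."""
    return f"http://{host}:{SERVICE_PORTS[service]}"

print(url("minio_console"))  # http://localhost:9001
```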

📁 Volumes

| Volume | Purpose |
|---|---|
| `airflow-backend-db-volume` | Persists Airflow metadata DB (Postgres) |
| `pgadmin_data` | Persists pgAdmin config & session state |
| `dwh_data` | Persists data warehouse Postgres database |

📦 Features

  • Airflow with Celery Executor and Redis as broker.
  • Spark Cluster with custom Iceberg support and REST catalog.
  • MinIO as S3-compatible storage for Iceberg tables.
  • pgAdmin for local PostgreSQL interaction.
  • Hive Metastore Catalog for easier Flink/Spark/Trino integration and metadata management.
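Wiring Spark to an Iceberg REST catalog backed by MinIO usually comes down to a handful of Spark settings. A minimal sketch, assuming the catalog answers on port 8181 and MinIO on port 9000 as listed above; the catalog name `rest` and the `warehouse` bucket are assumptions, so align them with your `docker-compose.yml`:

```python
# Spark settings for an Iceberg REST catalog with MinIO as S3 storage.
# Catalog name "rest" and bucket "warehouse" are assumptions, not
# taken from this repository's compose file.
ICEBERG_SPARK_CONF = {
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.catalog.rest": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.rest.type": "rest",
    "spark.sql.catalog.rest.uri": "http://localhost:8181",
    "spark.sql.catalog.rest.warehouse": "s3://warehouse/",
    "spark.sql.catalog.rest.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "spark.sql.catalog.rest.s3.endpoint": "http://localhost:9000",
    "spark.sql.catalog.rest.s3.path-style-access": "true",
}

# Applied when building a session, e.g.:
#   builder = SparkSession.builder.master("spark://localhost:7077")
#   for k, v in ICEBERG_SPARK_CONF.items():
#       builder = builder.config(k, v)
```

Path-style S3 access is needed because MinIO serves buckets under the endpoint path rather than as virtual-host subdomains.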

🚀 Usage

```shell
# Start everything
docker compose up --build -d

# Tear down everything and remove volumes
docker compose down -v
```

About

Batch processing pipeline for BI and AI purposes
