Apache Hive Metastore as a standalone server in Docker (Python, updated Aug 22, 2024)
End-to-end data platform: a proof-of-concept data platform built on a modern data stack (Spark, Airflow, dbt, Trino, Lightdash, Hive Metastore, MinIO, Postgres).
Sample code that integrates Data Catalog with a Hive data source.
A Python client for Hive Metastore.
A lakehouse-based data pipeline that uses the Sakila dataset to analyze movie sales and rental trends, designed according to the Delta architecture.
Foundation workspace for Airflow, Spark, Hive, and Azure Data Lake Storage Gen2, run via Docker.
🧬 Genomic Data Storage Architecture: A proof of concept for securely managing and auditing massive genomic datasets by combining distributed storage 📂, event-driven microservices ⚡, and blockchain ⛓️ (or equivalent notarization) for tamper-proof, traceable, and scalable genomic data workflows.
Local data lakehouse stack with PySpark, Trino, and MinIO. Includes an example that processes Raygun error data and analyzes IP address occurrences.
Batch processing pipeline for BI and AI purposes