Release 0.4.0

Latest

JohnChe88 released this 16 Jul 13:05
· 2 commits to main since this release
abfcb58

Summary
Refactored the Spark Lambda Dockerfile to use multi-stage builds for a smaller container image, and added an Ubuntu variant with accompanying documentation.

Key Changes
🚀 Performance & Optimization:

Implemented multi-stage Docker build reducing final image size

Consolidated RUN commands to minimize Docker layers

Added --no-cache-dir flags for pip installations

Improved cleanup procedures removing temporary files and caches
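
The optimizations above can be sketched roughly as follows. This is an illustrative outline, not the project's actual Dockerfile: the base image tag, the `/opt/deps` staging directory, and the `pyspark` and Corretto package names are assumptions chosen for the example.

```dockerfile
# Illustrative sketch only; image tags, paths, and package names are assumptions.

# Stage 1: install Python dependencies into a staging directory.
FROM public.ecr.aws/lambda/python:3.10 AS builder

# --no-cache-dir keeps pip's download cache out of the layer.
RUN pip install --no-cache-dir --target /opt/deps pyspark

# Stage 2: the final image starts clean and receives only what it needs.
FROM public.ecr.aws/lambda/python:3.10

# One consolidated RUN: install Java and clean package caches in the same layer,
# so the removed files never persist in an earlier layer.
RUN yum install -y java-11-amazon-corretto-headless \
    && yum clean all \
    && rm -rf /var/cache/yum

# Copy only the installed packages; anything else in the builder stage is discarded.
COPY --from=builder /opt/deps ${LAMBDA_TASK_ROOT}
```

Cleanup only reduces image size when it happens in the same `RUN` instruction that created the files, which is why the install and cache removal are chained with `&&`.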

⬆️ Runtime Modernization:

Upgraded Python runtime from 3.8 → 3.10

Upgraded Java from OpenJDK 1.8 → Amazon Corretto 11

Updated environment paths to reflect Python 3.10 structure

Removed package version locks to improve security

🐧 Platform Extension:

Added Dockerfile.ubuntu for Ubuntu 22.04 deployment

Created generic Spark runner with S3 integration

Implemented non-root user execution for improved security

Added comprehensive documentation in UBUNTU_DOCKERFILE_GUIDE.md
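
Non-root execution on an Ubuntu base image typically looks something like the sketch below. The user name `spark` and its home directory are hypothetical; see `Dockerfile.ubuntu` and `UBUNTU_DOCKERFILE_GUIDE.md` for what this release actually does.

```dockerfile
# Illustrative sketch only; the "spark" user name and paths are assumptions.
FROM ubuntu:22.04

# Create an unprivileged user with its own home directory.
RUN useradd --create-home --shell /bin/bash spark

# Every later instruction, and the container's main process, runs as this user.
USER spark
WORKDIR /home/spark
```

Steps that need root, such as installing system packages, have to appear before the `USER` instruction.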

🛠️ Code Quality:

Removed commented legacy code for DEEQU installation

Improved conditional framework installation logic

Better error handling and logging in build process

Standardized environment variable organization

📋 Framework Support:

Maintained compatibility with Delta, Hudi, Iceberg, and Deequ frameworks

Preserved all existing build arguments and configurations

Enhanced JAR download process with better error handling

Benefits
Reduced image size through multi-stage builds

Improved security with latest runtime versions and non-root execution

Better maintainability with cleaner, more organized code

Extended deployment options supporting both Lambda and Ubuntu environments

Enhanced developer experience with comprehensive documentation

Breaking Changes
Python runtime path changed from /var/lang/lib/python3.8/ to /var/lang/lib/python3.10/

The Java runtime upgrade (OpenJDK 1.8 → Amazon Corretto 11) may require application compatibility testing
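
Downstream Dockerfiles that hard-code the old Python path will need updating. One way to avoid repeating this on the next upgrade is to derive the path from a build argument, as in this hedged sketch. The `PYTHON_VERSION` argument and the use of `SPARK_HOME` are assumptions for illustration, not names confirmed by this release.

```dockerfile
# Illustrative sketch only; PYTHON_VERSION and SPARK_HOME are assumed names.
ARG PYTHON_VERSION=3.10
FROM public.ecr.aws/lambda/python:${PYTHON_VERSION}

# An ARG declared before FROM must be redeclared to be visible after it.
ARG PYTHON_VERSION

# Build the path from the version instead of hard-coding python3.8 or python3.10.
ENV SPARK_HOME="/var/lang/lib/python${PYTHON_VERSION}/site-packages/pyspark"
```

With this layout, a different version can be selected at build time with `docker build --build-arg PYTHON_VERSION=...`, provided a matching base image tag exists.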