Releases: aws-samples/spark-on-aws-lambda
Release 0.4.0
Summary
Refactored Spark Lambda Dockerfile to use multi-stage builds for optimal container size and added Ubuntu variant with comprehensive documentation.
Key Changes
🚀 Performance & Optimization:
Implemented multi-stage Docker build reducing final image size
Consolidated RUN commands to minimize Docker layers
Added --no-cache-dir flags for pip installations
Improved cleanup procedures removing temporary files and caches
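The multi-stage approach described above can be sketched as follows. The image tags, package versions, and handler file name here are assumptions for illustration, not the repository's exact Dockerfile:

```dockerfile
# Build stage: install dependencies with no pip cache, then clean up
FROM public.ecr.aws/lambda/python:3.10 AS builder
RUN pip install --no-cache-dir pyspark -t /opt/python \
    && find /opt/python -type d -name '__pycache__' -exec rm -rf {} +

# Final stage: copy only the installed packages, keeping the image small
FROM public.ecr.aws/lambda/python:3.10
COPY --from=builder /opt/python /var/lang/lib/python3.10/site-packages
COPY sparkLambdaHandler.py ${LAMBDA_TASK_ROOT}
CMD ["sparkLambdaHandler.lambda_handler"]
```

Because only the final stage ships, the builder layer's caches and temporary files never reach the deployed image.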
⬆️ Runtime Modernization:
Upgraded Python runtime from 3.8 → 3.10
Upgraded Java from OpenJDK 1.8 → Amazon Corretto 11
Updated environment paths to reflect Python 3.10 structure
Improved security by removing outdated version pins
🐧 Platform Extension:
Added Dockerfile.ubuntu for Ubuntu 22.04 deployment
Created generic Spark runner with S3 integration
Implemented non-root user execution for improved security
Added comprehensive documentation in UBUNTU_DOCKERFILE_GUIDE.md
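A generic Spark runner of this shape might download a script from S3 and hand it to `spark-submit`. The function names, the boto3 call, and the working directory are assumptions for illustration, not the repository's exact code:

```python
import os
import subprocess
from urllib.parse import urlparse

def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3:// URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// URI, got {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")

def run_spark_script(script_uri: str, workdir: str = "/tmp") -> int:
    """Download a PySpark script from S3 and run it with spark-submit."""
    import boto3  # imported lazily; only needed when actually running
    bucket, key = parse_s3_uri(script_uri)
    local_path = os.path.join(workdir, os.path.basename(key))
    boto3.client("s3").download_file(bucket, key, local_path)
    return subprocess.call(["spark-submit", local_path])
```

Running as a non-root user, the runner only needs write access to its working directory, which fits the security posture described above.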
🛠️ Code Quality:
Removed commented-out legacy code for the Deequ installation
Improved conditional framework installation logic
Better error handling and logging in the build process
Standardized environment variable organization
📋 Framework Support:
Maintained compatibility with Delta, Hudi, Iceberg, and Deequ frameworks
Preserved all existing build arguments and configurations
Enhanced JAR download process with better error handling
Benefits
Reduced image size through multi-stage builds
Improved security with latest runtime versions and non-root execution
Better maintainability with cleaner, more organized code
Extended deployment options supporting both Lambda and Ubuntu environments
Enhanced developer experience with comprehensive documentation
Breaking Changes
Python runtime path changed from /var/lang/lib/python3.8/ to /var/lang/lib/python3.10/
The Java runtime upgrade may require application compatibility testing
v0.3.0
Releasing SoAL v0.3.0
- Added integration with AWS Glue catalog
- Added connectors for Snowflake and Amazon Redshift
- Added an option to split large files into 128 MB chunks
- Added a sample script showing Deequ integration for data quality checks
- Added a library to read large files for micro-batch ingestion
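The 128 MB split option above amounts to a byte-range calculation over the source file; `chunk_ranges` below is an illustrative helper, not the library's actual API:

```python
def chunk_ranges(total_size: int, chunk_size: int = 128 * 1024 * 1024):
    """Yield (start, end) byte ranges covering a file of total_size bytes.

    Each range is at most chunk_size bytes; the last range may be shorter.
    """
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size)
        yield start, end
        start = end
```

Each range can then be fetched independently (for example, as an S3 ranged GET) and ingested as its own micro-batch.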
v0.2.0
Release v0.2.0 introduces several new features and improvements, including:
- Architecture to submit a PySpark script from Amazon S3 to AWS Lambda using Spark on Docker. This lets users run PySpark jobs on AWS Lambda easily and reduces redeployment impact when the PySpark code needs updating.
- SAM (Serverless Application Model) templates that automatically build and deploy Docker images to Amazon ECR (Elastic Container Registry) and AWS Lambda, making it easy to deploy and manage Lambda container images.
- Apache Hudi integration with Spark on AWS Lambda. This enables users to use Apache Hudi, a storage framework for managing data sets on Amazon S3, for small-to-medium workloads (up to a 200 MB payload).
These features enhance the usability and scalability of Spark on AWS Lambda, providing users with more flexibility and options for running PySpark jobs on AWS Lambda.
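A Hudi write from Spark typically passes datasource options like these. The helper function and the table/field names are illustrative assumptions; the option keys themselves come from Hudi's Spark datasource:

```python
def hudi_write_options(table_name: str, record_key: str, precombine_field: str) -> dict:
    """Build the core option map for a Hudi upsert via the Spark datasource."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "upsert",
    }

# Usage inside a PySpark job (requires the Hudi bundle jar; not run here):
# df.write.format("hudi") \
#   .options(**hudi_write_options("trips", "trip_id", "ts")) \
#   .mode("append").save("s3a://my-bucket/hudi/trips")
```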
v0.1.0: Update README.md
Spark on AWS Lambda is a standalone installation of Spark that runs on AWS Lambda using a Docker container. It provides a cost-effective solution for event-driven pipelines with smaller files, where heavier engines like Amazon EMR or AWS Glue incur overhead costs and operate more slowly.
Release 0.1.0 Features:
- Dockerfile with PySpark and its dependencies installed.
- Sample script to read and write CSV files on Amazon S3
- Authentication and authorization framework for connecting to Amazon S3
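A minimal sketch of what such a CSV read/write script looks like, assuming Spark's s3a connector is configured with the credentials framework above; the bucket and paths are placeholders, and `s3a_path` is an illustrative helper:

```python
def s3a_path(bucket: str, key: str) -> str:
    """Build the s3a:// path that Spark's Hadoop S3 connector expects."""
    return f"s3a://{bucket}/{key.lstrip('/')}"

# Inside a PySpark job (requires pyspark and hadoop-aws; not run here):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.appName("csv-on-lambda").getOrCreate()
# df = spark.read.option("header", "true").csv(s3a_path("my-bucket", "in/data.csv"))
# df.write.mode("overwrite").csv(s3a_path("my-bucket", "out/"))
```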