This repository contains the backend source code and infrastructure definitions for the CUE (Cloud Upload Environment) application.
CUE provides a cloud-native solution for NASA DAACs (Distributed Active Archive Centers) to replace on-premises file upload capabilities for non-ICD (Interface Control Document) compliant providers. It aims to prevent compromised files from entering the DAAC environment by scanning them for malicious content.
The backend consists of:
- A FastAPI application providing RESTful APIs for the CUE dashboard.
- Integration with AWS Cognito for user authentication (SRP).
- A PostgreSQL database for storing application data.
- Event-driven Lambda functions for background processing (e.g., logging).
- Infrastructure managed via Terraform.
- Containerization using Docker for local development and deployment.
Before you begin, ensure you have the following installed:
- Docker & Docker Compose
- Poetry
- AWS CLI
- Terraform CLI (optional)
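To confirm the prerequisites are on your PATH, you can check their versions (exact minimum versions are not pinned here):

```bash
docker --version
docker compose version
poetry --version
aws --version
terraform -version   # only needed if you plan to run Terraform locally
```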
```bash
git clone https://github.com/ghrcdaac/CUE-Backend.git
cd CUE-Backend
cp .env.example .env
```
Edit the `.env` file with your preferred local settings (an illustrative snippet follows the list). Key variables to set include:
- `PG_USER`, `PG_PASS`, `PG_DB`: Credentials for PostgreSQL.
- `POOL_ID`, `CLIENT_ID`, `CLIENT_SECRET`: Cognito values.
- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`: AWS credentials and region.
- `PGADMIN_DEFAULT_EMAIL`, `PGADMIN_DEFAULT_PASSWORD`: Credentials for pgAdmin.
- Ensure `ENV=dev` and `DEBUG=true`.

(Note: Test database variables `PG_DB_TEST`, `PG_USER_TEST`, etc., are only needed for local integration tests.)
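For illustration only, a local `.env` might look like the following; every value here is a placeholder rather than a real credential, and the exact variable set should match your `.env.example`:

```bash
# Illustrative placeholders only -- replace with your own local values.
PG_USER=cue_local
PG_PASS=change-me
PG_DB=cue
POOL_ID=<your-cognito-user-pool-id>
CLIENT_ID=<your-cognito-app-client-id>
CLIENT_SECRET=<your-cognito-app-client-secret>
AWS_ACCESS_KEY_ID=<your-aws-access-key-id>
AWS_SECRET_ACCESS_KEY=<your-aws-secret-access-key>
AWS_REGION=us-east-1
PGADMIN_DEFAULT_EMAIL=admin@example.com
PGADMIN_DEFAULT_PASSWORD=change-me
ENV=dev
DEBUG=true
```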
```bash
docker compose up --build -d
```
- `--build` ensures images are rebuilt with your latest changes.
- `-d` runs the containers in detached mode.
The PostgreSQL container automatically runs the SQL scripts in `src/postgres/` to initialize the schema.
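To check that the initialization scripts ran, you can list the tables from inside the database container; this assumes the Compose service is named `postgres` (the same hostname used in the pgAdmin connection details below):

```bash
# Replace the placeholders with the PG_USER and PG_DB values from your .env
docker compose exec postgres psql -U <PG_USER> -d <PG_DB> -c '\dt'
```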
- API: http://localhost:8000
- Swagger UI: http://localhost:8000/v1/docs
- pgAdmin: http://localhost:8001
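A quick way to confirm the API is serving is to request the Swagger UI endpoint listed above:

```bash
# Should return an HTTP 200 once the API container is up
curl -I http://localhost:8000/v1/docs
```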
pgAdmin connection details:
- Host: `postgres`
- Port: `5432`
- Database: value of `PG_DB`
- Username: value of `PG_USER`
- Password: value of `PG_PASS`
```bash
docker compose down      # Stops containers
docker compose down -v   # Also removes volumes and data
```
```bash
bash scripts/build.sh
```
This script:
- Creates a temporary directory.
- Installs dependencies from `src/python/event_lambdas/requirements.txt`.
- Copies shared utilities from `src/python/lambda_utils`.
- Creates `my_deployment_package.zip`.
- Packages each Lambda (e.g., `infected-logger`) into `artifacts/<lambda_name>-lambda.zip` (a quick check of the output is shown below).
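After the script finishes, you can sanity-check the packaged artifacts; the `infected-logger` name comes from the example above, and other Lambdas follow the same naming pattern:

```bash
# List the packaged Lambda zips and inspect the contents of one of them
ls artifacts/
unzip -l artifacts/infected-logger-lambda.zip
```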
```bash
aws ecr get-login-password --region <your-aws-region> | docker login --username AWS --password-stdin <your-ecr-url>

# For ARM64 (Lambda)
docker buildx build --platform linux/arm64 -t <your-ecr-repo-name>:<your-tag> -f Dockerfile.aws .

# For AMD64
# docker build -t <your-ecr-repo-name>:<your-tag> -f Dockerfile.aws .

docker tag <your-ecr-repo-name>:<your-tag> <your-ecr-url>/<your-ecr-repo-name>:<your-tag>
docker push <your-ecr-url>/<your-ecr-repo-name>:<your-tag>
```
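For illustration, the same sequence with placeholder values filled in might look like this; the account ID, region, repository name, and tag below are all hypothetical:

```bash
# Hypothetical values: account 123456789012, region us-east-1, repo cue-backend, tag latest
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker buildx build --platform linux/arm64 -t cue-backend:latest -f Dockerfile.aws .
docker tag cue-backend:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/cue-backend:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/cue-backend:latest
```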
Used in CI/CD pipelines (e.g., Bamboo). Do not run locally unless deploying.
Process:
- Calls `bash ./scripts/build.sh` to prepare the Lambda packages.
- Sets environment variables (`TF_VAR_*`, AWS keys, state bucket, etc.).
- Enters `terraform/` and runs:

```bash
terraform init
terraform apply -auto-approve
```
This applies updates to Lambdas, API Gateway, RDS, IAM, etc.
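If you need to reproduce the script's behaviour by hand, the general shape of the commands is sketched below; the backend-config value and the `TF_VAR_*` name are placeholders, since the real variable names live in the deploy script and the `terraform/` configuration:

```bash
# Placeholder values -- check the deploy script and terraform/ for the real variable names
export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
export TF_VAR_env=dev                       # hypothetical TF_VAR_* variable
cd terraform
terraform init -backend-config="bucket=<your-terraform-state-bucket>"
terraform apply -auto-approve
```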
The following AWS resources are typically managed manually to prevent accidental changes:
- VPC, Subnets, Security Groups
- RDS instances/clusters
- S3 buckets (esp. for Terraform state)
- Cognito User Pools
- Core API Gateway
Terraform scripts refer to these using existing IDs/ARNs passed as variables (`TF_VAR_*`).
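As an example of that pattern, existing resource identifiers might be supplied like this; the variable names and values below are hypothetical and should be matched against the variables declared in `terraform/`:

```bash
# Hypothetical variable names and values -- see terraform/ for the real declarations
export TF_VAR_vpc_id=vpc-0123456789abcdef0
export TF_VAR_rds_cluster_arn=arn:aws:rds:us-east-1:123456789012:cluster:cue-db
export TF_VAR_cognito_user_pool_id=us-east-1_ExampleId
```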
To preserve database performance, file metric data is aged off into an S3 bucket by a scheduled AWS Glue job. The Glue job finds file metric data in the database that falls outside a configurable retention period, writes it to partitioned Parquet files, uploads the files to an S3 bucket, and then removes the archived data from the database.
The Glue job is configured as follows:
- The retention period is read from an SSM parameter (an example of inspecting it is shown below).
- The execution schedule is set by a Glue job trigger.
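To inspect or change the retention period, the SSM parameter can be read and written with the AWS CLI; the parameter name and value below are placeholders, since the actual name is defined in the Glue Terraform module:

```bash
# <parameter-name> is a placeholder -- see terraform/glue for the real parameter name
aws ssm get-parameter --name <parameter-name> --query 'Parameter.Value' --output text
aws ssm put-parameter --name <parameter-name> --value "90" --overwrite
```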
The Glue job script lives in `src/python/glue_jobs/`, and its infrastructure is defined in the `terraform/glue` module.
Glue job runs are logged to CloudWatch.
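To follow a run's logs from the CLI, the Glue output log group can be tailed; this assumes the job writes to the standard `/aws-glue/jobs/output` group rather than a custom one:

```bash
# Assumes the default Glue log group; adjust if the job uses a custom group
aws logs tail /aws-glue/jobs/output --follow
```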