This repository contains the complete infrastructure-as-code (IaC) configuration for MicroTodo, a cloud-native, microservices-based todo application deployed on AWS EKS (Elastic Kubernetes Service).
- Architecture Overview
- Technology Stack
- Infrastructure Components
- Prerequisites
- Getting Started
- Project Structure
- Deployment Guide
- Services
- Security
- Monitoring and Logging
- Development
- Production Considerations
- Troubleshooting
MicroTodo is built using a microservices architecture with the following components:
```
┌─────────────────────────────────────────────────────────────┐
│                          Internet                           │
└──────────────────────────────┬──────────────────────────────┘
                               │
                      ┌────────▼────────┐
                      │   AWS ALB/NLB   │
                      │    (Ingress)    │
                      └────────┬────────┘
                               │
                      ┌────────▼────────┐
                      │   API Gateway   │
                      └────────┬────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         │                     │                     │
   ┌─────▼─────┐        ┌──────▼──────┐       ┌──────▼──────┐
   │   Users   │        │    Tasks    │       │Notifications│
   │  Service  │        │   Service   │       │   Service   │
   └─────┬─────┘        └──────┬──────┘       └──────┬──────┘
         │                     │                     │
   ┌─────▼─────┐        ┌──────▼──────┐       ┌──────▼──────┐
   │PostgreSQL │        │ PostgreSQL  │       │ PostgreSQL  │
   │   Users   │        │    Tasks    │       │Notifications│
   └───────────┘        └─────────────┘       └─────────────┘
                               │
                      ┌────────▼────────┐
                      │    RabbitMQ     │
                      │  (Message Bus)  │
                      └─────────────────┘
```
- Microservices Architecture: Separate services for users, tasks, and notifications
- API Gateway Pattern: Single entry point for all client requests
- Event-Driven Communication: RabbitMQ for asynchronous messaging between services
- Database per Service: Each microservice has its own PostgreSQL database
- Container Orchestration: Kubernetes (EKS) for container management
- GitOps Deployment: ArgoCD for continuous deployment
- Infrastructure as Code: Terraform for AWS resource provisioning
- Secrets Management: AWS Secrets Manager with External Secrets Operator
- Cloud Provider: AWS
- Kubernetes: Amazon EKS 1.31
- Infrastructure as Code: Terraform (~> 1.13)
- Container Registry: Amazon ECR
- Secret Management: AWS Secrets Manager
- State Management: S3 + DynamoDB for Terraform state
- Ingress Controller: AWS Load Balancer Controller (v1.13.4)
- GitOps: ArgoCD (v8.5.7)
- Secrets: External Secrets Operator (v0.20.1)
- Storage: AWS EBS CSI Driver
- Message Queue: RabbitMQ
- VPC: Custom VPC with public and private subnets across 3 AZs
- Load Balancing: Application Load Balancer (ALB)
- DNS: Kubernetes DNS (CoreDNS)
- Service Mesh: (Optional - can be added)
### VPC & Networking
- Custom VPC (10.0.0.0/16)
- 3 Public subnets across availability zones
- 3 Private subnets across availability zones
- Internet Gateway for public subnet access
- NAT Gateway for private subnet internet access
- Route tables for public and private subnets
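The concrete CIDR assignments live in `vpc.tf`; as a rough sketch of how a /16 is commonly carved up across three AZs (the example CIDRs below are assumptions for illustration, not the repository's actual values):

```shell
# Print a hypothetical subnet layout for the 10.0.0.0/16 VPC across 3 AZs.
# Illustrative only -- the real CIDRs are whatever terraform/vpc.tf defines.
for i in 0 1 2; do
  echo "public-${i}:  10.0.${i}.0/24"          # reaches the internet via the Internet Gateway
  echo "private-${i}: 10.0.$((i + 100)).0/24"  # egress only, via the NAT Gateway
done
```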
### EKS Cluster
- Kubernetes version: 1.31
- Node group with t3.small instances (2-5 nodes)
- Managed node groups in private subnets
- Public and private API endpoints
- CloudWatch logging enabled
### ECR Repositories
- `microtodo/api-gateway`
- `microtodo/users-service`
- `microtodo/tasks-service`
- `microtodo/notifications-service`
- Lifecycle policies (7-day untagged image expiration, keep last 30 tagged images)
### IAM Roles & Policies
- EKS cluster role
- EKS node group role
- AWS Load Balancer Controller role
- External Secrets Operator role
- EBS CSI Driver role
### Secrets Management
- AWS Secrets Manager for sensitive data
- PostgreSQL credentials
- RabbitMQ credentials
- JWT secrets
### Namespaces
- `microtodo`: Main application namespace
- `argocd`: GitOps controller
- `external-secrets-system`: Secrets management
- `kube-system`: System components
### Services
- API Gateway (entry point)
- Users Service (authentication & user management)
- Tasks Service (task management)
- Notifications Service (notifications)
### Databases
- PostgreSQL for Users (with persistent storage)
- PostgreSQL for Tasks (with persistent storage)
- PostgreSQL for Notifications (with persistent storage)
- RabbitMQ (message broker)
### Ingress
- ALB-based ingress for API Gateway
- HTTP/HTTPS support
- (Optional) SSL/TLS termination
Before you begin, ensure you have the following installed:
- AWS CLI (v2.x)
- Terraform (>= 1.13)
- kubectl (>= 1.31)
- Docker (for local development)
- Docker Compose (for local development)
- ArgoCD CLI (optional, for ArgoCD management)
- AWS account with appropriate permissions
- AWS CLI configured with credentials:

  ```bash
  aws configure
  ```
- Note your AWS Account ID (you'll need this for Terraform)
First, create the S3 bucket and DynamoDB table for Terraform state:
```bash
cd terraform-bootstrap
terraform init
terraform apply
```

**Important**: Note the outputs from this step:

- `terraform_state_bucket`: use this in the `terraform/terraform.tf` backend configuration
- `terraform_lock_table_name`: use this in the `terraform/terraform.tf` backend configuration
Create a `terraform.tfvars` file in the `terraform/` directory:

```hcl
aws_account_id = "your-aws-account-id"

# Database credentials
postgres_username = "postgres"
postgres_password = "your-secure-password"

# RabbitMQ credentials
rabbitmq_username = "rabbit"
rabbitmq_password = "your-secure-password"

# JWT secret
jwt_secret = "your-jwt-secret-key"
```

**Security Note**: Never commit `terraform.tfvars` to version control. It's already in `.gitignore`.
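Rather than choosing these values by hand, you can generate strong random ones; a sketch assuming `openssl` is on your PATH:

```shell
# Generate random credentials for terraform.tfvars.
# openssl rand emits cryptographically strong bytes; base64 keeps them printable.
postgres_password=$(openssl rand -base64 24)
rabbitmq_password=$(openssl rand -base64 24)
jwt_secret=$(openssl rand -base64 48)

printf 'postgres_password = "%s"\n' "$postgres_password"
printf 'rabbitmq_password = "%s"\n' "$rabbitmq_password"
printf 'jwt_secret        = "%s"\n' "$jwt_secret"
```

Paste the output into `terraform.tfvars` only, never into a tracked file.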
Update the backend configuration in `terraform/terraform.tf` with the bucket name from Step 1:

```hcl
backend "s3" {
  bucket         = "microtodo-tf-state-XXXXXX" # From bootstrap output
  key            = "microtodo/terraform.tfstate"
  region         = "eu-west-2"
  dynamodb_table = "microtodo-tf-state-lock"
  encrypt        = true
}
```

Then initialize and apply:

```bash
cd terraform
terraform init
terraform plan
terraform apply
```

This will create:
- VPC and networking components
- EKS cluster and node groups
- ECR repositories
- IAM roles and policies
- Secrets in AWS Secrets Manager
- Helm chart installations (ALB Controller, ArgoCD, External Secrets)
Note: This process takes approximately 15-20 minutes.
After the EKS cluster is created, configure kubectl:
```bash
aws eks update-kubeconfig --region eu-west-2 --name microtodo_eks_cluster
```

Verify the connection:

```bash
kubectl get nodes
kubectl get namespaces
```

Apply the Kubernetes manifests:
```bash
# Create namespace
kubectl apply -f k8s/namespaces/

# Deploy all resources
kubectl apply -f k8s/
```
1. Get the ArgoCD admin password:

   ```bash
   kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo
   ```

2. Port-forward to access the ArgoCD UI:

   ```bash
   kubectl port-forward svc/argocd-server -n argocd 8080:443
   ```

3. Access ArgoCD at `https://localhost:8080`:
   - Username: `admin`
   - Password: (from step 1)
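The pipeline in step 1 works because Kubernetes stores Secret values base64-encoded: `jsonpath` pulls the encoded string out and `base64 -d` recovers the plaintext. A local illustration of the decoding step:

```shell
# Simulate what the jsonpath lookup returns: a base64-encoded secret value.
encoded=$(printf 'example-password' | base64)
echo "stored in the Secret: ${encoded}"
printf '%s' "$encoded" | base64 -d; echo   # prints: example-password
```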
Get the Load Balancer URL:

```bash
kubectl get ingress -n microtodo api-gateway-ingress
```

The application will be available at the ADDRESS shown in the output.
```
infrastructure/
├── docker-compose.dev.yml              # Local development environment
├── scripts/
│   └── download-alb-policy.sh          # Helper script for ALB policy
├── terraform-bootstrap/                # Terraform state backend setup
│   └── main.tf                         # S3 bucket and DynamoDB table
├── terraform/                          # Main infrastructure code
│   ├── main.tf                         # AWS provider configuration
│   ├── terraform.tf                    # Terraform and backend configuration
│   ├── variables.tf                    # Input variables
│   ├── locals.tf                       # Local values
│   ├── vpc.tf                          # VPC and networking
│   ├── eks.tf                          # EKS cluster and node groups
│   ├── ecr.tf                          # Container registries
│   ├── iam.tf                          # IAM roles and policies
│   ├── secrets.tf                      # AWS Secrets Manager
│   ├── helm-charts.tf                  # Helm releases (ALB, ArgoCD, External Secrets)
│   └── iam-policies/
│       └── aws-load-balancer-controller-policy.json
└── k8s/                                # Kubernetes manifests
    ├── namespaces/
    │   └── microtodo.yaml              # Application namespace
    ├── storage/
    │   └── gp3-storageclass.yaml       # EBS GP3 storage class
    ├── external-secrets/
    │   ├── secret-store.yaml           # AWS Secrets Manager integration
    │   └── external-secret.yaml        # Secret definitions
    ├── databases/
    │   ├── postgres-users.yaml         # Users database
    │   ├── postgres-tasks.yaml         # Tasks database
    │   ├── postgres-notifications.yaml # Notifications database
    │   └── rabbitmq.yaml               # Message broker
    ├── services/
    │   ├── api-gateway/
    │   │   ├── deployment.yaml
    │   │   ├── service.yaml
    │   │   └── configmap.yaml
    │   ├── users-service/
    │   │   ├── deployment.yaml
    │   │   ├── service.yaml
    │   │   ├── configmap.yaml
    │   │   └── migrate.yaml            # Database migration job
    │   ├── tasks-service/
    │   │   ├── deployment.yaml
    │   │   ├── service.yaml
    │   │   ├── configmap.yaml
    │   │   └── migrate.yaml
    │   └── notifications-service/
    │       ├── deployment.yaml
    │       ├── service.yaml
    │       ├── configmap.yaml
    │       └── migrate.yaml
    ├── ingress/
    │   └── api-gateway-ingress.yaml    # ALB ingress
    └── argocd/
        └── application.yaml            # ArgoCD application definition
```
Follow the steps in Getting Started for initial deployment.
Once ArgoCD is configured:
- Any changes pushed to the `master` branch in the `k8s/` directory will be automatically detected by ArgoCD
- ArgoCD will sync the changes to the cluster based on the sync policy (automated with pruning and self-healing)
- Monitor deployments in the ArgoCD UI
A typical CI/CD workflow:
1. **Build Phase**:
   - Build Docker images for each service
   - Tag images with commit SHA or version
   - Push to ECR repositories
2. **Update Manifests**:
   - Update image tags in Kubernetes deployments
   - Commit changes to infrastructure repository
3. **ArgoCD Sync**:
   - ArgoCD detects changes
   - Applies updates to the cluster
   - Verifies deployment health
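The build and manifest-update phases have to agree on one image-naming convention. A sketch of deriving the ECR image reference from a commit SHA (the account ID, region, and SHA below are placeholders):

```shell
# Derive a fully qualified ECR image reference from a commit SHA.
AWS_ACCOUNT_ID="123456789012"                              # placeholder
AWS_REGION="eu-west-2"
SERVICE="users-service"
GIT_SHA="3f9a1c2b7d8e4f50a1b2c3d4e5f60718293a4b5c"         # placeholder

TAG=$(printf '%.7s' "$GIT_SHA")   # short SHA: readable, unique enough for tagging
IMAGE="${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/microtodo/${SERVICE}:${TAG}"
echo "$IMAGE"
# → 123456789012.dkr.ecr.eu-west-2.amazonaws.com/microtodo/users-service:3f9a1c2
```

The same `IMAGE` string is what the CI job would push to ECR and write into the deployment manifest, so the two steps can never drift apart.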
Example GitHub Actions workflow (pseudo-code):
```yaml
name: Deploy Service
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Build and push Docker image
        # Build image and push to ECR
      - name: Update Kubernetes manifests
        # Update image tags in infrastructure repo
      - name: Wait for ArgoCD sync
        # Monitor ArgoCD for successful deployment
```

### API Gateway

- Purpose: Single entry point for all client requests, handles routing to microservices
- Port: 80 (internal), exposed via ALB
- Health Checks: TCP-based health probes
- Scaling: 2 replicas (configurable)
### Users Service

- Purpose: User authentication, registration, and profile management
- Port: 3000
- Database: PostgreSQL (postgres-users)
- Features:
- JWT-based authentication
- User CRUD operations
- Password hashing and validation
- Health Checks: TCP-based startup, liveness, and readiness probes
### Tasks Service

- Purpose: Task management (create, read, update, delete tasks)
- Port: 3001
- Database: PostgreSQL (postgres-tasks)
- Message Queue: RabbitMQ (publishes task events)
- Features:
- Task CRUD operations
- Task assignment and status tracking
- Event publishing for notifications
### Notifications Service

- Purpose: Handle notifications triggered by task events
- Port: 3002
- Database: PostgreSQL (postgres-notifications)
- Message Queue: RabbitMQ (consumes task events)
- Features:
- Notification creation and delivery
- Event-driven architecture
- Notification history
- VPC Isolation: Services run in private subnets
- Security Groups: (Should be implemented for production)
- NAT Gateway: Controlled internet access for private subnets
- ALB: Public-facing load balancer with optional SSL/TLS
- AWS Secrets Manager: Centralized secret storage
- External Secrets Operator: Syncs secrets to Kubernetes
- IRSA (IAM Roles for Service Accounts): Fine-grained AWS permissions
- Environment Variables: Secrets injected as env vars, never in code
- JWT: Token-based authentication
- Service-to-Service: Internal communication (consider mTLS for production)
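To make the `jwt_secret` above concrete: a JWT is three base64url-encoded segments, the last being an HMAC-SHA256 over the first two. A toy sketch of the structure (the claims and secret are illustrative, assuming `openssl` is available):

```shell
# Assemble a toy HS256 JWT to show what jwt_secret actually signs.
b64url() { openssl base64 -A | tr '+/' '-_' | tr -d '='; }

secret="example-jwt-secret"                                   # stands in for jwt_secret
header=$(printf '{"alg":"HS256","typ":"JWT"}' | b64url)
payload=$(printf '{"sub":"user-123","exp":1700000000}' | b64url)
signature=$(printf '%s.%s' "$header" "$payload" \
  | openssl dgst -sha256 -hmac "$secret" -binary | b64url)

echo "${header}.${payload}.${signature}"
```

Any service holding the same secret can recompute the signature and reject tampered tokens, which is why `jwt_secret` must be shared via Secrets Manager rather than hard-coded.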
- ✅ Secrets stored in AWS Secrets Manager, not in code
- ✅ Terraform state encrypted and stored remotely
- ✅ ECR image scanning (commented out for cost reasons)
- ✅ Database credentials rotated through AWS Secrets Manager
- ✅ Principle of least privilege for IAM roles
- ✅ Private subnets for workloads
- EKS Control Plane Logs: Enabled for API, audit, authenticator, controller manager, and scheduler
- kubectl logs: View container logs
```bash
# View service logs
kubectl logs -f deployment/users-service -n microtodo
kubectl logs -f deployment/tasks-service -n microtodo
kubectl logs -f deployment/notifications-service -n microtodo

# View pod events
kubectl describe pod <pod-name> -n microtodo
```
1. **Metrics Collection**:
   - Prometheus for metrics scraping
   - Grafana for visualization
   - Custom dashboards for business metrics
2. **Distributed Tracing**:
   - Jaeger or AWS X-Ray
   - Trace requests across microservices
3. **Log Aggregation**:
   - ELK Stack (Elasticsearch, Logstash, Kibana)
   - Fluentd or Fluent Bit for log collection
   - CloudWatch Logs Insights
4. **Alerting**:
   - Alertmanager (with Prometheus)
   - PagerDuty integration
   - Slack notifications
For local development, use the provided Docker Compose file:
```bash
docker-compose -f docker-compose.dev.yml up -d
```

This starts:
- RabbitMQ with management UI (ports 5672, 15672)
Access RabbitMQ Management UI:
- URL: http://localhost:15672
- Default credentials: guest/guest
Validate Kubernetes manifests before applying:
```bash
# Dry-run
kubectl apply -f k8s/ --dry-run=client

# Validate with kubeval (if installed)
kubeval k8s/**/*.yaml
```
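When `kubeval` isn't installed, even a crude grep pass catches manifests missing top-level fields. This is a quick sanity check of my own devising, not a substitute for schema validation:

```shell
# Flag manifests that have no top-level apiVersion (demo on throwaway files).
dir=$(mktemp -d)
printf 'apiVersion: v1\nkind: Namespace\nmetadata:\n  name: microtodo\n' > "$dir/ok.yaml"
printf 'kind: Service\nmetadata:\n  name: broken\n' > "$dir/bad.yaml"

for f in "$dir"/*.yaml; do
  grep -q '^apiVersion:' "$f" || echo "missing apiVersion: $f"
done
rm -rf "$dir"
```

Only `bad.yaml` is reported; run the same loop over `k8s/**/*.yaml` for a fast pre-commit check.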
### Terraform Development
```bash
cd terraform

# Format code
terraform fmt -recursive

# Validate configuration
terraform validate

# Plan changes
terraform plan

# Apply changes
terraform apply
```
1. Build and push a new image to ECR:

   ```bash
   # Authenticate to ECR
   aws ecr get-login-password --region eu-west-2 | docker login --username AWS --password-stdin <account-id>.dkr.ecr.eu-west-2.amazonaws.com

   # Build and push
   docker build -t microtodo/users-service:v1.2.3 .
   docker tag microtodo/users-service:v1.2.3 <account-id>.dkr.ecr.eu-west-2.amazonaws.com/microtodo/users-service:v1.2.3
   docker push <account-id>.dkr.ecr.eu-west-2.amazonaws.com/microtodo/users-service:v1.2.3
   ```

2. Update the deployment manifest:

   ```yaml
   spec:
     containers:
       - name: users-service
         image: <account-id>.dkr.ecr.eu-west-2.amazonaws.com/microtodo/users-service:v1.2.3
   ```

3. Commit and push (ArgoCD will auto-sync if configured)
- Multi-AZ NAT Gateways: Currently using single NAT Gateway (cost optimization). For production, deploy NAT Gateway per AZ.
- Database High Availability: Consider Amazon RDS for PostgreSQL with Multi-AZ deployment
- RabbitMQ Cluster: Deploy RabbitMQ in cluster mode for HA
- Pod Disruption Budgets: Ensure minimum replicas during voluntary disruptions
- HPA (Horizontal Pod Autoscaler): Auto-scale based on CPU/memory/custom metrics
- CDN: CloudFront for static assets
- Caching: Redis/Memcached for frequently accessed data
- Database Indexing: Optimize database queries with proper indexes
- Connection Pooling: Implement efficient database connection pooling
- Resource Limits: Fine-tune CPU/memory requests and limits
Current cost-saving measures:
- Single NAT Gateway (vs. 3 per AZ)
- Smaller EC2 instance types (t3.small)
- ECR image scanning disabled
- Public EKS endpoint (vs. private only with VPN)
Production cost considerations:
- Use Reserved Instances or Savings Plans for predictable workloads
- Right-size EC2 instances based on actual usage
- Enable cluster autoscaler
- Set up cost alerts and budgets in AWS
- Review and optimize EBS volume sizes
- Backup Strategy: Regular backups of databases and persistent volumes
- Cross-Region Replication: Replicate critical data to another region
- Disaster Recovery Plan: Document and test DR procedures
- RTO/RPO: Define Recovery Time Objective and Recovery Point Objective
- TLS/SSL: Enable HTTPS with valid certificates
- Network Policies: Restrict pod-to-pod communication
- Pod Security Standards: Enforce security contexts
- Image Scanning: Enable ECR vulnerability scanning
- Audit Logging: Comprehensive audit trail
- WAF: Web Application Firewall for ALB
- Penetration Testing: Regular security assessments
- Data Encryption: At-rest and in-transit encryption
- GDPR Compliance: Data privacy and user rights
- Access Logs: Maintain comprehensive access logs
- Regular Audits: Compliance and security audits
**Problem**: `Error acquiring the state lock`

**Solution**:

```bash
# Release the lock (use with caution!)
terraform force-unlock <lock-id>
```

**Problem**: `error: You must be logged in to the server (Unauthorized)`
**Solution**:

```bash
# Update kubeconfig
aws eks update-kubeconfig --region eu-west-2 --name microtodo_eks_cluster

# Verify AWS CLI credentials
aws sts get-caller-identity
```

**Problem**: Pods stuck in `Pending`, `CrashLoopBackOff`, or `ImagePullBackOff`
**Solution**:

```bash
# Check pod events
kubectl describe pod <pod-name> -n microtodo

# Check logs
kubectl logs <pod-name> -n microtodo

# Common fixes:
# - Verify ECR authentication
# - Check resource requests/limits
# - Verify secrets are populated
kubectl get secrets -n microtodo
kubectl describe externalsecret microtodo-secrets -n microtodo
```

**Problem**: Services can't connect to databases
**Solution**:

```bash
# Verify database pods are running
kubectl get pods -n microtodo | grep postgres

# Check database service
kubectl get svc -n microtodo | grep postgres

# Verify secrets
kubectl get secret microtodo-secrets -n microtodo -o yaml

# Test connection from a debug pod
kubectl run -it --rm debug --image=postgres:15 --restart=Never -n microtodo -- psql -h postgres-users-service -U postgres
```

**Problem**: ArgoCD shows application as `OutOfSync`
**Solution**:

```bash
# Manual sync via CLI
argocd app sync microtodo-app

# Or via kubectl
kubectl -n argocd patch app microtodo-app -p '{"operation":{"sync":{}}}' --type merge

# Check sync status
argocd app get microtodo-app
```

**Problem**: Ingress doesn't provision an ALB
**Solution**:

```bash
# Check ALB controller logs
kubectl logs -n kube-system deployment/aws-load-balancer-controller

# Verify ingress class
kubectl get ingressclass

# Check IAM permissions for ALB controller
kubectl describe serviceaccount aws-load-balancer-controller -n kube-system
```

General diagnostic commands:

```bash
# Check cluster health
kubectl get nodes
kubectl get componentstatuses  # deprecated in recent Kubernetes versions

# View all resources in microtodo namespace
kubectl get all -n microtodo

# Check resource usage
kubectl top nodes
kubectl top pods -n microtodo

# View events
kubectl get events -n microtodo --sort-by='.lastTimestamp'

# Port forward to a service
kubectl port-forward svc/<service-name> -n microtodo <local-port>:<service-port>

# Execute command in pod
kubectl exec -it <pod-name> -n microtodo -- /bin/sh

# View Terraform state
cd terraform
terraform state list
terraform state show <resource>

# ArgoCD password recovery
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
```

- AWS EKS Documentation
- Terraform AWS Provider
- Kubernetes Documentation
- ArgoCD Documentation
- External Secrets Operator
- AWS Load Balancer Controller
Note: This infrastructure is set up for learning and development purposes. For production deployments, review and implement the security hardening, high availability, and monitoring recommendations outlined in this document.