This project deploys Ollama and Open WebUI on AWS ECS with GPU support, allowing you to run self-hosted large language models in your own AWS account behind a user-friendly web interface.
The infrastructure consists of:
- ECS cluster running on EC2 instances with GPU support (g4dn.xlarge)
- Ollama service for running LLMs
- Open WebUI service for the user interface
- Application Load Balancer for routing traffic
- Service Connect for service discovery
- CloudWatch for logging
graph TD
Internet[Internet] --> ALB[Application Load Balancer]
subgraph "Public Subnets"
ALB
end
subgraph "Private Subnets"
subgraph "ECS Cluster"
EC2[EC2 with GPU]
subgraph "Ollama Service"
OC[Ollama Container]
end
subgraph "WebUI Service"
WC[WebUI Container]
end
end
end
ALB -- "Default Route" --> WC
ALB -- "Host Header: api.*, ollama.*" --> OC
WC -- "Service Connect: ollama.internal:11434" --> OC
style EC2 fill:#f9d,stroke:#333,stroke-width:2px
style OC fill:#bbf,stroke:#333,stroke-width:1px
style WC fill:#bfb,stroke:#333,stroke-width:1px
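The routing in the diagram can be read as a simple decision on the request's Host header. The sketch below only illustrates that decision in shell; `route_for_host` is a hypothetical helper, not part of this project, and the real rules live in the load balancer's listener configuration.

```shell
# Illustrative only: mirrors the two ALB routes shown in the diagram.
route_for_host() {
  # $1 = value of the Host header
  case "$1" in
    api.*|ollama.*) echo "ollama" ;;  # host-header rule -> Ollama container
    *)              echo "webui"  ;;  # default route    -> WebUI container
  esac
}
```

For example, `route_for_host api.example.com` prints `ollama`, while any other hostname falls through to `webui`.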
- GPU Acceleration: Uses g4dn.xlarge instances with NVIDIA T4 GPUs
- Secure Architecture: Services run in private subnets with public access through ALB
- Service Discovery: Uses ECS Service Connect for internal communication
- Logging: CloudWatch logs with 1-day retention
- AWS Account
- VPC with public and private subnets
- NAT Gateway for outbound internet access from private subnets
- Terraform installed
- Clone this repository
- Update terraform.tfvars with your configuration
- Initialize Terraform:
terraform init
- Apply the configuration:
terraform apply
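If you prefer to inspect the changes before they are applied, the steps above can be split into a plan phase and an apply phase. This is a minimal sketch: the function names and the plan file name `tfplan` are assumptions, not something this repository defines.

```shell
# Sketch: save a plan, review it, then apply exactly that plan.
plan_stack() {
  # $1 = optional plan file name (defaults to the hypothetical "tfplan")
  terraform init -input=false &&
  terraform plan -input=false -out="${1:-tfplan}"
}

apply_stack() {
  # Applies the saved plan, so nothing changes between review and apply.
  terraform apply -input=false "${1:-tfplan}"
}
```

Run `plan_stack`, read the proposed changes, and then run `apply_stack` once they look right.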
After deployment, you can access the WebUI using the URL provided in the Terraform outputs:
terraform output webui_url
To load a model, you can use the provided curl command:
terraform output model_pull_command
This will pull the DeepSeek-R1 7B model. You can modify this command to pull other models.
The EC2 instances are configured to use AWS Systems Manager Session Manager for secure shell access without requiring public IP addresses or opening SSH ports.
- Install the AWS CLI and Session Manager plugin:
# Install AWS CLI (Linux x86_64 installer)
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
# Install Session Manager plugin (macOS bundle shown; use the package that matches your OS)
curl "https://s3.amazonaws.com/session-manager-downloads/plugin/latest/mac/sessionmanager-bundle.zip" -o "sessionmanager-bundle.zip"
unzip sessionmanager-bundle.zip
sudo ./sessionmanager-bundle/install -i /usr/local/sessionmanagerplugin -b /usr/local/bin/session-manager-plugin
- Configure AWS CLI with appropriate credentials:
aws configure
- Add the following to your ~/.ssh/config file to enable SSH access via SSM:
# SSH over Session Manager
Host i-* mi-*
  ProxyCommand sh -c "aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"
  StrictHostKeyChecking no
  User ec2-user
  IdentityFile ~/.ssh/id_rsa
# Named instance example (replace INSTANCE_ID with your actual instance ID)
Host ollama-gpu
  HostName INSTANCE_ID
  ProxyCommand sh -c "aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"
  StrictHostKeyChecking no
  User ec2-user
  IdentityFile ~/.ssh/id_rsa
# You can also specify a specific AWS profile if needed
Host ollama-gpu-prod
  HostName INSTANCE_ID
  ProxyCommand sh -c "aws --profile production ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"
  StrictHostKeyChecking no
  User ec2-user
  IdentityFile ~/.ssh/id_rsa
With this configuration, you can simply use ssh ollama-gpu to connect to your instance.
- List available instances:
aws ec2 describe-instances --filters "Name=tag:Name,Values=ecs-gpu-instance" --query "Reservations[*].Instances[*].[InstanceId,State.Name,Tags[?Key=='Name'].Value|[0]]" --output table
- Start an SSM session:
aws ssm start-session --target INSTANCE_ID
- For port forwarding (if needed), pass the ports as a quoted JSON object so the shell does not interpret the brackets:
aws ssm start-session --target INSTANCE_ID --document-name AWS-StartPortForwardingSession --parameters '{"portNumber":["8080"],"localPortNumber":["8080"]}'
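The port-forwarding parameters are easy to mistype, so it can help to generate them. The helper names below are hypothetical, and the sketch assumes the port you forward is actually listening on the instance.

```shell
# Build the JSON expected by the AWS-StartPortForwardingSession document.
ssm_port_params() {
  # $1 = port on the instance, $2 = local port (defaults to $1)
  printf '{"portNumber":["%s"],"localPortNumber":["%s"]}\n' "$1" "${2:-$1}"
}

# Start a port-forwarding session with the generated parameters.
forward_port() {
  # $1 = instance ID, $2 = remote port, $3 = optional local port
  aws ssm start-session \
    --target "$1" \
    --document-name AWS-StartPortForwardingSession \
    --parameters "$(ssm_port_params "$2" "$3")"
}
```

For example, `forward_port INSTANCE_ID 8080 9090` would expose the instance's port 8080 on local port 9090.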
Once connected to the instance, you can troubleshoot ECS:
# Check ECS agent status
sudo systemctl status ecs
# View ECS agent logs
sudo journalctl -u ecs
# Check Docker status
sudo systemctl status docker
# List Docker containers
sudo docker ps
# Check GPU status
nvidia-smi
# Follow the ECS agent log file
sudo tail -f /var/log/ecs/ecs-agent.log
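When several of these checks need to be run together, a small wrapper can summarize them as pass or fail. This is a sketch only; `run_check` and `host_health` are hypothetical names, and the checks assume a systemd-based host with the NVIDIA driver installed.

```shell
# Run a command quietly and report whether it succeeded.
run_check() {
  # $1 = label, remaining arguments = command to run
  label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "OK   $label"
  else
    echo "FAIL $label"
  fi
}

# One line per component, based on the commands listed above.
host_health() {
  run_check "ecs agent"  sudo systemctl is-active --quiet ecs
  run_check "docker"     sudo systemctl is-active --quiet docker
  run_check "nvidia gpu" nvidia-smi
}
```

Calling `host_health` on the instance prints one status line per component; any `FAIL` line points to the matching command above for details.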
- All services run in private subnets
- No direct SSH access to instances (SSM used instead)
- Security groups restrict traffic flow
- Load balancer is the only component exposed to the internet
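To update or add new models, call the Ollama pull endpoint. The load balancer routes only api.* and ollama.* host headers to Ollama, so the request must use a hostname that matches one of those rules; otherwise it reaches the WebUI default route:
curl -X POST http://<ALB_DNS>/api/pull -d '{"name": "MODEL_NAME"}'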
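The pull request can be wrapped in a small function so the model name and an optional Host header are passed as arguments. This is a sketch under assumptions: the function names are hypothetical, and the Host value must be whatever hostname your load balancer rules actually match.

```shell
# Build the JSON body for the pull endpoint.
pull_payload() {
  # $1 = model name, e.g. deepseek-r1:7b
  printf '{"name": "%s"}' "$1"
}

# Send the pull request, optionally overriding the Host header.
pull_model() {
  # $1 = ALB DNS name, $2 = model name, $3 = optional Host header value
  if [ -n "$3" ]; then
    curl -sS -X POST "http://$1/api/pull" -H "Host: $3" -d "$(pull_payload "$2")"
  else
    curl -sS -X POST "http://$1/api/pull" -d "$(pull_payload "$2")"
  fi
}
```

A call such as `pull_model <ALB_DNS> deepseek-r1:7b api.example.com` sends the request with a Host header; `api.example.com` is a placeholder.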
To adjust the number of instances:
# Update min, max, and desired capacity
aws autoscaling update-auto-scaling-group --auto-scaling-group-name ecs-gpu-asg --min-size 2 --max-size 4 --desired-capacity 2
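Since the desired capacity has to sit between the minimum and maximum, a quick check before the update can catch mistakes early. The sketch below wraps the command above; `valid_capacity` and `scale_asg` are hypothetical helper names.

```shell
# Succeeds only when min <= desired <= max.
valid_capacity() {
  # $1 = min, $2 = max, $3 = desired
  [ "$1" -le "$3" ] && [ "$3" -le "$2" ]
}

# Validate the sizes, then update the Auto Scaling group.
scale_asg() {
  # $1 = ASG name, $2 = min, $3 = max, $4 = desired
  if ! valid_capacity "$2" "$3" "$4"; then
    echo "desired capacity must be between min and max" >&2
    return 1
  fi
  aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name "$1" \
    --min-size "$2" --max-size "$3" --desired-capacity "$4"
}
```

For example, `scale_asg ecs-gpu-asg 0 4 0` would scale the group down to zero instances while it is not in use, assuming nothing else manages the group's capacity.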
- GPU not detected: Check NVIDIA driver installation with nvidia-smi
- Services not starting: Check ECS service events in AWS Console
- Cannot connect to WebUI: Verify security groups and load balancer health checks
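The service events and load balancer health checks mentioned above can also be read from the command line. This is a minimal sketch: the function names are hypothetical, and the target group ARN, cluster name, and service name are values you need to look up for your own deployment.

```shell
# Show the ID, health state, and reason for each target in a target group.
target_states() {
  # $1 = target group ARN
  aws elbv2 describe-target-health \
    --target-group-arn "$1" \
    --query 'TargetHealthDescriptions[].[Target.Id,TargetHealth.State,TargetHealth.Reason]' \
    --output text
}

# Show the five most recent event messages for an ECS service.
service_events() {
  # $1 = cluster name, $2 = service name
  aws ecs describe-services --cluster "$1" --services "$2" \
    --query 'services[0].events[0:5].message' --output text
}
```

Targets reported as `unhealthy` usually explain a WebUI that cannot be reached, and the event messages often state why a service failed to place its tasks.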
# View WebUI logs
aws logs get-log-events --log-group-name /ecs/webui-service --log-stream-name <LOG_STREAM>
# View Ollama logs
aws logs get-log-events --log-group-name /ecs/ollama-service --log-stream-name <LOG_STREAM>
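The `<LOG_STREAM>` value changes with every task, so it can be convenient to look up the most recent stream first. The sketch below does that with the AWS CLI; `latest_stream` and `show_latest_logs` are hypothetical helper names, and the limit of 50 events is an arbitrary choice.

```shell
# Print the name of the stream with the most recent event in a log group.
latest_stream() {
  # $1 = log group name
  aws logs describe-log-streams \
    --log-group-name "$1" \
    --order-by LastEventTime --descending --limit 1 \
    --query 'logStreams[0].logStreamName' --output text
}

# Print up to 50 messages from that stream.
show_latest_logs() {
  # $1 = log group name
  stream="$(latest_stream "$1")" || return 1
  aws logs get-log-events \
    --log-group-name "$1" --log-stream-name "$stream" \
    --limit 50 --query 'events[].message' --output text
}
```

For example, `show_latest_logs /ecs/ollama-service` prints recent Ollama log messages without copying a stream name by hand.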
- g4dn.xlarge instances cost approximately $0.526/hour on-demand (pricing varies by region)
- Consider stopping instances when not in use
- CloudWatch logs are configured with 1-day retention to minimize storage costs
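To get a feel for what the hourly rate means over time, the estimate can be computed directly. This sketch covers instance hours only (no load balancer, NAT Gateway, storage, or data transfer), and `estimate_cost` is a hypothetical helper.

```shell
# Estimate instance cost in USD: hourly price * instance count * hours.
estimate_cost() {
  # $1 = hourly price in USD, $2 = instance count, $3 = hours
  awk -v p="$1" -v n="$2" -v h="$3" 'BEGIN { printf "%.2f\n", p * n * h }'
}
```

For example, `estimate_cost 0.526 1 24` prints `12.62`, the approximate cost of running one instance for a day at the rate quoted above.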
This project is licensed under the MIT License - see the LICENSE file for details.