This repository contains two main components for managing LeRobot training jobs:
- Docker API - A containerized API for running LeRobot training jobs with persistence
- Pod Manager - A service for creating and managing RunPod instances running the Docker API
Repository structure:
lerobot-training-api/
├── docker-api/ # Containerized API for running LeRobot training jobs
│ ├── Dockerfile # Docker configuration for LeRobot training environment
│ ├── main.py # FastAPI application for job management
│ ├── job_manager.py # Module for managing training jobs with tmux
│ ├── LICENSE # Apache License 2.0
│ └── README.md # Docker API documentation
│
├── pod-manager/ # Service for managing RunPod instances
│ ├── main.py # FastAPI application for pod management
│ ├── pod_manager.py # Module for RunPod API integration
│ ├── db.py # Database module for persistent storage
│ ├── db/migrations/ # Database migration files
│ ├── .env.example # Example environment configuration
│ ├── .env.dbmate # Database migration configuration
│ └── README.md # Pod Manager documentation
│
└── README.md # This file
The Docker API is a containerized service that:
- Runs LeRobot training jobs in the background using tmux for persistence
- Tracks job progress and logs via JSON files
- Provides REST endpoints for job control and monitoring
- Uses uv for faster package management (instead of conda)
- Includes CUDA support for GPU training
See docker-api/README.md for detailed documentation.
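The tmux and JSON approach described above can be illustrated with a minimal sketch. It is not the contents of job_manager.py: the session name (job-<id>), the status-file schema, and all helper names are assumptions made for illustration. The helpers only build the tmux command and read or write status files, so they run without tmux installed.

```python
import json
import shlex
from pathlib import Path


def tmux_launch_command(job_id: str, train_cmd: list[str], log_path: Path) -> list[str]:
    """Build a tmux command that runs train_cmd in a detached session.

    The session name "job-<id>" is a hypothetical convention. Output is
    copied to log_path with tee so logs remain readable after the job ends.
    """
    inner = f"{shlex.join(train_cmd)} 2>&1 | tee {shlex.quote(str(log_path))}"
    # -d: start detached, -s: session name; tmux runs `inner` through the shell
    return ["tmux", "new-session", "-d", "-s", f"job-{job_id}", inner]


def write_status(status_dir: Path, job_id: str, **fields) -> Path:
    """Write a job's status as JSON (hypothetical schema).

    Writing to a temp file and renaming it means readers never see a
    half-written file.
    """
    status_dir.mkdir(parents=True, exist_ok=True)
    path = status_dir / f"{job_id}.json"
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps({"job_id": job_id, **fields}))
    tmp.replace(path)  # atomic on POSIX when both paths are on the same filesystem
    return path


def read_status(status_dir: Path, job_id: str) -> dict:
    """Load the most recent status written for a job."""
    return json.loads((status_dir / f"{job_id}.json").read_text())
```

Passing the returned list to subprocess.run(cmd, check=True) would start the detached session. Because a detached tmux session outlives the process that created it, the job keeps running if the API process restarts. One trade-off of piping through tee: the session's exit status reflects tee, not the training command.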
The Pod Manager is a service that:
- Creates RunPod instances with the LeRobot training Docker image
- Monitors pod status and checks whether each pod is accessible
- Retrieves job status and logs from running pods
- Manages the lifecycle of pods (listing, terminating)
- Uses SQLite for persistent storage
See pod-manager/README.md for detailed documentation.
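A minimal sketch of SQLite-backed pod tracking with Python's standard sqlite3 module. The pods table and its columns are hypothetical; the project's real schema is defined by the files in db/migrations/ and applied with dbmate.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema; the actual one comes from the dbmate migrations.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pods (
    pod_id     TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    created_at TEXT NOT NULL
)
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # rows can be converted to dicts keyed by column
    conn.execute(SCHEMA)
    return conn


def upsert_pod(conn: sqlite3.Connection, pod_id: str, status: str) -> None:
    """Insert a pod, or update only its status if the pod_id already exists."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT INTO pods (pod_id, status, created_at) VALUES (?, ?, ?) "
            "ON CONFLICT(pod_id) DO UPDATE SET status = excluded.status",
            (pod_id, status, now),
        )


def list_pods(conn: sqlite3.Connection) -> list[dict]:
    rows = conn.execute("SELECT pod_id, status, created_at FROM pods ORDER BY created_at")
    return [dict(r) for r in rows]


def delete_pod(conn: sqlite3.Connection, pod_id: str) -> None:
    """Remove a pod record, e.g. after the pod has been terminated."""
    with conn:
        conn.execute("DELETE FROM pods WHERE pod_id = ?", (pod_id,))
```

The ON CONFLICT upsert needs SQLite 3.24 or newer, which current Python builds bundle. Keeping the pod list in a database file (rather than in memory) is what lets the service restart without losing track of pods it created.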
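To run the Docker API locally:

- Build the Docker image:

      cd docker-api
      docker build -t lerobot-training-api .

- Run the container:

      docker run -p 8000:8000 lerobot-training-api

  To give the container access to a local NVIDIA GPU, add the --gpus all flag (this requires the NVIDIA Container Toolkit on the host).

- Access the API at http://localhost:8000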
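Because main.py is a FastAPI application, the running container should also serve FastAPI's default interactive docs at /docs and the machine-readable schema at /openapi.json, unless the app disables them. The sketch below lists the available endpoints from that schema so you don't have to guess route names. The helper names are illustrative, and the same approach should work against the Pod Manager, which is also a FastAPI application.

```python
import json
import urllib.request

# Operation keys defined by the OpenAPI path item object.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_openapi(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> dict:
    """Download the OpenAPI schema that FastAPI serves by default."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/openapi.json", timeout=timeout) as resp:
        return json.load(resp)


def list_endpoints(schema: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs from an OpenAPI schema."""
    pairs = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            if key in HTTP_METHODS:  # skip non-operation keys such as "parameters"
                pairs.append((key.upper(), path))
    return sorted(pairs)
```

With the container running, `for method, path in list_endpoints(fetch_openapi()): print(method, path)` prints every route the API exposes.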
To run the Pod Manager:

- Set up the Pod Manager:

      cd pod-manager
      pip install -r requirements.txt
      cp .env.example .env

- Edit the .env file to add your RunPod API key and Docker image path.

- Initialize the database:

      dbmate up

- Start the Pod Manager:

      ./run.sh

- Access the Pod Manager API at http://localhost:8000
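To confirm that the .env file from the setup steps actually contains the RunPod API key and Docker image path before starting the service, a small check like the following can help. The variable names RUNPOD_API_KEY and DOCKER_IMAGE are hypothetical placeholders; use the names defined in .env.example. The parser is deliberately simplified and is not a replacement for a full .env library.

```python
from pathlib import Path

# Hypothetical names; replace with the keys from pod-manager/.env.example.
REQUIRED_KEYS = ("RUNPOD_API_KEY", "DOCKER_IMAGE")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and '#' comments.

    Simplified: no variable expansion, multi-line values, or inline comments.
    """
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith("export "):  # tolerate shell-style "export KEY=..."
            key = key[len("export "):].strip()
        env[key] = value.strip().strip('"').strip("'")
    return env


def missing_keys(env: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return required keys that are absent or set to an empty value."""
    return [k for k in required if not env.get(k)]


def check_env_file(path: str = ".env") -> list[str]:
    return missing_keys(parse_env(Path(path).read_text()))
```

An empty list from check_env_file() means every required key has a value.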
Typical workflow:
- Use the Pod Manager to create a RunPod instance with the LeRobot training Docker image
- The Pod Manager monitors the pod and checks whether it is accessible
- Once the pod is running, use the Docker API endpoints to start and manage training jobs
- The Pod Manager can retrieve job status and logs from the running pod
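The "wait until the pod is accessible" step can be sketched as a polling loop. This illustrates the pattern and is not the Pod Manager's implementation: it assumes the Docker API's port 8000 is exposed through RunPod's HTTP proxy at https://<pod-id>-8000.proxy.runpod.net, and all function names are hypothetical. The clock and sleep functions are injectable so the loop can be tested without real waiting.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def runpod_proxy_url(pod_id: str, port: int = 8000) -> str:
    """URL of a pod port exposed through RunPod's HTTP proxy (assumed convention)."""
    return f"https://{pod_id}-{port}.proxy.runpod.net"


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the server answers with a status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        return err.code < 500  # a 404, for example, still means the server is up
    except OSError:
        return False  # URLError, timeouts, and connection errors are OSError subclasses


def wait_until_accessible(
    probe: Callable[[], bool],
    timeout: float = 600.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call probe() every `interval` seconds until it succeeds or `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

For example, `wait_until_accessible(lambda: http_probe(runpod_proxy_url("<pod-id>") + "/docs"))` blocks until the Docker API on the pod responds (using FastAPI's default /docs page as the health check) and only then would a script start submitting training jobs.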
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.