A comprehensive robotic task planning and execution system that combines computer vision, AI-powered task decomposition, and ROS-based robot control for automated block manipulation tasks.
This project implements an intelligent robotic system capable of:
- Computer Vision Block Tracking: Real-time detection and tracking of colored blocks using OpenCV
- AI-Powered Task Planning: Using Google's Gemini AI to decompose high-level tasks into executable robot actions
- ROS-Based Robot Control: Coordinated robot arm control for pick-and-place operations
- 3D Point Cloud Processing: Depth perception and spatial understanding for precise manipulation
The system consists of several interconnected modules:
```
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│    Task Input    │────▶│    Gemini AI     │────▶│  Task Executor   │
│      (User)      │     │  Decomposition   │     │   (ROS Node)     │
└──────────────────┘     └──────────────────┘     └──────────────────┘
                                  │                        │
                                  ▼                        ▼
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  Block Tracker   │────▶│   Point Cloud    │────▶│ Robot Controller │
│     (OpenCV)     │     │   Transformer    │     │    (ROS Node)    │
└──────────────────┘     └──────────────────┘     └──────────────────┘
```
- `gemini-api.py`: Interfaces with Google's Gemini AI to decompose high-level tasks into robot-executable subtasks
- `config.py`: Manages API keys and configuration settings
- `task_executor.py`: ROS node that receives and executes task plans
- `block_tracking.py`: Standalone OpenCV-based block detection and tracking
- `ros_block_tracker.py`: ROS-integrated version of the block tracker
- `centroid_tracker.py`: Object tracking algorithm for maintaining block IDs across frames
- `listener.py`: Camera data processing and point cloud generation
- `point_cloud_transformer.py`: Coordinate frame transformations for robot integration
- `robot_controller.py`: High-level robot arm control and pick-and-place operations
- Detects blocks in multiple colors, including red, orange, yellow, green, blue, and purple
- Robust HSV color space filtering with morphological operations
- Real-time tracking with unique ID assignment
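
The detection pipeline follows the standard OpenCV approach described above: HSV thresholding, morphological cleanup, contour extraction, and size filtering. The sketch below illustrates the idea; the HSV bounds and area threshold are illustrative values, not the project's exact settings.

```python
import cv2
import numpy as np

def detect_blocks(frame, lower_hsv, upper_hsv, min_area=500):
    """Return centroids of blobs whose pixels fall inside the given HSV range."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, lower_hsv, upper_hsv)

    # Morphological opening/closing removes speckle noise and fills small holes
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    centroids = []
    for c in contours:
        if cv2.contourArea(c) < min_area:
            continue  # size filtering, analogous to MIN_BLOCK_AREA
        M = cv2.moments(c)
        centroids.append((int(M["m10"] / M["m00"]), int(M["m01"] / M["m00"])))
    return centroids

# Example: detect blue-ish blocks in a frame (bounds are illustrative)
# centroids = detect_blocks(frame, np.array([100, 150, 50]), np.array([130, 255, 255]))
```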
- Uses the Gemini 2.0 Flash Thinking model for task planning
- Converts natural language commands into structured robot actions
- Supports "Pick" and "Place" primitive skills
- Generates JSON-formatted task plans
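
A minimal sketch of how such a call might look with the `google-genai` SDK is shown below. The prompt wording and model string are assumptions for illustration, not the project's exact code in `gemini-api.py`.

```python
import json
import os
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

task = "Stack the red block on top of the blue block"
prompt = (
    "Decompose the following task into a JSON array of subtasks. "
    'Each element must have "subtask" and "skill" fields, and "skill" '
    'must be either "Pick" or "Place". Respond with JSON only.\n'
    f"Task: {task}"
)

# Model name is illustrative; the project uses a Gemini 2.0 Flash Thinking variant
response = client.models.generate_content(model="gemini-2.0-flash", contents=prompt)

# In practice the response text may need markdown fences stripped before parsing
plan = json.loads(response.text)
for step in plan:
    print(step["skill"], "->", step["subtask"])
```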
- Centroid-based object tracking with persistence
- Movement detection and visualization
- Trail visualization for motion analysis
- Robust handling of occlusions and temporary disappearances
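
The ID-assignment logic follows the classic centroid-tracking pattern: match each existing ID to its nearest new detection, age IDs that go unmatched, and register new IDs for unmatched detections. A condensed sketch of that pattern (not the project's exact `centroid_tracker.py`):

```python
from collections import OrderedDict

import numpy as np
from scipy.spatial import distance as dist

class SimpleCentroidTracker:
    """Assigns persistent IDs to detected centroids across frames (condensed sketch)."""

    def __init__(self, max_disappeared=50):
        self.next_id = 0
        self.objects = OrderedDict()      # id -> (x, y) centroid
        self.disappeared = OrderedDict()  # id -> consecutive frames without a match
        self.max_disappeared = max_disappeared

    def _register(self, centroid):
        self.objects[self.next_id] = centroid
        self.disappeared[self.next_id] = 0
        self.next_id += 1

    def update(self, centroids):
        if len(centroids) == 0:
            # Nothing detected: age all tracked objects, dropping stale ones
            for oid in list(self.disappeared):
                self.disappeared[oid] += 1
                if self.disappeared[oid] > self.max_disappeared:
                    del self.objects[oid]
                    del self.disappeared[oid]
            return self.objects

        centroids = np.array(centroids)
        if not self.objects:
            for c in centroids:
                self._register(c)
            return self.objects

        ids = list(self.objects.keys())
        D = dist.cdist(np.array(list(self.objects.values())), centroids)

        used_rows, used_cols = set(), set()
        # Greedy matching: pair existing IDs with their nearest detections first
        for row in D.min(axis=1).argsort():
            col = int(D[row].argmin())
            if row in used_rows or col in used_cols:
                continue
            self.objects[ids[row]] = centroids[col]
            self.disappeared[ids[row]] = 0
            used_rows.add(row)
            used_cols.add(col)

        # Unmatched IDs are marked as disappeared; unmatched detections become new IDs
        for row in set(range(D.shape[0])) - used_rows:
            self.disappeared[ids[row]] += 1
        for col in set(range(D.shape[1])) - used_cols:
            self._register(centroids[col])
        return self.objects
```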
- Full ROS ecosystem integration
- Point cloud processing and transformation
- Robot arm pose control
- Real-time sensor data processing
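
As an illustration of the coordinate-frame handling, the sketch below transforms a detected point from a camera frame into a robot base frame with `tf2_ros`. The frame names (`camera_link`, `base_link`) are assumptions and will depend on your robot's TF tree.

```python
import rospy
import tf2_ros
import tf2_geometry_msgs  # registers do_transform_point for PointStamped
from geometry_msgs.msg import PointStamped

rospy.init_node("point_transform_sketch")
tf_buffer = tf2_ros.Buffer()
tf_listener = tf2_ros.TransformListener(tf_buffer)

def camera_to_base(x, y, z):
    """Transform a point expressed in the camera frame into the robot base frame."""
    pt = PointStamped()
    pt.header.frame_id = "camera_link"  # assumed camera frame name
    pt.header.stamp = rospy.Time(0)     # use the latest available transform
    pt.point.x, pt.point.y, pt.point.z = x, y, z
    transform = tf_buffer.lookup_transform("base_link", "camera_link",
                                           rospy.Time(0), rospy.Duration(1.0))
    return tf2_geometry_msgs.do_transform_point(pt, transform)
```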
```bash
# Python dependencies
pip install opencv-python numpy imutils scipy google-genai python-dotenv

# ROS dependencies (if using ROS)
sudo apt-get install ros-noetic-cv-bridge ros-noetic-tf2-ros
pip install rospkg
```
- Create a `.env` file in the project root:

```
GEMINI_API_KEY=your_gemini_api_key_here
```

- Install the required Python packages:

```bash
pip install -r requirements.txt
```
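
Since `config.py` manages the API key and the project depends on `python-dotenv`, it is likely little more than a thin wrapper around `load_dotenv`; a minimal sketch under that assumption:

```python
# config.py (sketch) -- loads GEMINI_API_KEY from the project-root .env file
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory / project root

GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
if not GEMINI_API_KEY:
    raise RuntimeError("GEMINI_API_KEY is not set; add it to your .env file")
```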
```bash
# Standalone block tracking with webcam
python block_tracking.py

# With a video file
python block_tracking.py --video path/to/video.mp4
```

```bash
# Interactive task planning
python gemini-api.py
# Enter task: "Stack the red block on top of the blue block"
```
```bash
# Terminal 1: Start ROS core
roscore

# Terminal 2: Start camera listener
python listener.py

# Terminal 3: Start block tracker
python ros_block_tracker.py

# Terminal 4: Start point cloud transformer
python point_cloud_transformer.py

# Terminal 5: Start robot controller
python robot_controller.py

# Terminal 6: Start task executor
python task_executor.py
```
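
The topic names used between these nodes are not shown in this README. The sketch below therefore assumes a hypothetical `/task_plan` topic carrying the JSON plan as a `std_msgs/String`, just to show the shape of a task-executor node:

```python
import json

import rospy
from std_msgs.msg import String

def handle_plan(msg):
    """Parse a JSON task plan and dispatch each subtask to a skill handler."""
    for step in json.loads(msg.data):
        skill, subtask = step["skill"], step["subtask"]
        rospy.loginfo("Executing %s: %s", skill, subtask)
        if skill == "Pick":
            pass   # call into the robot controller's pick routine
        elif skill == "Place":
            pass   # call into the robot controller's place routine

rospy.init_node("task_executor_sketch")
rospy.Subscriber("/task_plan", String, handle_plan)  # topic name is an assumption
rospy.spin()
```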
- Color Ranges: HSV thresholds for each color in `block_tracking.py`
- Block Size: `MIN_BLOCK_AREA` and `MAX_BLOCK_AREA` for size filtering
- Movement Threshold: `movement_threshold` for motion detection
- Gripper Control: `gripper_open` and `gripper_close` positions
- Pick/Place Heights: `pick_height` and `place_height` above surfaces
- Movement Delays: Timing for robot arm movements
- Model: Gemini 2.0 Flash Thinking (configurable in `gemini-api.py`)
- Skills: Currently supports "Pick" and "Place" (extensible)
- Output Format: JSON array of subtask objects, for example (illustrative values for the parameters above are sketched after this example):
```json
[
  {
    "subtask": "Pick the red block",
    "skill": "Pick"
  },
  {
    "subtask": "Place the red block on top of the blue block",
    "skill": "Place"
  }
]
```
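
To give a concrete sense of the configuration parameters listed above, the snippet below shows illustrative values only; the real thresholds live in `block_tracking.py` and `robot_controller.py` and will differ for your camera, robot, and workspace.

```python
import numpy as np

# block_tracking.py style parameters (illustrative values)
MIN_BLOCK_AREA = 500        # ignore contours smaller than this (pixels)
MAX_BLOCK_AREA = 50000      # ignore contours larger than this
movement_threshold = 5      # pixels a centroid must move to count as "moving"
color_ranges = {
    "red":  (np.array([0, 120, 70]),   np.array([10, 255, 255])),
    "blue": (np.array([100, 150, 50]), np.array([130, 255, 255])),
    # remaining colors follow the same (lower_hsv, upper_hsv) pattern
}

# robot_controller.py style parameters (illustrative values)
gripper_open = 0.08         # gripper opening when released
gripper_close = 0.02        # gripper opening when grasping a block
pick_height = 0.05          # approach height above the block before grasping
place_height = 0.10         # release height above the target surface
```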
- Real-time block positions and IDs
- Movement status and trails
- Color classification results
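
A sketch of how the tracker might annotate each frame with IDs and trails; the drawing style (colors, fonts) is illustrative, and `objects`/`trails` are assumed to be the tracker's ID-to-centroid and ID-to-history mappings:

```python
import cv2

def draw_tracks(frame, objects, trails):
    """Draw an ID label, a centroid marker, and a motion trail for each block.

    objects: dict mapping block ID -> (x, y) centroid
    trails:  dict mapping block ID -> list of recent (x, y) positions
    """
    for oid, (x, y) in objects.items():
        cv2.circle(frame, (int(x), int(y)), 4, (0, 255, 0), -1)
        cv2.putText(frame, f"ID {oid}", (int(x) - 10, int(y) - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
        pts = trails.get(oid, [])
        for p1, p2 in zip(pts, pts[1:]):
            cv2.line(frame, tuple(map(int, p1)), tuple(map(int, p2)), (255, 0, 0), 2)
    return frame
```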
- Camera not detected: Check camera permissions and device connections
- Color detection issues: Adjust the HSV ranges in the `color_ranges` dictionary
- ROS connection errors: Ensure the ROS core is running and the expected topics are being published
- API key errors: Verify the Gemini API key in the `.env` file
- Reduce frame resolution for faster processing
- Adjust block size thresholds for your environment
- Use GPU acceleration for OpenCV operations
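
For example, downscaling each frame with `imutils` before detection trades a little accuracy for a noticeable speedup; the target width below is an assumption, not the project's setting:

```python
import cv2
import imutils

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
if ok:
    # Fewer pixels per OpenCV call; rescale detected centroids by `scale` if needed
    small = imutils.resize(frame, width=600)
    scale = frame.shape[1] / small.shape[1]
cap.release()
```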
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
- Google Gemini AI for task planning capabilities
- OpenCV community for computer vision tools
- ROS community for robotics framework
- FRI (Friendly Robotics Initiative) for project inspiration