Intelligent robotic task planning and execution: Gemini decomposes natural-language goals, OpenCV tracks blocks, and ROS controls the arm with point-cloud perception.

aminmomin2/FRI-Task-Planning

FRI-Task-Planning

A comprehensive robotic task planning and execution system that combines computer vision, AI-powered task decomposition, and ROS-based robot control for automated block manipulation tasks.

🎯 Project Overview

This project implements an intelligent robotic system capable of:

  • Computer Vision Block Tracking: Real-time detection and tracking of colored blocks using OpenCV
  • AI-Powered Task Planning: Using Google's Gemini AI to decompose high-level tasks into executable robot actions
  • ROS-Based Robot Control: Coordinated robot arm control for pick-and-place operations
  • 3D Point Cloud Processing: Depth perception and spatial understanding for precise manipulation

🏗️ System Architecture

The system consists of several interconnected modules:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Task Input    │───▶│  Gemini AI      │───▶│  Task Executor  │
│   (User)        │    │  Decomposition  │    │  (ROS Node)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │                      │
                                ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Block Tracker  │───▶│  Point Cloud    │───▶│ Robot Controller│
│  (OpenCV)       │    │  Transformer    │    │  (ROS Node)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘

📁 Project Components

🤖 Core Modules

1. Task Planning & AI Integration

  • gemini-api.py: Interfaces with Google's Gemini AI to decompose high-level tasks into robot-executable subtasks
  • config.py: Manages API keys and configuration settings
  • task_executor.py: ROS node that receives and executes task plans
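
One way a task executor might route each subtask to the right primitive is a dispatch table keyed by the "skill" field. The following is a minimal sketch with no ROS dependency; `do_pick`, `do_place`, `SKILL_HANDLERS`, and `execute_plan` are hypothetical names for illustration, not taken from task_executor.py.

```python
import json

def do_pick(subtask):
    # Hypothetical handler: a real node would command the arm here.
    return f"PICK: {subtask}"

def do_place(subtask):
    # Hypothetical handler for the "Place" primitive.
    return f"PLACE: {subtask}"

# Dispatch table keyed by the "skill" field of each subtask.
SKILL_HANDLERS = {"Pick": do_pick, "Place": do_place}

def execute_plan(plan_json):
    """Run each subtask in order and return the handler results."""
    results = []
    for step in json.loads(plan_json):
        handler = SKILL_HANDLERS.get(step["skill"])
        if handler is None:
            raise ValueError(f"Unsupported skill: {step['skill']!r}")
        results.append(handler(step["subtask"]))
    return results
```

Adding a new skill under this design would mean writing one handler and registering it in the table.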

2. Computer Vision & Block Tracking

  • block_tracking.py: Standalone OpenCV-based block detection and tracking
  • ros_block_tracker.py: ROS-integrated version of block tracking
  • centroid_tracker.py: Object tracking algorithm for maintaining block IDs across frames

3. 3D Perception & Spatial Understanding

  • listener.py: Camera data processing and point cloud generation
  • point_cloud_transformer.py: Coordinate frame transformations for robot integration
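
To illustrate what a coordinate frame transformation does, the sketch below maps a point from a camera frame into a robot base frame with a 4x4 homogeneous matrix. It is plain Python for clarity; a ROS node would more likely rely on tf2 for this, and the function names and the example pose are assumptions, not values from the repository.

```python
import math

def make_transform(yaw_rad, translation):
    """Build a 4x4 homogeneous transform: rotation about Z plus a translation."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    tx, ty, tz = translation
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform_point(T, point):
    """Map a 3D point from the source frame into the target frame."""
    x, y, z = point
    p = (x, y, z, 1.0)  # homogeneous coordinates
    return tuple(sum(T[r][c] * p[c] for c in range(4)) for r in range(3))

# Hypothetical camera pose: rotated 90 degrees about Z, offset from the base.
camera_to_base = make_transform(math.pi / 2, (0.5, 0.0, 0.2))
point_in_base = transform_point(camera_to_base, (1.0, 0.0, 0.0))
```

With this example pose, a point one meter along the camera's X axis lands at approximately (0.5, 1.0, 0.2) in the base frame.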

4. Robot Control

  • robot_controller.py: High-level robot arm control and pick-and-place operations

🚀 Features

🎨 Multi-Color Block Detection

  • Detects blocks in the following colors: red, orange, yellow, green, blue, and purple
  • Robust HSV color space filtering with morphological operations
  • Real-time tracking with unique ID assignment
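
The idea behind HSV filtering can be shown with a per-pixel hue classifier. This sketch uses only the standard library rather than OpenCV, and it omits the morphological cleanup step; the hue bands, the saturation and value cutoffs, and the name `classify_rgb` are hypothetical and would need tuning for real lighting. Hue is scaled to 0..179 to match OpenCV's 8-bit convention.

```python
import colorsys

# Hypothetical hue bands in OpenCV's 8-bit convention (H in 0..179).
# Red wraps around the hue circle, so it needs two bands.
COLOR_RANGES = {
    "red": [(0, 9), (170, 179)],
    "orange": [(10, 24)],
    "yellow": [(25, 34)],
    "green": [(35, 85)],
    "blue": [(86, 130)],
    "purple": [(131, 169)],
}

def classify_rgb(r, g, b, min_sat=0.4, min_val=0.3):
    """Return the color name for an 8-bit RGB pixel, or None if too gray or dark."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    if s < min_sat or v < min_val:
        return None
    hue = int(h * 180) % 180  # scale 0..1 to the 0..179 range
    for name, bands in COLOR_RANGES.items():
        if any(lo <= hue <= hi for lo, hi in bands):
            return name
    return None
```

Rejecting low-saturation and low-value pixels first keeps gray backgrounds and shadows from being assigned an arbitrary hue.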

🧠 Intelligent Task Decomposition

  • Uses Gemini 2.0 Flash Thinking model for task planning
  • Converts natural language commands into structured robot actions
  • Supports "Pick" and "Place" primitive skills
  • Generates JSON-formatted task plans

📊 Advanced Object Tracking

  • Centroid-based object tracking with persistence
  • Movement detection and visualization
  • Trail visualization for motion analysis
  • Robust handling of occlusions and temporary disappearances
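
Centroid tracking with persistence can be sketched as follows: each known object is matched to the nearest new centroid, and an object is dropped only after it has gone unmatched for several consecutive frames. This is a simplified greedy variant written for illustration; the class name and the `max_disappeared` and `max_distance` parameters are assumptions and may differ from centroid_tracker.py.

```python
import math

class SimpleCentroidTracker:
    """Greedy nearest-centroid tracker (illustrative, not the repository's class)."""

    def __init__(self, max_disappeared=5, max_distance=50.0):
        self.next_id = 0
        self.objects = {}   # id -> (x, y)
        self.missing = {}   # id -> consecutive frames without a match
        self.max_disappeared = max_disappeared
        self.max_distance = max_distance

    def update(self, centroids):
        """Match detections to known objects and return the current id -> centroid map."""
        unmatched = list(centroids)
        for obj_id in list(self.objects):
            last = self.objects[obj_id]
            best = min(unmatched, key=lambda c: math.dist(c, last), default=None)
            if best is not None and math.dist(best, last) <= self.max_distance:
                self.objects[obj_id] = best
                self.missing[obj_id] = 0
                unmatched.remove(best)
            else:
                # Keep the last known position until the object is missing too long.
                self.missing[obj_id] += 1
                if self.missing[obj_id] > self.max_disappeared:
                    del self.objects[obj_id]
                    del self.missing[obj_id]
        for c in unmatched:
            # Any detection left over is registered as a new object.
            self.objects[self.next_id] = c
            self.missing[self.next_id] = 0
            self.next_id += 1
        return dict(self.objects)
```

The grace period is what lets a block keep its ID through a brief occlusion, for example when the gripper passes in front of it.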

🤖 ROS Integration

  • Full ROS ecosystem integration
  • Point cloud processing and transformation
  • Robot arm pose control
  • Real-time sensor data processing

🛠️ Installation & Setup

Prerequisites

# Python dependencies
pip install opencv-python numpy imutils scipy google-genai python-dotenv

# ROS dependencies (if using ROS)
sudo apt-get install ros-noetic-cv-bridge ros-noetic-tf2-ros
pip install rospkg

Environment Setup

  1. Create a .env file in the project root:
GEMINI_API_KEY=your_gemini_api_key_here
  2. Install the required Python packages:
pip install -r requirements.txt
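
A configuration module could fail early with a clear message when the key is missing. The sketch below reads the key from the process environment only; it assumes the .env file has already been loaded (for example with python-dotenv's `load_dotenv()`), and `load_api_key` is a hypothetical helper, not necessarily what config.py does.

```python
import os

def load_api_key(env=None, name="GEMINI_API_KEY"):
    """Return the API key from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    # Treat the placeholder from the .env template as "not set".
    if not key or key == "your_gemini_api_key_here":
        raise RuntimeError(f"{name} is not set; add it to .env or export it")
    return key
```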

📖 Usage

1. Basic Block Tracking

# Standalone block tracking with webcam
python block_tracking.py

# With video file
python block_tracking.py --video path/to/video.mp4

2. AI Task Planning

# Interactive task planning
python gemini-api.py
# Enter task: "Stack the red block on top of the blue block"

3. ROS-Based System

# Terminal 1: Start ROS core
roscore

# Terminal 2: Start camera listener
python listener.py

# Terminal 3: Start block tracker
python ros_block_tracker.py

# Terminal 4: Start point cloud transformer
python point_cloud_transformer.py

# Terminal 5: Start robot controller
python robot_controller.py

# Terminal 6: Start task executor
python task_executor.py

🔧 Configuration

Block Detection Parameters

  • Color Ranges: HSV thresholds for each color in block_tracking.py
  • Block Size: MIN_BLOCK_AREA and MAX_BLOCK_AREA for size filtering
  • Movement Threshold: movement_threshold for motion detection
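
The size and movement parameters above might be applied roughly like this. The names MIN_BLOCK_AREA, MAX_BLOCK_AREA, and movement_threshold come from this README; the numeric values and the helper functions are hypothetical placeholders.

```python
import math

# Hypothetical values; tune for your camera distance and block size.
MIN_BLOCK_AREA = 500       # contour area in pixels^2
MAX_BLOCK_AREA = 20000     # contour area in pixels^2
movement_threshold = 5.0   # centroid displacement in pixels

def is_block_sized(area):
    """Keep only contours whose area falls inside the expected block range."""
    return MIN_BLOCK_AREA <= area <= MAX_BLOCK_AREA

def has_moved(prev_centroid, curr_centroid, threshold=movement_threshold):
    """Flag a block as moving when its centroid shifts by more than the threshold."""
    return math.dist(prev_centroid, curr_centroid) > threshold
```

A larger movement threshold suppresses jitter from detection noise at the cost of missing slow motion.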

Robot Parameters

  • Gripper Control: gripper_open and gripper_close positions
  • Pick/Place Heights: pick_height and place_height above surfaces
  • Movement Delays: Timing for robot arm movements
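
As an illustration of how these parameters could combine, the sketch below turns a block position into an ordered list of arm and gripper commands for a pick. The parameter names gripper_open, gripper_close, and pick_height come from this README; `approach_clearance`, the default values, and the command tuples are assumptions and do not reflect the actual interface of robot_controller.py.

```python
def pick_sequence(block_xyz, pick_height=0.02, approach_clearance=0.10,
                  gripper_open=1.0, gripper_close=0.0):
    """Return (command, argument) steps for a top-down pick of one block."""
    x, y, z = block_xyz
    grasp_z = z + pick_height                 # grasp slightly above the surface
    hover_z = grasp_z + approach_clearance    # hypothetical safe approach height
    return [
        ("move", (x, y, hover_z)),      # approach from above
        ("gripper", gripper_open),      # open before descending
        ("move", (x, y, grasp_z)),      # descend to the grasp height
        ("gripper", gripper_close),     # close on the block
        ("move", (x, y, hover_z)),      # lift straight up
    ]
```

Approaching and retreating along the vertical axis reduces the chance of knocking over neighboring blocks.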

AI Configuration

  • Model: Gemini 2.0 Flash Thinking (configurable in gemini-api.py)
  • Skills: Currently supports "Pick" and "Place" (extensible)
  • Output Format: JSON array of subtask objects
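
To show how the skill list and output format could be kept extensible, here is a hypothetical prompt builder. The wording of the prompt and the names `SKILLS` and `build_prompt` are illustrative assumptions; the prompt actually used in gemini-api.py may be different.

```python
import json

SKILLS = ("Pick", "Place")  # extend this tuple to expose more primitives

def build_prompt(task, skills=SKILLS):
    """Compose a prompt that restricts the model to known skills and JSON output."""
    example = [{"subtask": "<short description>", "skill": skills[0]}]
    return (
        "Decompose the task into ordered subtasks for a robot arm.\n"
        f"Allowed skills: {', '.join(skills)}.\n"
        "Respond with only a JSON array shaped like:\n"
        f"{json.dumps(example)}\n"
        f"Task: {task}"
    )
```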

📊 Output Formats

Task Plan JSON

[
  {
    "subtask": "Pick the red block",
    "skill": "Pick"
  },
  {
    "subtask": "Place the red block on top of the blue block",
    "skill": "Place"
  }
]
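
Because the plan comes from a language model, it is worth validating before anything reaches the robot. The following sketch checks the structure shown above; `parse_task_plan` and `ALLOWED_SKILLS` are hypothetical names, and the checks are one reasonable set rather than the repository's own.

```python
import json

ALLOWED_SKILLS = {"Pick", "Place"}

def parse_task_plan(text):
    """Parse a task plan and verify each step has a subtask and a known skill."""
    plan = json.loads(text)
    if not isinstance(plan, list):
        raise ValueError("Task plan must be a JSON array")
    for i, step in enumerate(plan):
        if not isinstance(step, dict):
            raise ValueError(f"Step {i} is not an object")
        missing = {"subtask", "skill"} - step.keys()
        if missing:
            raise ValueError(f"Step {i} is missing fields: {sorted(missing)}")
        if step["skill"] not in ALLOWED_SKILLS:
            raise ValueError(f"Step {i} uses unknown skill {step['skill']!r}")
    return plan
```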

Block Tracking Data

  • Real-time block positions and IDs
  • Movement status and trails
  • Color classification results

🔍 Troubleshooting

Common Issues

  1. Camera not detected: Check camera permissions and device connections
  2. Color detection issues: Adjust the HSV ranges in the color_ranges dictionary to match your lighting
  3. ROS connection errors: Ensure ROS core is running and topics are published
  4. API key errors: Verify Gemini API key in .env file

Performance Optimization

  • Reduce frame resolution for faster processing
  • Adjust block size thresholds for your environment
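  • Use GPU acceleration for OpenCV operations where available (this needs an OpenCV build with CUDA or OpenCL support; the prebuilt opencv-python wheels do not include CUDA)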
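
For the frame resolution tip, detections made on a downscaled frame must be mapped back to full-resolution coordinates before they are used for depth lookup. A minimal sketch of that arithmetic, with hypothetical helper names and a hypothetical 640-pixel target width:

```python
def scaled_size(width, height, max_width=640):
    """Return (new_width, new_height, scale) preserving the aspect ratio."""
    if width <= max_width:
        return width, height, 1.0
    scale = max_width / width
    return max_width, round(height * scale), scale

def to_full_res(point, scale):
    """Map a pixel found in the downscaled frame back to the original frame."""
    x, y = point
    return x / scale, y / scale
```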

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

🙏 Acknowledgments

  • Google Gemini AI for task planning capabilities
  • OpenCV community for computer vision tools
  • ROS community for robotics framework
  • FRI (Friendly Robotics Initiative) for project inspiration
