This is a helpful agent that can control your computer based on your commands. Here are some things it can do:
- Open websites and search online: Just tell it the website you want to visit or what you want to search for.
- Open applications: You can ask it to open apps like Notepad, Chrome, or other applications on your system.
- Get system information: Ask about your computer's CPU usage, memory, etc.
- Screen analysis: The agent can analyze what's on your screen, recognize elements, and interact with applications based on visual content.
- Write and fix code: It can generate code in different languages and help fix errors in your code when you're in VSCode.
- Generate PowerPoint presentations: Create professional presentations automatically with modern designs and save them to your Downloads folder.
- Write stories and content: Ask it to write stories or other text content for you.
- Play music/videos on YouTube: Tell it what you want to watch or listen to on YouTube.
- Schedule tasks: Set up reminders or schedule the agent to perform tasks at specific times.
- Send emails: Compose and send professional emails with proper formatting and attachments.
- Multitasking capabilities: The agent can handle multiple tasks simultaneously.
- Remember context and preferences: With contextual memory, the agent remembers your past interactions and builds a memory of facts about you.
- System control: Adjust volume and screen brightness, get video information, and create YouTube playlists.
To get started with the agent, follow these steps:
- Clone the repository (if applicable) or navigate to the project directory.
- Install dependencies: Make sure you have Python installed. Then install the required packages using the requirements.txt file:
    pip install -r requirements.txt
- Set up your API Key: You'll need a Google Generative AI API key. Create a .env file in the project root and add your API key like this:
    GOOGLE_API_KEY=YOUR_API_KEY_HERE
  Replace YOUR_API_KEY_HERE with your actual API key.
- Configure Email Settings: Add your email configuration to the .env file:
    SMTP_SERVER=smtp.gmail.com
    SMTP_PORT=587
    [email protected]
    SENDER_PASSWORD=your_app_password_here
  Note: For Gmail users, you'll need to use an App Password. Enable 2-Step Verification in your Google Account and generate an App Password for "Mail".
- Run the agent: You can run the agent in different modes using the agent_cli.py script, or adk web for the web interface.
  - Text Mode: Interact with the agent using text commands in your terminal.
      python agent_cli.py text
  - Voice Mode: Interact with the agent using your voice (requires microphone setup).
      python agent_cli.py voice
  - Web Mode: Access the agent through a web-based user interface (requires ADK setup).
      adk web
    (Note: Make sure you are in the correct directory and have followed the ADK setup instructions if using adk web.)
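Before running the agent, it can help to check that the .env file contains the settings described in the steps above. The following is a minimal sketch using only the standard library; `parse_env`, `missing_keys`, and `REQUIRED_KEYS` are hypothetical names for illustration, and the project itself may load its configuration differently.

```python
from typing import Dict, Iterable, List

# Keys named in the setup steps above. This tuple is an assumption made for
# illustration; the agent may require other settings as well.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "SMTP_SERVER", "SMTP_PORT", "SENDER_PASSWORD")


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def missing_keys(settings: Dict[str, str],
                 required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not settings.get(key)]


sample = """
# Example .env content
GOOGLE_API_KEY=YOUR_API_KEY_HERE
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
"""
settings = parse_env(sample)
print(missing_keys(settings))  # the sample omits SENDER_PASSWORD
```

To check a real file, pass `Path(".env").read_text()` to `parse_env` from the project root.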
# Volume control
set volume to 75
what's the current volume
# Screen brightness
set brightness to 80
what's the current brightness
# Get video information
get info for https://www.youtube.com/watch?v=VIDEO_ID
# Create playlist
create playlist named "My Favorites" with https://www.youtube.com/watch?v=VIDEO1 https://www.youtube.com/watch?v=VIDEO2
# Search and play videos
search pokemon on youtube
youtube cooking recipes
The agent can help you compose and send professional emails with proper formatting:
# Send a simple email
email [email protected] that I can't come tomorrow due to bad health
# Send an email with a subject
email [email protected] about "Meeting Cancellation" that I need to reschedule our meeting
# Send an email with attachments
email [email protected] about "Project Report" that Here's the latest report with file "report.pdf"
# Send a formatted email with multiple attachments
email [email protected] about "Weekly Update" that Please find attached the weekly reports with files "report1.pdf" "report2.xlsx"
# Analyze current screen
analyze screen
# Find specific elements
find button on screen
find text "Submit" on screen
# Schedule a task
schedule 'check email' at '3pm tomorrow'
# Schedule a recurring task
schedule 'backup files' at '2am' repeat 'daily'
# List scheduled tasks
list tasks
# Remove a task
remove task 1
# Generate Python code
write python code for a simple calculator
# Generate HTML page
write html page for a portfolio website
# Open applications
open notepad
open chrome
# Search web
search for python tutorials
search youtube for cooking recipes
# Get system info
system info
show cpu usage
# Control volume and brightness
set volume to 50
set brightness to 75
# Create a presentation with default settings
create presentation about artificial intelligence
# Create a presentation with a specific title
make presentation titled "Business Strategy" about marketing
# Specify number of slides
generate powerpoint about climate change with 10 slides
The presentation generator includes:
- Modern slide designs and layouts
- Automatic content generation based on the topic
- Professional formatting and styling
- Images relevant to the content
- Automatic saving to your Downloads folder
- Interactive viewing options (automatically opens when requested)
Presentation commands:
- create presentation about [topic]: Generate a presentation about any topic
- make presentation titled [title] about [topic]: Create with specific title
- generate powerpoint about [topic] with [number] slides: Specify slide count
- yes: Open the presentation after generation
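The presentation commands above follow a regular shape, so one way to picture how such a command could be split into its parts is a single regular expression. This is only a sketch under assumptions: `PresentationRequest` and `parse_presentation_command` are hypothetical names, and the agent may well interpret commands with its language model instead of a fixed pattern.

```python
import re
from typing import NamedTuple, Optional


class PresentationRequest(NamedTuple):
    topic: str
    title: Optional[str]
    slides: Optional[int]


# Hypothetical pattern covering the three command forms listed above.
_PRESENTATION = re.compile(
    r'^(?:create|make|generate)\s+(?:presentation|powerpoint)'
    r'(?:\s+titled\s+"(?P<title>[^"]+)")?'
    r'\s+about\s+(?P<topic>.+?)'
    r'(?:\s+with\s+(?P<slides>\d+)\s+slides)?$',
    re.IGNORECASE,
)


def parse_presentation_command(command: str) -> Optional[PresentationRequest]:
    """Return the parsed request, or None if the text is not a presentation command."""
    match = _PRESENTATION.match(command.strip())
    if match is None:
        return None
    slides = int(match.group("slides")) if match.group("slides") else None
    return PresentationRequest(match.group("topic"), match.group("title"), slides)


print(parse_presentation_command("generate powerpoint about climate change with 10 slides"))
```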
Features:
- Presentations are automatically saved to your Downloads folder
- Modern and professional slide designs
- Comprehensive content generation
- Relevant images and visual elements
- Interactive viewing experience
The agent provides advanced system control capabilities:
- Volume Control: Adjust system volume levels (0-100%)
- Screen Brightness: Control screen brightness (0-100%)
- YouTube Integration:
  - Get detailed video information (title, views, upload date)
  - Create and manage playlists
  - Search and play videos
  - Get video statistics
System control commands:
- set volume to [0-100]: Adjust system volume
- set brightness to [0-100]: Adjust screen brightness
- get info for [youtube_url]: Get video information
- create playlist named [name] with [video_urls]: Create YouTube playlist
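Volume and brightness both take a level from 0 to 100, so a command handler needs to extract the number and keep it in range. The sketch below shows one possible approach; `parse_level_command` is a hypothetical helper, and clamping out-of-range values (rather than rejecting them) is an assumption, not documented behavior of the agent.

```python
import re
from typing import Optional, Tuple

# Matches "set volume to N" and "set brightness to N" (hypothetical pattern).
_LEVEL = re.compile(r"^set\s+(volume|brightness)\s+to\s+(-?\d+)$", re.IGNORECASE)


def parse_level_command(command: str) -> Optional[Tuple[str, int]]:
    """Return (setting, level) with level clamped to 0-100, or None if no match."""
    match = _LEVEL.match(command.strip())
    if match is None:
        return None
    level = max(0, min(100, int(match.group(2))))  # assumption: clamp, do not reject
    return match.group(1).lower(), level


print(parse_level_command("set volume to 75"))
print(parse_level_command("set brightness to 250"))
```

The returned pair can then be handed to whatever platform-specific call actually changes the volume or brightness.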
The agent provides advanced email capabilities:
- Professional Formatting: Automatically formats emails with proper greetings and sign-offs
- Smart Subject Generation: Generates appropriate subject lines based on email content
- Attachment Support: Can attach multiple files of various types (PDF, DOC, images, etc.)
- Context-Aware: Generates appropriate email content based on the context
- Multiple Email Providers: Supports Gmail, Outlook, Yahoo, and other SMTP servers
Email commands:
- email [address] that [message]: Send a simple email
- email [address] about [subject] that [message]: Send an email with subject
- email [address] about [subject] that [message] with file [filename]: Send email with attachment
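As an illustration of how the email command forms above map onto a message, the sketch below parses a command and builds a message with Python's standard `email.message.EmailMessage`. The names `EmailRequest`, `parse_email_command`, and `build_message`, the example.com addresses, and the "(no subject)" fallback are all assumptions for illustration; the agent is described as generating subjects and formatting automatically, which this sketch does not attempt, and nothing is sent here.

```python
import re
from email.message import EmailMessage
from typing import List, NamedTuple, Optional


class EmailRequest(NamedTuple):
    address: str
    subject: Optional[str]
    message: str
    files: List[str]


# Hypothetical pattern for: email ADDRESS [about "SUBJECT"] that MESSAGE [with file(s) "A" "B"]
_EMAIL = re.compile(
    r'^email\s+(?P<address>\S+)'
    r'(?:\s+about\s+"(?P<subject>[^"]+)")?'
    r'\s+that\s+(?P<message>.+?)'
    r'(?:\s+with\s+files?\s+(?P<files>(?:"[^"]+"\s*)+))?$',
    re.IGNORECASE,
)


def parse_email_command(command: str) -> Optional[EmailRequest]:
    """Split an email command into address, subject, message, and file names."""
    match = _EMAIL.match(command.strip())
    if match is None:
        return None
    files = re.findall(r'"([^"]+)"', match.group("files") or "")
    return EmailRequest(match.group("address"), match.group("subject"),
                        match.group("message"), files)


def build_message(request: EmailRequest, sender: str) -> EmailMessage:
    """Build (but do not send) a plain-text message from a parsed request."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = request.address
    msg["Subject"] = request.subject or "(no subject)"  # placeholder fallback
    msg.set_content(request.message)
    return msg


request = parse_email_command(
    'email alice@example.com about "Weekly Update" that Please find attached '
    'the weekly reports with files "report1.pdf" "report2.xlsx"'
)
message = build_message(request, sender="me@example.com")
print(message["Subject"], request.files)
```

Sending would then be a matter of passing the message to `smtplib.SMTP.send_message` after `starttls()` and `login()` with the SMTP settings from the .env file.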
The agent has powerful screen analysis features that allow it to:
- Understand screen content: Using computer vision and OCR technologies, the agent can "see" and understand what's on your screen.
- Recognize UI elements: Identify buttons, text fields, icons, and other UI components.
- Read text from screen: Extract and process text from any visible application using Tesseract OCR.
- Analyze code editors: Particularly useful in environments like VSCode, where it can understand code context.
- Identify applications: Recognize which applications are open and visible.
- Assist with visual tasks: Help you find elements on screen or guide you through complex interfaces.
Screen analysis commands:
- analyze screen: Analyze what's currently visible on your screen
- find [element] on screen: Look for a specific UI element or text
- describe what you see: Get a description of the current screen content
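Once OCR has turned the screen into words with positions, a command such as find text "Submit" on screen reduces to a search over those words. The sketch below assumes the OCR step has already produced a list of word boxes; `WordBox` and `find_text` are hypothetical names, the sample coordinates are made up, and the agent's actual screen analysis pipeline is not shown in this document.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class WordBox:
    """One recognized word and its bounding box in screen pixels (hypothetical shape)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def find_text(boxes: List[WordBox], query: str) -> Optional[Tuple[int, int]]:
    """Return the center (x, y) of the first box matching query, ignoring case."""
    wanted = query.casefold()
    for box in boxes:
        if box.text.casefold() == wanted:
            return (box.left + box.width // 2, box.top + box.height // 2)
    return None


# Made-up OCR output for illustration.
boxes = [
    WordBox("Cancel", left=100, top=200, width=80, height=30),
    WordBox("Submit", left=200, top=200, width=80, height=30),
]
print(find_text(boxes, "submit"))  # center of the "Submit" box
```

The returned point is where a click would land if the agent were asked to interact with the element.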
The agent supports multitasking, allowing you to:
- Run multiple tasks simultaneously: The agent can handle several operations at once.
- Schedule tasks for later: Set reminders or schedule actions to occur at specific times.
- Manage scheduled tasks: View, modify, or cancel previously scheduled tasks.
Task scheduling commands:
- schedule [task] at [time]: Schedule a task to run at a specific time
- show scheduled tasks: View all currently scheduled tasks
- cancel task [task name or number]: Cancel a scheduled task
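The three scheduling commands above correspond to adding, listing, and removing entries in a task list. A minimal in-memory sketch of that bookkeeping follows; `ScheduledTask` and `TaskList` are hypothetical names, times are passed as ready-made `datetime` values rather than parsed from phrases like '3pm tomorrow', and the agent's real scheduler may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class ScheduledTask:
    task_id: int
    description: str
    run_at: datetime


class TaskList:
    """In-memory bookkeeping for scheduled tasks (illustrative only)."""

    def __init__(self) -> None:
        self._tasks: Dict[int, ScheduledTask] = {}
        self._next_id = 1

    def schedule(self, description: str, run_at: datetime) -> int:
        """Add a task and return its number."""
        task = ScheduledTask(self._next_id, description, run_at)
        self._tasks[task.task_id] = task
        self._next_id += 1
        return task.task_id

    def show(self) -> List[ScheduledTask]:
        """Return all tasks, earliest first."""
        return sorted(self._tasks.values(), key=lambda task: task.run_at)

    def cancel(self, task_id: int) -> bool:
        """Remove a task; return False if no such task exists."""
        return self._tasks.pop(task_id, None) is not None

    def due(self, now: datetime) -> List[ScheduledTask]:
        """Return tasks whose time has arrived."""
        return [task for task in self.show() if task.run_at <= now]


tasks = TaskList()
email_id = tasks.schedule("check email", datetime(2030, 1, 1, 15, 0))
backup_id = tasks.schedule("backup files", datetime(2030, 1, 1, 2, 0))
print([task.description for task in tasks.show()])
```

A background loop could call `due(datetime.now())` periodically and run whatever it returns.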
The agent is equipped with a contextual memory system that allows it to:
- Remember your conversations: The agent stores recent conversations to provide more context-aware responses.
- Learn about you: The agent automatically extracts facts about you from your conversations (like your name, interests, and preferences).
- Personalize responses: Over time, the agent tailors its responses based on what it knows about you.
- Maintain topic awareness: The agent can remember information about topics you've discussed before.
Memory commands:
- show memory or show what you remember: See what the agent has stored in its memory
- clear memory: Reset the agent's memory if needed
The contextual memory is saved between sessions, so the agent will remember you even after restarting.
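Persistence between sessions can be pictured as writing remembered facts to a file and reading them back on startup. The sketch below uses a JSON file for this; the file name `memory.json`, the flat key/value layout, and the function names are assumptions for illustration, since the agent's actual storage format is not described here.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def load_memory(path: Path) -> Dict[str, str]:
    """Read stored facts, or return an empty memory if nothing is saved yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_memory(path: Path, facts: Dict[str, str]) -> None:
    """Write all facts to disk so they survive a restart."""
    path.write_text(json.dumps(facts, indent=2, sort_keys=True), encoding="utf-8")


def remember(path: Path, key: str, value: str) -> Dict[str, str]:
    """Add or update one fact and persist the result."""
    facts = load_memory(path)
    facts[key] = value
    save_memory(path, facts)
    return facts


def clear_memory(path: Path) -> None:
    """Delete the stored memory, as 'clear memory' would."""
    if path.exists():
        path.unlink()


# Demonstration in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    memory_file = Path(tmp) / "memory.json"
    remember(memory_file, "name", "Sam")
    restored = load_memory(memory_file)  # simulates reading after a restart
    clear_memory(memory_file)
    after_clear = load_memory(memory_file)

print(restored, after_clear)
```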
Just type your command or question, and the agent will try its best to understand and help you! The more you interact with it, the better it will understand your preferences and needs.
Common commands:
- open [website or application]: Opens a website or installed application
- search for [query]: Searches the web for your query
- write [content type] about [topic]: Generates written content
- system info: Shows information about your computer system
- email [address] that [message]: Sends an email
- schedule [task] at [time]: Schedules a task to run at a specific time
- analyze screen: Analyzes current screen content