An AI-powered browser automation microservice built on the Kernel platform that uses browser-use for intelligent web browsing tasks.
The browser-agent microservice provides AI-powered browser automation capabilities, allowing you to control browsers using natural language instructions. It supports multiple LLM providers (Anthropic Claude, OpenAI GPT, Google Gemini, Azure OpenAI, Groq, and Ollama) and can handle complex multi-step web tasks including data extraction, form filling, file downloads, and CAPTCHA solving.
- AI-powered browser automation: Uses LLMs to intelligently control browsers and perform complex web tasks
- Multi-step task execution: Decomposes complex requests into sub-tasks and executes them sequentially
- Multi-provider LLM support: Works with Anthropic Claude, OpenAI GPT, Google Gemini, Azure OpenAI, Groq, and Ollama
- File handling: Automatically downloads PDFs and other files, uploads them to cloud storage
- CAPTCHA solving: Built-in capability to handle CAPTCHAs and similar challenges
- Session management: Creates isolated browser sessions with proper cleanup
- Trajectory tracking: Records and stores complete execution history for analysis
- AI Gateway integration: Compatible with any AI gateway (Cloudflare, Azure, etc.) or direct provider APIs
- mise - Development environment manager
- Python 3.11+ (managed via mise)
- Node.js with bun (for deployment tools, managed via mise)
```shell
# Install development tools
mise install

# Install Python dependencies
uv sync

# Copy environment template
cp .env.example .env
```
Edit your `.env` file with the required values:
```shell
# LLM Provider Configuration

# Option 1: Direct API access (no gateway) - providers use default endpoints
# Nothing required here - providers will use their default API endpoints!

# Option 2: With AI Gateway (Cloudflare example)
AI_GATEWAY_URL="https://gateway.ai.cloudflare.com/v1/{account_id}/ai-gateway"
AI_GATEWAY_HEADERS='{"cf-aig-authorization": "Bearer your-gateway-token"}'
ANTHROPIC_CONFIG='{"base_url": "${AI_GATEWAY_URL}/anthropic", "default_headers": ${AI_GATEWAY_HEADERS}}'
OPENAI_CONFIG='{"base_url": "${AI_GATEWAY_URL}/openai", "default_headers": ${AI_GATEWAY_HEADERS}}'
GEMINI_CONFIG='{"http_options": {"base_url": "${AI_GATEWAY_URL}/google-ai-studio", "headers": ${AI_GATEWAY_HEADERS}}}'

# Option 3: Provider-specific configurations
# Azure OpenAI
AZURE_OPENAI_CONFIG='{"azure_endpoint": "https://your-resource.openai.azure.com/", "api_version": "2024-02-01"}'
# Groq
GROQ_CONFIG='{"base_url": "https://api.groq.com/openai/v1"}'
# Ollama (local)
OLLAMA_CONFIG='{"base_url": "http://localhost:11434/v1"}'

# Kernel Platform (required)
KERNEL_API_KEY="sk_xxxxx"

# S3-compatible storage for file downloads (required)
S3_BUCKET="browser-agent"
S3_ACCESS_KEY_ID="your-access-key"
S3_ENDPOINT_URL="https://{account_id}.r2.cloudflarestorage.com"
S3_SECRET_ACCESS_KEY="your-secret-key"

# Optional Configuration
# Browser viewport size (default: 1440x900)
# VIEWPORT_SIZE='{"width": 1440, "height": 900}'
# Set to "debug" for verbose browser-use logging
# BROWSER_USE_LOGGING_LEVEL="info"
# Set to "false" to disable anonymous telemetry
# ANONYMIZED_TELEMETRY="false"
```
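The `${VAR}` placeholders inside the `*_CONFIG` values suggest the service expands environment references before parsing the JSON. A minimal sketch of how that expansion could work (the `load_provider_config` helper is hypothetical for illustration, not the actual API of `src/lib/gateway.py`):

```python
import json
import os
from string import Template


def load_provider_config(name: str):
    """Hypothetical helper: read a *_CONFIG variable, expand ${VAR}
    references from the environment, then parse the result as JSON."""
    raw = os.environ.get(name)
    if raw is None:
        return None
    expanded = Template(raw).substitute(os.environ)  # ${AI_GATEWAY_URL} -> value
    return json.loads(expanded)


# Example values mirroring the .env snippet above
os.environ["AI_GATEWAY_URL"] = "https://gateway.ai.cloudflare.com/v1/acct/ai-gateway"
os.environ["AI_GATEWAY_HEADERS"] = '{"cf-aig-authorization": "Bearer your-gateway-token"}'
os.environ["ANTHROPIC_CONFIG"] = (
    '{"base_url": "${AI_GATEWAY_URL}/anthropic", "default_headers": ${AI_GATEWAY_HEADERS}}'
)

config = load_provider_config("ANTHROPIC_CONFIG")
```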
Test that everything is working:
```shell
# Start the development server
just dev

# In another terminal, check the service is running
curl http://localhost:8080/health
```
`POST /apps/browser-agent/actions/perform`
```json
{
  "input": "Task description for the browser agent",
  "provider": "anthropic|gemini|openai|azure_openai|groq|ollama",
  "model": "claude-3-5-sonnet-20241022|gpt-4o|gemini-2.0-flash-exp|llama-3.3-70b-versatile",
  "api_key": "your-llm-api-key",
  "instructions": "Optional additional instructions",
  "stealth": true,
  "headless": false,
  "browser_timeout": 60,
  "max_steps": 100,
  "reasoning": true,
  "flash": false
}
```
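A request with this shape can be sent with the Python standard library alone. This is a client-side sketch; the local URL assumes the development server from `just dev`, so substitute your deployed Kernel endpoint as needed:

```python
import json
import urllib.request

# Hypothetical local endpoint; replace with your deployed Kernel URL
URL = "http://localhost:8080/apps/browser-agent/actions/perform"

payload = {
    "input": "Go to example.com and extract the main heading",
    "provider": "anthropic",
    "model": "claude-3-5-sonnet-20241022",
    "api_key": "sk-ant-xxxxx",
    "headless": True,
    "max_steps": 50,
}

request = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Sending the request requires a running service, so the call is commented out:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
#     print(result["success"], result["result"])
```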
- `input` (required): Natural language description of the task to perform
- `provider` (required): LLM provider (`"anthropic"`, `"gemini"`, `"openai"`, `"azure_openai"`, `"groq"`, or `"ollama"`)
- `model` (required): Specific model to use (e.g., `"claude-3-sonnet-20240229"`)
- `api_key` (required): API key for the LLM provider
- `instructions` (optional): Additional context or constraints for the task
- `stealth` (optional): Enable stealth mode to avoid detection (default: `true`)
- `headless` (optional): Run browser in headless mode (default: `false`)
- `browser_timeout` (optional): Browser session shutdown timeout in seconds (default: 60)
- `max_steps` (optional): Maximum number of automation steps (default: 100)
- `reasoning` (optional): Enable step-by-step reasoning (default: `true`)
- `flash` (optional): Use faster execution mode (default: `false`)
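The service validates these fields with a pydantic `BrowserAgentRequest` model in `src/lib/browser/models.py`; as a dependency-free illustration of the fields and defaults listed above, a stdlib dataclass sketch (the class name is hypothetical):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BrowserAgentRequestSketch:
    """Illustrative stand-in for the pydantic BrowserAgentRequest model."""
    input: str                          # required: natural-language task
    provider: str                       # required: anthropic | gemini | openai | ...
    model: str                          # required: provider-specific model name
    api_key: str                        # required: LLM provider key
    instructions: Optional[str] = None  # optional extra context
    stealth: bool = True                # avoid bot detection
    headless: bool = False              # show the browser by default
    browser_timeout: int = 60           # session shutdown timeout (seconds)
    max_steps: int = 100                # cap on automation steps
    reasoning: bool = True              # step-by-step reasoning on
    flash: bool = False                 # faster execution mode off


# Only the four required fields need to be supplied
req = BrowserAgentRequestSketch(
    input="Summarize the top stories on news.ycombinator.com",
    provider="anthropic",
    model="claude-3-5-sonnet-20241022",
    api_key="sk-ant-xxxxx",
)
```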
```json
{
  "session": "browser-session-id",
  "success": true,
  "duration": 45.2,
  "result": "Task completion summary",
  "downloads": {
    "filename.pdf": "https://presigned-url",
    "data.csv": "https://presigned-url"
  }
}
```
- `session`: Unique browser session identifier
- `success`: Whether the task completed successfully
- `duration`: Execution time in seconds
- `result`: Summary of what was accomplished
- `downloads`: Dictionary of downloaded files with presigned URLs
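Given a response with this shape, a client can check the outcome and surface any presigned download links. A small sketch (the `summarize_response` helper is illustrative, not part of the service):

```python
def summarize_response(response: dict) -> str:
    """Return a one-line status summary plus any downloaded files."""
    status = "ok" if response.get("success") else "failed"
    lines = [f"[{status}] session={response['session']} ({response['duration']:.1f}s)"]
    for filename, url in response.get("downloads", {}).items():
        lines.append(f"  downloaded {filename}: {url}")
    return "\n".join(lines)


# Example mirroring the response schema above
example = {
    "session": "browser-session-id",
    "success": True,
    "duration": 45.2,
    "result": "Task completion summary",
    "downloads": {"filename.pdf": "https://presigned-url"},
}

summary = summarize_response(example)
```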
```json
{
  "input": "Go to example.com and extract all the text content from the main article",
  "provider": "anthropic",
  "model": "claude-4-sonnet",
  "api_key": "sk-ant-xxxxx",
  "headless": true,
  "max_steps": 50
}
```
```json
{
  "input": "Search for Python tutorials on Google and download the first PDF result",
  "instructions": "Make sure to verify the PDF is relevant before downloading",
  "provider": "openai",
  "model": "gpt-4.1",
  "api_key": "sk-xxxxx",
  "headless": false,
  "reasoning": true
}
```
```json
{
  "input": "Fill out the contact form on example.com with name 'John Doe', email '[email protected]', and message 'Hello world'",
  "provider": "gemini",
  "model": "gemini-2.0-flash-exp",
  "api_key": "your-gemini-key",
  "stealth": true
}
```
```json
{
  "input": "Navigate to news.ycombinator.com and summarize the top 5 stories",
  "provider": "azure_openai",
  "model": "gpt-4o",
  "api_key": "your-azure-openai-key",
  "headless": true
}
```
```json
{
  "input": "Search for 'climate change' on Wikipedia and extract the first paragraph",
  "provider": "groq",
  "model": "llama-3.3-70b-versatile",
  "api_key": "your-groq-key",
  "reasoning": true
}
```
```json
{
  "input": "Go to example.com and take a screenshot of the homepage",
  "provider": "ollama",
  "model": "llama3.2",
  "api_key": "not-required-for-ollama",
  "headless": false
}
```
This project uses `just` as a task runner. All commands are defined in the `justfile`.
```shell
just dev          # Run local development server on port 8000
just fmt          # Format and lint code with ruff (auto-fix issues)
just lint         # Check code formatting and linting (no auto-fix)
just deploy       # Deploy main.py to Kernel platform
just logs         # View browser-agent logs with follow mode
just claude       # Run Claude Code CLI (setup and development assistant)
just gemini       # Run Google Gemini CLI
just kernel <cmd> # Run any Kernel CLI command (e.g., 'just kernel status')
```
The deployment process:
- Runs formatting and linting checks
- Deploys `src/app.py` to the Kernel platform
- Service becomes available at the configured Kernel endpoint
- `src/app.py`: Main Kernel app with the `browser-agent` action. Creates browsers via Kernel, instantiates the Agent with a custom session, runs tasks, and returns trajectory results.
- `src/lib/browser/session.py`: Custom `BrowserSession` that extends browser-use's `BrowserSession`, fixing viewport handling for CDP connections and setting a fixed 1024x786 resolution.
- `src/lib/browser/models.py`: `BrowserAgentRequest` model handling LLM provider abstraction (anthropic, gemini, openai, azure_openai, groq, ollama) with AI gateway integration.
- `src/lib/gateway.py`: AI gateway configuration from environment variables.
- `browser-use>=0.7.2` - Web automation library providing Agent and BrowserSession
- `kernel>=0.11.0` - Platform for running the browser agent service
- `zenbase-llml>=0.4.0` - LLM templating used in task construction
- `pydantic>=2.10.6` - Data validation and serialization
- `boto3>=1.40.25` - AWS S3/R2 integration for file storage
- Request received via Kernel platform
- LLM client created based on provider/model (direct API or through AI Gateway)
- Remote browser session established with custom configuration
- browser-use Agent instantiated with reasoning capabilities
- Task executed with intelligent planning and step-by-step execution
- Files automatically uploaded to Cloudflare R2 storage
- Trajectory and results returned with download links
- Environment variables: Ensure all required environment variables are set
- Browser timeout: Increase `browser_timeout` for complex tasks
- File downloads: Check R2 bucket permissions and configuration
- LLM provider errors: Verify API keys and model availability
- Deployment issues: Ensure that the main entrypoint is in the root of the directory
- Format code: `just fmt`
- Test changes locally: `just dev`
- Deploy to staging: `just deploy`