A minimal, production-ready implementation of Andrej Karpathy's Agent Operating System architecture, developed by Swarms.ai and partners.
AgentOS is a lightweight, single-file implementation that provides a foundation for building autonomous AI agents. It implements the core concepts of Karpathy's Agent OS architecture while staying simple and extensible.
- Unified Model Interface: Seamless integration with multiple LLM providers through LiteLLM
  - Support for Anthropic Claude models (Opus, Sonnet, Haiku)
  - Integration with OpenAI GPT models
  - Access to optimized variants (GPT-4o, GPT-4o-mini)
- Browser Automation: Built-in browser agent capabilities for web interaction using browser-use
- Multi-Modal Support:
  - Text processing and generation
  - Video analysis through Google's Gemini models
  - Audio processing and speech synthesis
  - Image handling capabilities
- Resource Management:
  - Efficient handling of computational resources
  - Dynamic model selection based on task requirements
  - Automatic GPU/CPU optimization
- HuggingFace Integration:
  - Direct access to open-source models
  - Support for text generation and multiple NLP tasks
  - Automatic model quantization and optimization
- Extensible Architecture: Easy to add new capabilities and tools
- Model Management: Dynamic selection and utilization of language models
- Browser Automation: Autonomous web-based task execution
- Resource Orchestration: Efficient management of computational resources
- Context Management: Maintains system state and task dependencies
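
To make the "dynamic model selection based on task requirements" item above more concrete, here is a minimal sketch of what such routing could look like. It is not AgentOS's actual implementation: the `Task` class, the `ROUTES` table, the `select_model` function, the length threshold, and the choice of model strings are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task description used only for this sketch."""
    kind: str    # e.g. "text", "video", "reasoning"
    prompt: str


# Hypothetical routing table: task kind -> LiteLLM-style model string.
# The specific models chosen here are illustrative assumptions.
ROUTES = {
    "video": "gemini/gemini-1.5-pro",
    "reasoning": "gpt-4o",
}
DEFAULT_MODEL = "gpt-4o-mini"


def select_model(task: Task, long_prompt_chars: int = 4000) -> str:
    """Pick a model name for a task (assumed policy, not AgentOS's)."""
    # 1. Task kinds with a dedicated route always use that route.
    if task.kind in ROUTES:
        return ROUTES[task.kind]
    # 2. Long prompts go to the larger model.
    if len(task.prompt) > long_prompt_chars:
        return "gpt-4o"
    # 3. Everything else uses the cheaper default.
    return DEFAULT_MODEL


if __name__ == "__main__":
    print(select_model(Task(kind="text", prompt="Summarize this note.")))
```

Keeping the policy in a plain table makes it easy to add or swap providers without touching the selection logic.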
```bash
pip3 install -U agentos-sdk
```
```python
from agentos_sdk import AgentOS
from dotenv import load_dotenv

# Load environment variables (e.g. API keys) from a local .env file
load_dotenv()

agent = AgentOS(plan_on=False, max_loops=1)

agent.run(
    "Generate a video of a cat surfing on a wave at sunset, cinematic style. Save it as 'cat_surfing.mp4'. We should also add cat sounds and meowing sounds."
)
```
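
Because the quick start relies on `load_dotenv()` to supply provider credentials, it can help to check for missing keys before creating the agent. The sketch below is an assumption-laden helper, not part of `agentos_sdk`: the `missing_keys` function is hypothetical, and the variable names in `ASSUMED_KEYS` are the conventional ones for OpenAI, Anthropic, and Gemini, which may differ from what your setup needs.

```python
import os
from typing import Iterable, Mapping

# Assumed variable names; adjust to the providers you actually use.
ASSUMED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY")


def missing_keys(
    required: Iterable[str] = ASSUMED_KEYS,
    env: Mapping[str, str] = os.environ,
) -> list[str]:
    """Return the names in `required` that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    # Call this after load_dotenv() so values from .env are visible.
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
```

Running this check first turns a provider authentication error deep inside a run into an explicit message at startup.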
AgentOS ships with a set of built-in tools. The table below lists each tool, what it does, and typical use cases:
Tool Name | Description | Use Case Examples |
---|---|---|
Browser Agent | Autonomous web browser automation tool that can navigate websites, extract information, and perform web-based tasks | Web scraping, form filling, data extraction, website testing |
Hugging Face Model | Interface for using various Hugging Face models for text generation and other NLP tasks | Text generation, language translation, text classification, custom model inference |
LiteLLM Model | Unified interface for multiple LLM providers including OpenAI, Anthropic, and others | Text generation, chat completion, content creation, advanced reasoning |
Safe Calculator | Secure mathematical expression evaluator with built-in safety checks | Mathematical calculations, formula evaluation, secure computation, numeric processing |
Terminal Developer Agent | Advanced agent for performing terminal operations and development tasks | File operations, code execution, system commands, development tasks |
Generate Speech | Text-to-speech conversion tool supporting multiple voices and models | Audio content creation, voice synthesis, accessibility features, audio narration |
Generate Video | AI-powered video generation tool using Google's Veo 3.0 model | Video content creation, visual storytelling, animation generation, creative content |
Create Files | Tool for creating new files in the workspace with specified content | Document creation, code file generation, report writing, configuration files |
Update Files | Tool for updating existing files by overwriting their content | Content modification, file updates, document revisions, configuration changes |
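
As an illustration of what a "secure mathematical expression evaluator with built-in safety checks" can mean in practice, here is a minimal sketch that parses the expression with Python's `ast` module and only evaluates whitelisted arithmetic nodes. This is one common technique, not necessarily how the Safe Calculator tool is implemented. The `safe_eval` name, the length limit, and the exponent cap are assumptions made for this example.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval(node):
    # Plain numbers only (bool is excluded even though it subclasses int).
    if isinstance(node, ast.Constant):
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
        raise ValueError("only numeric constants are allowed")
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        left, right = _eval(node.left), _eval(node.right)
        # Assumed safety check: cap exponents to avoid huge computations.
        if isinstance(node.op, ast.Pow) and abs(right) > 100:
            raise ValueError("exponent too large")
        return _BINARY[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_eval(node.operand))
    # Names, calls, attributes, etc. are never evaluated.
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


def safe_eval(expression: str, max_length: int = 200):
    """Evaluate a basic arithmetic expression without using eval()."""
    if len(expression) > max_length:
        raise ValueError("expression too long")
    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)


if __name__ == "__main__":
    print(safe_eval("2 + 3 * 4"))
```

Because function calls and names are never evaluated, input such as `__import__('os')` raises `ValueError` instead of executing.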
Join our community of agent engineers and researchers for technical support, cutting-edge updates, and exclusive access to world-class agent engineering insights!
Platform | Description | Link |
---|---|---|
📚 Documentation | Official documentation and guides | docs.swarms.world |
📝 Blog | Latest updates and technical articles | Medium |
💬 Discord | Live chat and community support | Join Discord |
🐦 Twitter | Latest news and announcements | @kyegomez |
💼 LinkedIn | Professional network and updates | The Swarm Corporation |
📺 YouTube | Tutorials and demos | Swarms Channel |
🎫 Events | Join our community events | Sign up here |
🚀 Onboarding Session | Get onboarded with Kye Gomez, creator and lead maintainer of Swarms | Book Session |
We welcome contributions from the community. Please see our contributing guidelines for more information.
This project is licensed under the MIT License.
- Add deep research agent or sub agent
- Implement video and audio processing
- Create a better system prompt and add multi-shot examples showing when to use each tool