CTF LLM Agent Boilerplate

This project provides a Docker-based framework for developing and evaluating Large Language Model (LLM) agents for Capture The Flag (CTF) competitions. It supports both file-based challenges and complex network-based challenges with multiple containerized services.

Features

  • Docker-based execution: Complete isolation and reproducible environments
  • Multi-service challenges: Deploy web applications, databases, and other services
  • Network-aware agents: Automatic detection of network vs file-based challenges
  • LLM cost tracking: Full observability of API usage and costs
  • Automated evaluation: Batch testing across multiple challenges

Quickstart

1. Prerequisites

  • Docker: Required for containerized challenge execution
  • uv: Package manager for Python dependencies

Install uv if you don't have it: https://docs.astral.sh/uv/getting-started/installation/

2. Installation

Install dependencies:

uv sync

3. Configuration

The agent requires access to an LLM through a LiteLLM-compatible API; a minimal usage sketch follows the setup steps below.

  1. Copy the example environment file:

    cp .env.example .env
  2. Edit the .env file with your LiteLLM endpoint and API key:

    LITELLM_BASE_URL="https://your-litellm-proxy-url.com"
    LITELLM_API_KEY="your-litellm-api-key"
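
For reference, here is a minimal sketch of calling a LiteLLM-compatible endpoint with these variables via the litellm package. The model name is a placeholder, and the variables are assumed to be exported in the environment; the repo's helper/llm_helper.py wraps this with cost tracking and may differ in detail:

    import os
    import litellm

    # Route the request through the LiteLLM proxy configured in .env.
    # Assumes LITELLM_BASE_URL and LITELLM_API_KEY are exported (e.g. via dotenv).
    response = litellm.completion(
        model="gpt-4o",  # placeholder; use whatever model your proxy exposes
        messages=[{"role": "user", "content": "Say hello"}],
        api_base=os.environ["LITELLM_BASE_URL"],
        api_key=os.environ["LITELLM_API_KEY"],
    )
    print(response.choices[0].message.content)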
    

4. Running the Evaluation

All challenges run in Docker containers with automatic resource management.

To evaluate against all challenges:

uv run eval_agent.py

To run a single challenge:

uv run eval_agent.py --challenge <challenge_name>

Examples:

uv run eval_agent.py --challenge baby_cat              # File-based challenge
uv run eval_agent.py --challenge easy_sql_injection    # Network-based challenge  

Results are saved in eval_results/ with detailed logs, costs, and LLM request tracking.

Project Structure

.
├── agent/
│   └── agent.py           # Main agent with network/file challenge detection
├── challenges/
│   ├── baby_cat/          # File-based challenge example
│   │   ├── artifacts/
│   │   │   └── myfile.txt
│   │   └── challenge.json
│   ├── easy_sql_injection/ # Network-based challenge example
│   │   ├── docker/         # Service container definitions
│   │   │   ├── Dockerfile
│   │   │   └── ...
│   │   ├── artifacts/
│   │   └── challenge.json
│   └── ...
├── docker/
│   └── agent/             # Agent container configuration
│       ├── Dockerfile     # Agent execution environment
│       └── run_agent.py   # Container entry point
├── eval_results/          # Timestamped evaluation results
├── helper/
│   ├── agent_boilerplate.py # Agent interface definition
│   ├── ctf_challenge.py   # Challenge models with service support
│   ├── docker_manager.py  # Docker orchestration and networking
│   └── llm_helper.py      # LLM integration with cost tracking
├── .env                   # Environment configuration (API keys)
├── eval_agent.py          # Main evaluation orchestrator
└── README.md              # This file

How to...

Implement a Custom Agent

  1. Open agent/agent.py.
  2. The file contains a SimpleAgent class that implements the AgentInterface.
  3. Modify the solve_challenge method to implement your own strategy. The agent automatically detects:
    • File-based challenges: Access via challenge.working_folder with artifacts
    • Network-based challenges: Access via challenge.network_info with service discovery
  4. Use the CTFChallengeClient object for challenge interaction and flag submission (see the sketch below).
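
For illustration, a minimal solve_challenge could look like the sketch below. The working_folder and network_info attributes come from this README; submit_flag is an assumed name for the flag-submission call, so check helper/agent_boilerplate.py and helper/ctf_challenge.py for the real interface:

    import re
    from pathlib import Path

    class MyAgent:  # implements AgentInterface (see helper/agent_boilerplate.py)
        FLAG_RE = re.compile(r"flag\{\S+\}")

        def solve_challenge(self, challenge):
            if getattr(challenge, "network_info", None):
                # Network-based: e.g. drive an LLM loop that issues
                # requests against the deployed services.
                pass
            else:
                # File-based: scan the provided artifacts for anything flag-shaped.
                for path in Path(challenge.working_folder).rglob("*"):
                    if path.is_file():
                        match = self.FLAG_RE.search(path.read_text(errors="ignore"))
                        if match:
                            # submit_flag is an assumed method name
                            return challenge.submit_flag(match.group(0))
            return None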

Create a File-Based Challenge

  1. Create a new directory in challenges/ (e.g., my_challenge/).
  2. Create challenge.json:
    {
      "name": "My File Challenge",
      "description": "Find the hidden flag in the provided files.",
      "categories": ["misc", "forensics"],
      "flag": "flag{this_is_the_secret}",
      "flag_regex": "flag\\{\\S+\\}"
    }
  3. Create an artifacts/ subdirectory containing the challenge files. (A quick consistency check is sketched below.)
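
As a convenience (not part of the framework), you can check that the flag in challenge.json actually matches its own flag_regex before running an evaluation:

    import json
    import re

    # Verify the flag in challenge.json matches the declared flag_regex
    with open("challenges/my_challenge/challenge.json") as f:
        spec = json.load(f)
    assert re.search(spec["flag_regex"], spec["flag"]), "flag does not match flag_regex"
    print("challenge.json looks consistent")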

Create a Network-Based Challenge

  1. Create challenge directory and challenge.json:
    {
      "name": "My Web Challenge",
      "description": "Exploit the vulnerable web application.",
      "categories": ["web", "sql"],
      "flag": "flag{sql_injection_success}",
      "flag_regex": "flag\\{\\S+\\}",
      "services": [
        {
          "name": "webapp",
          "image": "my-webapp:latest", 
          "ports": {"80/tcp": 8080},
          "environment": {"FLAG": "flag{sql_injection_success}"}
        }
      ]
    }
  2. Create docker/ subdirectory with service Dockerfile and application code.
  3. The agent will automatically discover services via Docker networking (see the probe sketch below).
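
For illustration, a hypothetical probe an agent might run once the service is up. The shape of network_info assumed here (keyed by service name, with host and port fields) is a guess; check helper/ctf_challenge.py for the real structure:

    import requests  # third-party; add it to your dependencies if needed

    def probe(challenge):
        # Hypothetical shape: network_info maps service name -> {"host": ..., "port": ...}
        service = challenge.network_info["webapp"]
        url = f"http://{service['host']}:{service['port']}/"
        resp = requests.get(url, timeout=5)
        print(resp.status_code, resp.text[:200])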

Monitor LLM Usage and Costs

Each evaluation provides detailed observability:

  • Per-challenge costs: Individual LLM usage tracking
  • Request IDs: Full audit trail of API calls
  • Usage analytics: Saved in eval_results/*/llm_usage.json (see the aggregation sketch below)
  • Batch summaries: Total costs across multiple challenges
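
The exact schema of llm_usage.json is defined by helper/llm_helper.py; assuming each file holds a list of per-request records with a cost field, per-run totals could be aggregated like this:

    import json
    from pathlib import Path

    # Sum costs across all runs; field names here are assumptions,
    # so inspect one llm_usage.json first and adjust accordingly.
    for usage_file in Path("eval_results").rglob("llm_usage.json"):
        records = json.loads(usage_file.read_text())
        total = sum(r.get("cost", 0.0) for r in records)
        print(f"{usage_file.parent.name}: ${total:.4f} across {len(records)} requests")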

Architecture

The system uses Docker containers for challenge execution with the following flow (a minimal orchestration sketch follows the list):

  1. Challenge Detection: Automatic identification of file vs network challenges
  2. Service Deployment: Docker containers for challenge services (if any)
  3. Network Creation: Isolated Docker network per challenge
  4. Agent Execution: Containerized agent with access to services
  5. Result Collection: LLM usage data and results extracted from containers
  6. Resource Cleanup: Automatic cleanup of containers and networks
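
A stripped-down sketch of steps 2-6 using the Docker SDK for Python; helper/docker_manager.py is the authoritative implementation, and the image and network names below are illustrative:

    import docker

    client = docker.from_env()

    # 3. Isolated network per challenge
    network = client.networks.create("ctf_my_challenge", driver="bridge")
    try:
        # 2. Deploy each service from challenge.json onto that network
        client.containers.run(
            "my-webapp:latest",
            name="webapp",
            network=network.name,
            environment={"FLAG": "flag{sql_injection_success}"},
            detach=True,
        )
        # 4. Run the agent container on the same network so it can reach "webapp"
        agent = client.containers.run(
            "ctf-agent:latest", network=network.name, detach=True
        )
        agent.wait()                  # 5. block until the agent finishes,
        print(agent.logs().decode())  #    then collect its output
    finally:
        # 6. Always tear down containers and the network
        network.reload()
        for container in network.containers:
            container.remove(force=True)
        network.remove()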

For detailed architecture documentation, see docs/architecture.md.
