DV-Smith is a framework that automatically converts SystemVerilog/UVM testbenches into containerized verification tasks (DV gyms), enabling AI agents and automated tools to learn and improve hardware verification.
Inspired by SWE-smith and SWE-Gym, DV-Smith brings the same containerized task paradigm to hardware verification.
DV-Smith is a DV gym generator that:
- Analyzes UVM repositories using AI to discover tests, sequences, and covergroups
- Builds isolated verification tasks from existing testbenches
- Evaluates solutions based on functional coverage, code coverage, and simulation health
- Supports multiple simulators: Xcelium, Questa/ModelSim, VCS, Verilator
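
As a rough illustration of how the three metric families above might be folded into one number, here is a minimal sketch. The `EvalResult` class, the `combined_score` function, and the weights are hypothetical and chosen only for illustration. They are not DV-Smith's actual scoring API or its real weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Hypothetical container for the three metric families named above."""

    functional_coverage: float  # fraction in [0, 1]
    code_coverage: float        # fraction in [0, 1]
    health: float               # fraction in [0, 1]; 1.0 means a clean simulation


# Assumed weights for illustration only; the real weights may differ.
WEIGHTS = {"functional_coverage": 0.6, "code_coverage": 0.3, "health": 0.1}


def combined_score(result: EvalResult, weights: dict[str, float] = WEIGHTS) -> float:
    """Return a weighted average of the metrics, normalized by total weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(getattr(result, name) * w for name, w in weights.items())
    return weighted / total


result = EvalResult(functional_coverage=0.8, code_coverage=0.5, health=1.0)
print(f"{combined_score(result):.2f}")
```

Normalizing by the total weight keeps the score in [0, 1] even if the weights are later changed and no longer sum to one.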
✨ Claude-Powered Analysis: Uses Claude 3.5 Sonnet to understand any UVM repository structure
🎯 Automatic Task Generation: Converts existing tests into isolated tasks with HOWTO guides
📈 Multi-Metric Evaluation: Scores solutions on coverage and health metrics
🔌 Pluggable Simulator Support: Extensible adapter system for any simulator
🧪 Comprehensive Testing: Unit tests, integration tests, and real-world benchmarks
📝 Intelligent Gym Cleaning: Uses Claude Code SDK to identify and preserve infrastructure files
🔍 AI Transparency: Complete logging of all AI calls with debugging tools (dvsmith ai-logs)
- Python 3.12+
- Docker (required by Terminal-Bench)
- Anthropic API key
git clone https://github.com/yourusername/dv-smith.git
cd dv-smith
# Install with uv (recommended)
uv sync
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Required: Set Anthropic API key for Claude-powered analysis
echo "ANTHROPIC_API_KEY=your-key-here" > .env

# 1. Ingest and analyze a UVM repository
dvsmith ingest https://github.com/mbits-mirafra/apb_avip
# 2. Build a specific Terminal-Bench task
dvsmith build coverage-apb_master_coverage
# 3. Explore generated task
ls dvsmith_workspace/terminal_bench_tasks/apb_avip/coverage-apb_master_coverage/
# You'll see: prompt.md, task.yaml, Dockerfile, tests/, solution.sh
# 4. Run Terminal-Bench to test an AI agent on the task
tb run -t coverage-apb_master_coverage \
--dataset-path dvsmith_workspace/terminal_bench_tasks/apb_avip \
-a claude-code --livestream
# 5. View results
./parse_agent_log.py runs/<run-id>/coverage-apb_master_coverage/.../sessions/agent.log

The thomas_tasks/ directory contains pre-built AXI4 verification tasks that can be run directly with Terminal-Bench:
# Run a specific AXI4 task with Claude Code agent
tb run \
--dataset-path thomas_tasks \
--task-id axi4_blocking_32b_write_read_test \
--agent claude-code \
--model anthropic/claude-sonnet-4-5 \
--livestream
# Available tasks:
# - axi4_blocking_32b_write_read_test
# - axi4_blocking_incr_burst_read_test
# - axi4_blocking_incr_burst_write_read_test
# - axi4_blocking_wrap_burst_write_read_test

These tasks are ready to use and ship with Docker environments, solution scripts, and grading infrastructure.
For complete documentation on the build command, see Build Command Documentation.
dvsmith build runs AI agents that execute arbitrary bash commands on your system. For untrusted repositories, run it in Docker isolation.
# One-time: Build the Docker image
docker build -f Dockerfile.dvsmith -t dvsmith:latest .
# Run dvsmith build safely in Docker
docker run -it --rm \
--network=none \
-v "$(pwd)/dvsmith_workspace:/workspace" \
-e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \
dvsmith:latest build coverage-apb_master_coverage
# For ingest (needs network access to clone repos)
docker run -it --rm \
-v "$(pwd)/dvsmith_workspace:/workspace" \
-e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \
dvsmith:latest ingest https://github.com/mbits-mirafra/apb_avip

Security benefits:
- ✅ No network access for the build command (prevents data exfiltration)
- ✅ Isolated filesystem (only workspace directory is accessible)
- ✅ Can't modify your host system
- ✅ Reproducible builds
DV-Smith provides full transparency into AI operations with built-in logging and debugging tools.
Enable verbose debug output to troubleshoot issues or understand what's happening:
export DVSMITH_DEBUG=1
dvsmith build apb_avip --sim xcelium

This will show:
- Detailed compilation commands and simulator invocations
- File operations (copying, removing, etc.)
- AI query details and responses
- Coverage extraction steps
- Infrastructure file analysis
Debug output uses the standard Python logging system and is enabled only when DVSMITH_DEBUG is set to 1, true, or yes.
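
The gating described above can be sketched as follows. This is an illustrative approximation, not DV-Smith's actual code: the helper names `debug_enabled` and `configure_logging` are hypothetical, and treating the value case-insensitively is an assumption.

```python
import logging
import os

# Values that the text above says enable debug output.
_TRUTHY = {"1", "true", "yes"}


def debug_enabled(env=None) -> bool:
    """Return True when DVSMITH_DEBUG is set to 1, true, or yes."""
    env = os.environ if env is None else env
    # Assumption: surrounding whitespace and letter case are ignored.
    return env.get("DVSMITH_DEBUG", "").strip().lower() in _TRUTHY


def configure_logging(env=None) -> int:
    """Pick DEBUG or INFO from the environment and return the chosen level."""
    level = logging.DEBUG if debug_enabled(env) else logging.INFO
    # basicConfig is a no-op if the root logger already has handlers.
    logging.basicConfig(level=level, format="%(levelname)s %(name)s: %(message)s")
    return level


if __name__ == "__main__":
    configure_logging()
    logging.getLogger("dvsmith").debug("shown only when DVSMITH_DEBUG is enabled")
```

Accepting an optional `env` mapping makes the check easy to exercise in tests without mutating the real process environment.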
All AI interactions are automatically logged to ~/.dvsmith/ai_calls.jsonl:
# View recent AI calls (last 10 by default)
dvsmith ai-logs
# Show all entries
dvsmith ai-logs --all
# Show detailed view of a specific call
dvsmith ai-logs -d 5

- Build Command: Complete guide to the build command and AI agent integration
- Getting Started: Installation, first gym, basic workflows
- Writing Agents: Create agents that solve verification tasks
- Understanding Evaluation: How solutions are scored
- Claude Code SDK: Using Claude Agent SDK for AI-powered analysis
DV-Smith has been tested on public UVM AVIPs:
| Benchmark | Tests Found | Tasks Generated | Covergroups | Simulators | Status |
|---|---|---|---|---|---|
| APB AVIP | 10 | 9 | 2 | questa, vcs, xcelium | ✅ |
| AXI4 AVIP | 72 | 70 | 2 | xcelium, vcs, questa | ✅ |
| I3C AVIP | 8 | 6 | 2 | questa, vcs, xcelium | ✅ |
| SPI AVIP | TBD | TBD | TBD | questa, vcs, xcelium | |
For debugging, set DVSMITH_DEBUG=1
# Run all tests
pytest tests/ -v
# Run specific test suites
pytest tests/test_models.py -v # Unit tests
pytest tests/test_coverage_parsers.py -v # Parser tests
pytest tests/test_integration.py -v # Integration tests
# Run with coverage
pytest tests/ --cov=dvsmith --cov-report=html

dvsmith_workspace/
├── clones/ # Cloned repositories
│ └── <bench_name>/
├── profiles/ # Repository profiles
│ └── <bench_name>.yaml
└── gyms/ # Generated DV gyms
└── <bench_name>/
├── tasks/ # Task specifications (*.md)
├── HOWTO.md # Guide for adding new tests
├── gym_metadata.yaml
├── backups/ # Original test files (for reference)
├── work/ # Evaluation artifacts
│ └── eval/
│ └── <task_id>/
│ ├── *.log
│ └── coverage files
├── src/ # Source code (tests removed)
├── sim/ # Simulation makefiles
└── ... # Other repo files
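
To navigate this layout from a script, something like the sketch below may be a useful starting point. The helper names `list_task_specs` and `eval_dir` are hypothetical and not part of DV-Smith. The paths simply mirror the tree above.

```python
from pathlib import Path


def list_task_specs(workspace: Path, bench_name: str) -> list[Path]:
    """Return the task specification files (*.md) for one gym, sorted by name."""
    tasks_dir = workspace / "gyms" / bench_name / "tasks"
    if not tasks_dir.is_dir():
        return []
    return sorted(tasks_dir.glob("*.md"))


def eval_dir(workspace: Path, bench_name: str, task_id: str) -> Path:
    """Return the directory that holds evaluation artifacts for one task."""
    return workspace / "gyms" / bench_name / "work" / "eval" / task_id


if __name__ == "__main__":
    ws = Path("dvsmith_workspace")
    for spec in list_task_specs(ws, "apb_avip"):
        print(spec.name)
```

Returning an empty list for a missing gym keeps callers simple when a benchmark has not been built yet.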
Each task includes a "Getting Started" section that directs agents to read the HOWTO.md file:
## Getting Started
**IMPORTANT:** Before implementing your solution, read the `HOWTO.md` file in the gym root directory.
It contains critical information about:
- How to add tests to the package file (required for compilation)
- UVM test structure and base classes
- Common errors and how to fix them

The HOWTO.md guide is automatically generated for each gym and includes:
- Step-by-step instructions for adding new UVM tests
- Package file editing requirements (critical for test registration)
- Common pitfalls and troubleshooting