thomasnormal/dv-smith

DV-Smith: SystemVerilog/UVM Verification Gym Generator

DV-Smith is a framework that automatically converts SystemVerilog/UVM testbenches into containerized verification tasks (DV gyms), enabling AI agents and automated tools to learn and improve hardware verification.

Inspired by SWE-smith and SWE-Gym, DV-Smith brings the same containerized task paradigm to hardware verification.

🎯 What is DV-Smith?

DV-Smith is a DV gym generator that:

  • Analyzes UVM repositories using AI to discover tests, sequences, and covergroups
  • Builds isolated verification tasks from existing testbenches
  • Evaluates solutions based on functional coverage, code coverage, and simulation health
  • Supports multiple simulators: Xcelium, Questa/ModelSim, VCS, Verilator
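To make the multi-metric idea concrete, here is a minimal Python sketch of how the three signals could be folded into one score. The `EvalMetrics` fields, the weights, and the all-or-nothing health rule are hypothetical. This README does not say how DV-Smith actually weights or combines its metrics.

```python
from dataclasses import dataclass


@dataclass
class EvalMetrics:
    functional_coverage: float  # covergroup coverage, percent (0-100)
    code_coverage: float        # line/branch/toggle coverage, percent (0-100)
    sim_passed: bool            # did the simulation run to completion and pass?
    uvm_errors: int             # number of UVM_ERROR/UVM_FATAL reports


# Hypothetical weights for illustration; not DV-Smith's actual configuration.
WEIGHTS = {"functional": 0.6, "code": 0.3, "health": 0.1}


def health_score(m: EvalMetrics) -> float:
    """All-or-nothing health: 100 for a clean passing run, else 0."""
    return 100.0 if m.sim_passed and m.uvm_errors == 0 else 0.0


def overall_score(m: EvalMetrics) -> float:
    """Weighted sum of the three metrics; weights sum to 1, so the result is 0-100."""
    return (
        WEIGHTS["functional"] * m.functional_coverage
        + WEIGHTS["code"] * m.code_coverage
        + WEIGHTS["health"] * health_score(m)
    )
```

With these illustrative weights, `overall_score(EvalMetrics(80.0, 50.0, True, 0))` gives 73.0. Gating health as pass/fail is only one simple choice; an evaluator could just as well subtract a penalty per UVM error.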

Key Features

  • Claude-Powered Analysis: Uses Claude 3.5 Sonnet to understand any UVM repository structure
  • 🎯 Automatic Task Generation: Converts existing tests into isolated tasks with HOWTO guides
  • 📈 Multi-Metric Evaluation: Scores solutions on coverage and health metrics
  • 🔌 Pluggable Simulator Support: Extensible adapter system for any simulator
  • 🧪 Comprehensive Testing: Unit tests, integration tests, and real-world benchmarks
  • 📝 Intelligent Gym Cleaning: Uses the Claude Code SDK to identify and preserve infrastructure files
  • 🔍 AI Transparency: Complete logging of all AI calls with debugging tools (dvsmith ai-logs)

🚀 Quick Start

Prerequisites

  • Python 3.12+
  • Docker (required by Terminal-Bench)
  • Anthropic API key

Installation

git clone https://github.com/thomasnormal/dv-smith.git
cd dv-smith

# Install with uv (recommended)
uv sync
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Required: Set Anthropic API key for Claude-powered analysis
echo "ANTHROPIC_API_KEY=your-key-here" > .env
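The command above stores the key in a `.env` file in the project root. How DV-Smith loads that file is not documented here, so the following is only a stdlib sketch of the usual convention, assuming simple `KEY=VALUE` lines and that a real environment variable takes precedence over the file. `load_dotenv_file` and `require_api_key` are hypothetical helper names.

```python
import os
from pathlib import Path


def load_dotenv_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding double or single quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_api_key(path: str = ".env") -> str:
    """Return ANTHROPIC_API_KEY, preferring the environment over the .env file."""
    key = os.environ.get("ANTHROPIC_API_KEY") or load_dotenv_file(path).get(
        "ANTHROPIC_API_KEY"
    )
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set (environment or .env)")
    return key
```

Note that `echo ... > .env` overwrites any existing `.env`; use `>>` to append if the file already holds other settings.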

Create Your First Terminal-Bench Tasks

# 1. Ingest and analyze a UVM repository
dvsmith ingest https://github.com/mbits-mirafra/apb_avip

# 2. Build a specific Terminal-Bench task
dvsmith build coverage-apb_master_coverage

# 3. Explore generated task
ls dvsmith_workspace/terminal_bench_tasks/apb_avip/coverage-apb_master_coverage/
# You'll see: prompt.md, task.yaml, Dockerfile, tests/, solution.sh

# 4. Run Terminal-Bench to test an AI agent on the task
tb run -t coverage-apb_master_coverage \
  --dataset-path dvsmith_workspace/terminal_bench_tasks/apb_avip \
  -a claude-code --livestream

# 5. View results
./parse_agent_log.py runs/<run-id>/coverage-apb_master_coverage/.../sessions/agent.log

Running Tasks from thomas_tasks/

The thomas_tasks/ directory contains pre-built AXI4 verification tasks that can be run directly with Terminal-Bench:

# Run a specific AXI4 task with Claude Code agent
tb run \
  --dataset-path thomas_tasks \
  --task-id axi4_blocking_32b_write_read_test \
  --agent claude-code \
  --model anthropic/claude-sonnet-4-5 \
  --livestream

# Available tasks:
# - axi4_blocking_32b_write_read_test
# - axi4_blocking_incr_burst_read_test
# - axi4_blocking_incr_burst_write_read_test
# - axi4_blocking_wrap_burst_write_read_test

These tasks are ready to use, with Docker environments, solution scripts, and grading infrastructure.

For complete documentation on the build command, see Build Command Documentation.

Running dvsmith build in Docker (Recommended for Security)

⚠️ Security Warning: dvsmith build runs AI agents that execute arbitrary bash commands on your system. For untrusted repositories, run it inside a Docker container.

# One-time: Build the Docker image
docker build -f Dockerfile.dvsmith -t dvsmith:latest .

# Run dvsmith build safely in Docker
docker run -it --rm \
  --network=none \
  -v "$(pwd)/dvsmith_workspace:/workspace" \
  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  dvsmith:latest build coverage-apb_master_coverage

# For ingest (needs network access to clone repos)
docker run -it --rm \
  -v "$(pwd)/dvsmith_workspace:/workspace" \
  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  dvsmith:latest ingest https://github.com/mbits-mirafra/apb_avip

Security benefits:

  • ✅ No network access for build command (prevents data exfiltration)
  • ✅ Isolated filesystem (only workspace directory is accessible)
  • ✅ Can't modify your host system outside the mounted workspace directory
  • ✅ Reproducible builds

🔍 AI Transparency & Debugging

DV-Smith provides full transparency into AI operations with built-in logging and debugging tools.

Debug Logging

Enable verbose debug output to troubleshoot issues or understand what's happening:

export DVSMITH_DEBUG=1
dvsmith build apb_avip --sim xcelium

This will show:

  • Detailed compilation commands and simulator invocations
  • File operations (copying, removing, etc.)
  • AI query details and responses
  • Coverage extraction steps
  • Infrastructure file analysis

Debug output uses the standard Python logging system and is enabled only when DVSMITH_DEBUG is set to 1, true, or yes.

View AI Call Logs

All AI interactions are automatically logged to ~/.dvsmith/ai_calls.jsonl:

# View recent AI calls (last 10 by default)
dvsmith ai-logs

# Show all entries
dvsmith ai-logs --all

# Show detailed view of a specific call
dvsmith ai-logs -d 5

📊 Benchmarks

DV-Smith has been tested on public UVM AVIPs:

| Benchmark | Tests Found | Tasks Generated | Covergroups | Simulators | Status |
|---|---|---|---|---|---|
| APB AVIP | 10 | 9 | 2 | questa, vcs, xcelium | |
| AXI4 AVIP | 72 | 70 | 2 | xcelium, vcs, questa | |
| I3C AVIP | 8 | 6 | 2 | questa, vcs, xcelium | |
| SPI AVIP | TBD | TBD | TBD | questa, vcs, xcelium | ⚠️ |

🧪 Testing

For debugging, set DVSMITH_DEBUG=1

# Run all tests
pytest tests/ -v

# Run specific test suites
pytest tests/test_models.py -v                  # Unit tests
pytest tests/test_coverage_parsers.py -v        # Parser tests
pytest tests/test_integration.py -v             # Integration tests

# Run with coverage
pytest tests/ --cov=dvsmith --cov-report=html

Workspace Structure

dvsmith_workspace/
├── clones/                # Cloned repositories
│   └── <bench_name>/
├── profiles/              # Repository profiles
│   └── <bench_name>.yaml
└── gyms/                  # Generated DV gyms
    └── <bench_name>/
        ├── tasks/         # Task specifications (*.md)
        ├── HOWTO.md       # Guide for adding new tests
        ├── gym_metadata.yaml
        ├── backups/       # Original test files (for reference)
        ├── work/          # Evaluation artifacts
        │   └── eval/
        │       └── <task_id>/
        │           ├── *.log
        │           └── coverage files
        ├── src/           # Source code (tests removed)
        ├── sim/           # Simulation makefiles
        └── ...            # Other repo files
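As a sketch of how this layout maps to concrete paths, the hypothetical helpers below build the locations for one benchmark and task, then collect the `*.log` files from that task's evaluation directory. They only encode the tree shown above; DV-Smith's own path handling may differ.

```python
from pathlib import Path


def gym_paths(workspace: Path, bench: str, task_id: str) -> dict[str, Path]:
    """Map the dvsmith_workspace layout to concrete paths for one bench/task."""
    gym = workspace / "gyms" / bench
    return {
        "clone": workspace / "clones" / bench,
        "profile": workspace / "profiles" / f"{bench}.yaml",
        "gym": gym,
        "tasks": gym / "tasks",
        "howto": gym / "HOWTO.md",
        "metadata": gym / "gym_metadata.yaml",
        "eval": gym / "work" / "eval" / task_id,
    }


def eval_logs(workspace: Path, bench: str, task_id: str) -> list[Path]:
    """Return the task's evaluation logs sorted by name (empty if none exist yet)."""
    eval_dir = gym_paths(workspace, bench, task_id)["eval"]
    if not eval_dir.is_dir():
        return []
    return sorted(eval_dir.glob("*.log"))
```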

Task Format

Each task includes a "Getting Started" section that directs agents to read the HOWTO.md file:

## Getting Started
**IMPORTANT:** Before implementing your solution, read the `HOWTO.md` file in the gym root directory.
It contains critical information about:
- How to add tests to the package file (required for compilation)
- UVM test structure and base classes
- Common errors and how to fix them

The HOWTO.md guide is automatically generated for each gym and includes:

  • Step-by-step instructions for adding new UVM tests
  • Package file editing requirements (critical for test registration)
  • Common pitfalls and troubleshooting
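In UVM testbenches, test classes are typically pulled into a test package with `` `include `` so that they are compiled with it and registered with the factory (through `` `uvm_component_utils ``). The Python sketch below shows that edit on a package source string: it inserts an `` `include `` line just before the last `endpackage` and does nothing if the include is already present. The package and file names are hypothetical, and the HOWTO.md generated for each gym remains the authority on which package file to edit.

```python
import re

# Matches an `endpackage` keyword at the start of a line (optionally indented).
ENDPACKAGE_RE = re.compile(r"^[ \t]*endpackage\b", flags=re.MULTILINE)


def add_test_include(pkg_source: str, test_file: str) -> str:
    """Insert `include "<test_file>" before the last endpackage (idempotent)."""
    include_line = f'`include "{test_file}"'
    if include_line in pkg_source:
        return pkg_source  # already registered; leave the package unchanged
    matches = list(ENDPACKAGE_RE.finditer(pkg_source))
    if not matches:
        raise ValueError("no endpackage found in package source")
    pos = matches[-1].start()
    return pkg_source[:pos] + f"  {include_line}\n" + pkg_source[pos:]
```

Forgetting this step is a common reason a new test compiles in isolation but is not found when the simulator is run with `+UVM_TESTNAME`.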
