Commit 628a006

docs: add benchmarking guide (benchstat workflow + pprof) and performance PR checklist
1 parent 8420f3f commit 628a006

1 file changed

+157
-0
lines changed

benchmark/README.md

@@ -0,0 +1,157 @@
# Benchmarking Guide

Standard steps for running micro-benchmarks, comparing revisions, and capturing quick profiles when submitting performance-related changes.

## TL;DR (60 seconds)

```bash
# Install benchstat once
go install golang.org/x/perf/cmd/benchstat@latest

# Baseline from main
git checkout main
go test ./... -bench=. -benchmem -run='^$' -count=10 -benchtime=10s > /tmp/base.txt

# Candidate from your branch
git checkout my-perf-branch
go test ./... -bench=. -benchmem -run='^$' -count=10 -benchtime=10s > /tmp/cand.txt

# Compare
benchstat /tmp/base.txt /tmp/cand.txt
```

Paste the `benchstat` table in your PR with Go/OS/CPU details and the flags you used.

---

## Purpose & scope

* Use **micro-benchmarks** to validate small performance changes (allocations, hot functions, handler paths).
* This guide covers **baseline vs candidate** comparisons and quick CPU/memory profiling.
* For end-to-end throughput/tail latency, complement with integration or load tests as appropriate.

## Reproducibility checklist

* Run **baseline and candidate on the same machine** with minimal background load.
* Pin concurrency: set `GOMAXPROCS` explicitly and use the same value for both runs (by default Go derives it from the available CPUs).
* Use multiple repetitions (`-count`) and a fixed run time (`-benchtime`) to reduce variance.
* Do one warmup run before collecting results.
* Record **Go version, OS/CPU model**, and the exact flags you used.

Example env header to include in your PR:

```
go version: go1.22.x
OS/CPU: <your OS> / <CPU model>
GOMAXPROCS=<n>; flags: -count=10 -benchtime=10s
```

## Running micro-benchmarks

Choose either **all benchmarkable packages** or a **specific scope**.

* All benchmarks (repo-wide):

  ```bash
  go test ./... -bench=. -benchmem -run='^$' -count=10 -benchtime=10s
  ```
* Specific package:

  ```bash
  go test ./path/to/pkg -bench=. -benchmem -run='^$' -count=15 -benchtime=5s
  ```
* Specific benchmark (regex):

  ```bash
  go test ./path/to/pkg -bench='^BenchmarkUnaryEcho$' -benchmem -run='^$' -count=20 -benchtime=1s
  ```

**Flag notes**

* `-bench=.` runs all benchmarks in scope; narrow with a regex when needed.
* `-benchmem` reports `B/op` and `allocs/op`.
* `-run='^$'` matches no test names, so regular tests are skipped and only benchmarks run. The quotes keep the shell from interpreting `^` or `$`.
* `-count` runs each benchmark that many times so `benchstat` has enough samples (10–20 is common).
* `-benchtime` sets the target run time for each benchmark run (default `1s`); increase for noisy benches. Each benchmark takes roughly `-count` × `-benchtime` in total.

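For reference, `-bench` selects functions of the form `func BenchmarkXxx(b *testing.B)`. The sketch below shows the usual shape of one. It rests on assumptions: `echo` is a hypothetical stand-in for the code under test, and `BenchmarkUnaryEcho` only mirrors the name used in the regex example above. In a repository the benchmark would live in a `_test.go` file and be run with `go test -bench`; `testing.Benchmark` is called from `main` here only so the sketch runs on its own.

```go
package main

import (
	"fmt"
	"testing"
)

// echo is a hypothetical function under test: it returns a copy of its input.
func echo(in []byte) []byte {
	out := make([]byte, len(in))
	copy(out, in)
	return out
}

// sink keeps the result reachable so the compiler cannot discard the call.
var sink []byte

// BenchmarkUnaryEcho measures echo on a small payload.
func BenchmarkUnaryEcho(b *testing.B) {
	payload := make([]byte, 64)
	b.ReportAllocs() // report allocations for this benchmark, like -benchmem
	b.ResetTimer()   // exclude the setup above from the measurement
	for i := 0; i < b.N; i++ {
		sink = echo(payload)
	}
}

func main() {
	// Runs the benchmark outside "go test" with the default bench time.
	res := testing.Benchmark(BenchmarkUnaryEcho)
	fmt.Println(res.String(), res.MemString())
}
```

Keeping setup before `b.ResetTimer()` and storing the result in a package-level variable are common ways to avoid measuring the wrong thing.
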
## Baseline vs candidate with benchstat

1. **Baseline (main):**

   ```bash
   git checkout main
   go test ./... -bench=. -benchmem -run='^$' -count=10 -benchtime=10s > /tmp/base.txt
   ```
2. **Candidate (your branch):**

   ```bash
   git checkout my-perf-branch
   go test ./... -bench=. -benchmem -run='^$' -count=10 -benchtime=10s > /tmp/cand.txt
   ```
3. **Compare:**

   ```bash
   benchstat /tmp/base.txt /tmp/cand.txt
   ```

**Interpreting `benchstat`**

* Focus on `ns/op`, `B/op`, `allocs/op` (newer `benchstat` releases label the time column `sec/op`).
* For these metrics lower is better, so a negative **delta** is an improvement.
* `p=` is the p-value of the significance test (smaller is stronger); `benchstat` prints `~` instead of a delta when the difference is not statistically significant.
* Call out **meaningful** wins (e.g., ≥5–10%) and explain why your change helps.

**Sample output (illustrative)**

```
name               old ns/op  new ns/op  delta
UnaryEcho/Small-8  12,340     11,020     -10.7%  (p=0.002 n=10+10)
B/op               1,456      1,290      -11.4%
allocs/op          12.0       11.0       -8.3%
```

## Quick profiling with pprof (optional)

When you need to see *why* a change moves performance:

```bash
# CPU profile for one benchmark
go test ./path/to/pkg -bench='^BenchmarkUnaryEcho$' -run='^$' -cpuprofile=cpu.out -benchtime=30s

# Memory profile (alloc space)
go test ./path/to/pkg -bench='^BenchmarkUnaryEcho$' -run='^$' -memprofile=mem.out -benchtime=30s
```

Inspect:

```bash
go tool pprof cpu.out   # commands: 'top', 'top -cum', 'web'
go tool pprof mem.out
```

Include a short note in your PR (e.g., "fewer copies on hot path; top symbol shifted from X to Y").

## Using helper scripts (if present)

If this repository provides helper scripts under `./benchmark` or `./scripts/` to run or capture benchmarks, you may use them to produce **raw outputs** for baseline and candidate with the **same flags**, then compare with `benchstat` as shown above.

Plain `go test -bench` commands are equally fine as long as you capture raw outputs and attach a `benchstat` diff.

## What to include in a performance PR

* A **benchstat** table comparing baseline vs candidate
* **Environment header**: Go version, OS/CPU, `GOMAXPROCS`
* **Flags** used: `-count`, `-benchtime`, any selectors
* (Optional) **pprof** highlights (top symbols or a flamegraph)
* One paragraph on *why* the change helps (evidence beats theory)

## Troubleshooting

* **High variance?** Increase `-count` or `-benchtime`, narrow the scope, and close background apps.
* **Network noise?** Prefer in-memory transports for micro-benchmarks.
* **Different machines?** Don’t compare across hosts; run both sides on the same box.
* **Allocs improved but ns/op didn’t?** Still valuable: fewer allocations mean less GC pressure at scale.

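To illustrate the in-memory transport advice, here is a sketch that round-trips a message over `net.Pipe`, the standard library's in-memory connection, instead of a real socket. It is an assumption-laden example: `BenchmarkPipeEcho` and the echo goroutine are hypothetical, and a project with its own in-memory transport would normally use that instead.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"testing"
)

// BenchmarkPipeEcho round-trips a small message over net.Pipe, so no real
// network stack (and none of its noise) is part of the measurement.
func BenchmarkPipeEcho(b *testing.B) {
	client, server := net.Pipe()
	defer client.Close()

	// Echo server: copy everything it reads straight back to the sender.
	go func() {
		defer server.Close()
		io.Copy(server, server)
	}()

	msg := []byte("ping")
	buf := make([]byte, len(msg))
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, err := client.Write(msg); err != nil {
			b.Fatal(err)
		}
		if _, err := io.ReadFull(client, buf); err != nil {
			b.Fatal(err)
		}
	}
}

func main() {
	// Runs the benchmark outside "go test" so the sketch is self-contained.
	fmt.Println(testing.Benchmark(BenchmarkPipeEcho))
}
```
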
---

Maintainers: if you prefer different default `-count` / `-benchtime`, or want a `make benchmark` target that wraps these commands, this can be added in a follow-up PR.
