Compare `*stacktraceTree.insert` against different initial sizes #4033

inkel · 2025-03-20T18:02:57Z

During the past hackathon I was looking at trying to optimize performance of different Grafana products, and I've found that the *stacktraceTree.insert method was consuming lots of CPU and memory. Everything indicated that the calls to append were the responsible ones, so I've added a new benchmark to compare how the existing benchmark for this operation behaves against different initial sizes (currently 0).

This benchmark can be slow, thus it won't run unless the COMPARE_STACKTRACETREE_INSERT_DEFAULT_SIZES environment variable is set to true. In addition, if the -v flag is passed to go test it will print the initial and max tree size after inserting all the samples from the test profile.

These are the results I've got on my local development machine:

$ env COMPARE_STACKTRACETREE_INSERT_DEFAULT_SIZES=true go test -run='^$' -bench=Benchmark_stacktrace_tree_insert_default_sizes ./pkg/phlaredb/symdb/ -v
goos: darwin
goarch: arm64
pkg: github.com/grafana/pyroscope/pkg/phlaredb/symdb
cpu: Apple M1 Pro
Benchmark_stacktrace_tree_insert_default_sizes
Benchmark_stacktrace_tree_insert_default_sizes/size=0
Benchmark_stacktrace_tree_insert_default_sizes/size=0-10                   12256             97704 ns/op          120049 B/op         14 allocs/op
    stacktrace_tree_test.go:258: init: 0 nodes ~ 0 bytes
    stacktrace_tree_test.go:266: max: 1793 nodes ~ 28688 bytes
Benchmark_stacktrace_tree_insert_default_sizes/size=10
Benchmark_stacktrace_tree_insert_default_sizes/size=10-10                  12475             95883 ns/op          102625 B/op         10 allocs/op
    stacktrace_tree_test.go:258: init: 10 nodes ~ 160 bytes
    stacktrace_tree_test.go:266: max: 1793 nodes ~ 28688 bytes
Benchmark_stacktrace_tree_insert_default_sizes/size=1024
Benchmark_stacktrace_tree_insert_default_sizes/size=1024-10                13028             91745 ns/op           81921 B/op          3 allocs/op
    stacktrace_tree_test.go:258: init: 1024 nodes ~ 16384 bytes
    stacktrace_tree_test.go:266: max: 1793 nodes ~ 28688 bytes
Benchmark_stacktrace_tree_insert_default_sizes/size=2048
Benchmark_stacktrace_tree_insert_default_sizes/size=2048-10                14047             85265 ns/op           32768 B/op          1 allocs/op
    stacktrace_tree_test.go:258: init: 2048 nodes ~ 32768 bytes
    stacktrace_tree_test.go:266: max: 1793 nodes ~ 28688 bytes
Benchmark_stacktrace_tree_insert_default_sizes/size=4096
Benchmark_stacktrace_tree_insert_default_sizes/size=4096-10                13689             89610 ns/op           65536 B/op          1 allocs/op
    stacktrace_tree_test.go:258: init: 4096 nodes ~ 65536 bytes
    stacktrace_tree_test.go:266: max: 1793 nodes ~ 28688 bytes
Benchmark_stacktrace_tree_insert_default_sizes/size=8192
Benchmark_stacktrace_tree_insert_default_sizes/size=8192-10                12843             92649 ns/op          131073 B/op          1 allocs/op
    stacktrace_tree_test.go:258: init: 8192 nodes ~ 131072 bytes
    stacktrace_tree_test.go:266: max: 1793 nodes ~ 28688 bytes
PASS
ok      github.com/grafana/pyroscope/pkg/phlaredb/symdb 13.478s

As we can observe, 2048 yields the better results when running the benchmark against the current test profile samples:

$ benchstat -col=/size -filter='/size:(0 OR 2048)' bench.stacktracetree-defaultsize.log
goos: darwin
goarch: arm64
pkg: github.com/grafana/pyroscope/pkg/phlaredb/symdb
cpu: Apple M1 Pro
                           │      0      │                2048                 │
                           │   sec/op    │   sec/op     vs base                │
_stacktrace_tree_insert-10   98.20µ ± 1%   85.14µ ± 0%  -13.29% (p=0.000 n=20)

                           │       0       │                 2048                 │
                           │     B/op      │     B/op      vs base                │
_stacktrace_tree_insert-10   117.24Ki ± 0%   32.00Ki ± 0%  -72.70% (p=0.000 n=20)

                           │      0      │                2048                │
                           │  allocs/op  │ allocs/op   vs base                │
_stacktrace_tree_insert-10   14.000 ± 0%   1.000 ± 0%  -92.86% (p=0.000 n=20)

These values are of course arbitrary and heavily dependent on the number of samples each profile has, however, it sheds some light on how by just changing the default size we can certainly improve this method without having to refactor it.

One additional idea that I didn't have the time to explore was to turn the defaultStacktraceTreeSize constant to a variable and adjust its value using heuristics so the resources consumption of creating and inserting to these stacktrace trees improves over time.

Here you can find the results: bench.stacktracetree-defaultsize.log

Refactor the benchmark for `*stacktraceTree.insert` to use the constant for default tree size instead of hard-coding 0. This will become useful in future changes.

This benchmark can be slow, thus it won't run unless the `COMPARE_STACKTRACETREE_INSERT_DEFAULT_SIZES` environment variable is set to true. In addition, if the `-v` flag is passed to `go test` it will print the initial and max tree size after inserting all the samples from the test profile.

kolesnikovae

Thanks for looking into this @inkel! I believe adding the benchmark will be helpful!

Your assertions are correct, the initial tree size is important. However, I decided to initialize it with zero size after exhaustive testing on the real data – for some time, the default size was ~10K nodes, but I noticed that it was inefficient (and other sizes as well): in practice, growing trees when they become large is preferred over preallocation on initializaiton. The reason is that there are many and many thousands of such trees and most of them won't ever reach the default size, while we spend time preallocation space.

Here you can see that even if we eliminated re-allocation of the tree altogether, the benefit would be quite moderate.

More efficient approach would be reusing the allocated trees. This, however, would require building a smart pool that maintains multiple sub-pools of various size classes (because trees vary in size drastically). I decided that the added complexity is not worth it. We could probably revisit this – in the newer version we're working on, the trees have way shorter life time, and re-allocs become more noticeable (up to 0.1% of CPU total):

But what's more important is that we could eliminate the hash map growth and rehashing, which could give us another 0.1%.

It's on my list, but there are much bigger fish to fry. If you're interested in contributing, I'd love to share some ideas with you! :)

pkg/phlaredb/symdb/stacktrace_tree_test.go

kolesnikovae · 2025-03-21T05:03:01Z

pkg/phlaredb/symdb/stacktrace_tree_test.go

@@ -220,9 +222,48 @@ func Benchmark_stacktrace_tree_insert(b *testing.B) {
 	b.ReportAllocs()

 	for i := 0; i < b.N; i++ {
-		x := newStacktraceTree(0)
+		x := newStacktraceTree(defaultStacktraceTreeSize)


Just a note: defaultStacktraceTreeSize is 0.

Yes. I just change it so if we ever adjust it, this original benchmark will be based on that value and not a hard coded 0. Would you like me to revert this change?

No I don't think so. I think Anton just wanted to point out that this line is a no-op

We could revisit if we want to always show these values, and if we want to add/remove columns.

simonswine

Thanks for sharing this ❤️

inkel added 2 commits March 20, 2025 14:33

Use default size constant for initializing tree in benchmark

09d7faa

Refactor the benchmark for `*stacktraceTree.insert` to use the constant for default tree size instead of hard-coding 0. This will become useful in future changes.

inkel requested a review from a team as a code owner March 20, 2025 18:02

kolesnikovae approved these changes Mar 21, 2025

View reviewed changes

pkg/phlaredb/symdb/stacktrace_tree_test.go Outdated Show resolved Hide resolved

pkg/phlaredb/symdb/stacktrace_tree_test.go Outdated Show resolved Hide resolved

pkg/phlaredb/symdb/stacktrace_tree_test.go Outdated Show resolved Hide resolved

kolesnikovae reviewed Mar 21, 2025

View reviewed changes

inkel added 3 commits March 21, 2025 10:09

Remove commented out line

cf94a44

Always run stacktraceTree default size benchmark

d529134

Use b.ReportMetric instead of b.Log

89c790b

We could revisit if we want to always show these values, and if we want to add/remove columns.

inkel mentioned this pull request Mar 21, 2025

Improve *deduplicatingSlice.ingest performance #4037

Merged

simonswine approved these changes Mar 21, 2025

View reviewed changes

simonswine merged commit bcb4af2 into grafana:main Mar 24, 2025
20 checks passed

inkel deleted the inkel/benchmark-stacktraces-tree-default-size-insert branch March 25, 2025 11:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Compare `*stacktraceTree.insert` against different initial sizes #4033

Compare `*stacktraceTree.insert` against different initial sizes #4033

Uh oh!

inkel commented Mar 20, 2025

Uh oh!

kolesnikovae left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kolesnikovae Mar 21, 2025

Uh oh!

inkel Mar 21, 2025

Uh oh!

simonswine Mar 21, 2025

Uh oh!

simonswine left a comment

Uh oh!

Uh oh!

Uh oh!

Compare *stacktraceTree.insert against different initial sizes #4033

Compare *stacktraceTree.insert against different initial sizes #4033

Uh oh!

Conversation

inkel commented Mar 20, 2025

Uh oh!

kolesnikovae left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kolesnikovae Mar 21, 2025

Choose a reason for hiding this comment

Uh oh!

inkel Mar 21, 2025

Choose a reason for hiding this comment

Uh oh!

simonswine Mar 21, 2025

Choose a reason for hiding this comment

Uh oh!

simonswine left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Compare `*stacktraceTree.insert` against different initial sizes #4033

Compare `*stacktraceTree.insert` against different initial sizes #4033

kolesnikovae left a comment •

edited

Loading