Skip to content

Compare *stacktraceTree.insert against different initial sizes #4033

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

inkel
Copy link
Contributor

@inkel inkel commented Mar 20, 2025

During the past hackathon I was looking at trying to optimize performance of different Grafana products, and I've found that the *stacktraceTree.insert method was consuming lots of CPU and memory. Everything indicated that the calls to append were the responsible ones, so I've added a new benchmark to compare how the existing benchmark for this operation behaves against different initial sizes (currently 0).

This benchmark can be slow, thus it won't run unless the COMPARE_STACKTRACETREE_INSERT_DEFAULT_SIZES environment variable is set to true. In addition, if the -v flag is passed to go test it will print the initial and max tree size after inserting all the samples from the test profile.

These are the results I've got on my local development machine:

$ env COMPARE_STACKTRACETREE_INSERT_DEFAULT_SIZES=true go test -run='^$' -bench=Benchmark_stacktrace_tree_insert_default_sizes ./pkg/phlaredb/symdb/ -v
goos: darwin
goarch: arm64
pkg: github.com/grafana/pyroscope/pkg/phlaredb/symdb
cpu: Apple M1 Pro
Benchmark_stacktrace_tree_insert_default_sizes
Benchmark_stacktrace_tree_insert_default_sizes/size=0
Benchmark_stacktrace_tree_insert_default_sizes/size=0-10                   12256             97704 ns/op          120049 B/op         14 allocs/op
    stacktrace_tree_test.go:258: init: 0 nodes ~ 0 bytes
    stacktrace_tree_test.go:266: max: 1793 nodes ~ 28688 bytes
Benchmark_stacktrace_tree_insert_default_sizes/size=10
Benchmark_stacktrace_tree_insert_default_sizes/size=10-10                  12475             95883 ns/op          102625 B/op         10 allocs/op
    stacktrace_tree_test.go:258: init: 10 nodes ~ 160 bytes
    stacktrace_tree_test.go:266: max: 1793 nodes ~ 28688 bytes
Benchmark_stacktrace_tree_insert_default_sizes/size=1024
Benchmark_stacktrace_tree_insert_default_sizes/size=1024-10                13028             91745 ns/op           81921 B/op          3 allocs/op
    stacktrace_tree_test.go:258: init: 1024 nodes ~ 16384 bytes
    stacktrace_tree_test.go:266: max: 1793 nodes ~ 28688 bytes
Benchmark_stacktrace_tree_insert_default_sizes/size=2048
Benchmark_stacktrace_tree_insert_default_sizes/size=2048-10                14047             85265 ns/op           32768 B/op          1 allocs/op
    stacktrace_tree_test.go:258: init: 2048 nodes ~ 32768 bytes
    stacktrace_tree_test.go:266: max: 1793 nodes ~ 28688 bytes
Benchmark_stacktrace_tree_insert_default_sizes/size=4096
Benchmark_stacktrace_tree_insert_default_sizes/size=4096-10                13689             89610 ns/op           65536 B/op          1 allocs/op
    stacktrace_tree_test.go:258: init: 4096 nodes ~ 65536 bytes
    stacktrace_tree_test.go:266: max: 1793 nodes ~ 28688 bytes
Benchmark_stacktrace_tree_insert_default_sizes/size=8192
Benchmark_stacktrace_tree_insert_default_sizes/size=8192-10                12843             92649 ns/op          131073 B/op          1 allocs/op
    stacktrace_tree_test.go:258: init: 8192 nodes ~ 131072 bytes
    stacktrace_tree_test.go:266: max: 1793 nodes ~ 28688 bytes
PASS
ok      github.com/grafana/pyroscope/pkg/phlaredb/symdb 13.478s

As we can observe, 2048 yields the better results when running the benchmark against the current test profile samples:

$ benchstat -col=/size -filter='/size:(0 OR 2048)' bench.stacktracetree-defaultsize.log
goos: darwin
goarch: arm64
pkg: github.com/grafana/pyroscope/pkg/phlaredb/symdb
cpu: Apple M1 Pro
                           │      0      │                2048                 │
                           │   sec/op    │   sec/op     vs base                │
_stacktrace_tree_insert-10   98.20µ ± 1%   85.14µ ± 0%  -13.29% (p=0.000 n=20)

                           │       0       │                 2048                 │
                           │     B/op      │     B/op      vs base                │
_stacktrace_tree_insert-10   117.24Ki ± 0%   32.00Ki ± 0%  -72.70% (p=0.000 n=20)

                           │      0      │                2048                │
                           │  allocs/op  │ allocs/op   vs base                │
_stacktrace_tree_insert-10   14.000 ± 0%   1.000 ± 0%  -92.86% (p=0.000 n=20)

These values are of course arbitrary and heavily dependent on the number of samples each profile has, however, it sheds some light on how by just changing the default size we can certainly improve this method without having to refactor it.

One additional idea that I didn't have the time to explore was to turn the defaultStacktraceTreeSize constant to a variable and adjust its value using heuristics so the resources consumption of creating and inserting to these stacktrace trees improves over time.

Here you can find the results: bench.stacktracetree-defaultsize.log

inkel added 2 commits March 20, 2025 14:33
Refactor the benchmark for `*stacktraceTree.insert` to use the
constant for default tree size instead of hard-coding 0. This will
become useful in future changes.
This benchmark can be slow, thus it won't run unless the
`COMPARE_STACKTRACETREE_INSERT_DEFAULT_SIZES` environment variable is
set to true. In addition, if the `-v` flag is passed to `go test` it
will print the initial and max tree size after inserting all the
samples from the test profile.
@inkel inkel requested a review from a team as a code owner March 20, 2025 18:02
Copy link
Collaborator

@kolesnikovae kolesnikovae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this @inkel! I believe adding the benchmark will be helpful!

Your assertions are correct, the initial tree size is important. However, I decided to initialize it with zero size after exhaustive testing on the real data – for some time, the default size was ~10K nodes, but I noticed that it was inefficient (and other sizes as well): in practice, growing trees when they become large is preferred over preallocation on initializaiton. The reason is that there are many and many thousands of such trees and most of them won't ever reach the default size, while we spend time preallocation space.

image

Here you can see that even if we eliminated re-allocation of the tree altogether, the benefit would be quite moderate.

More efficient approach would be reusing the allocated trees. This, however, would require building a smart pool that maintains multiple sub-pools of various size classes (because trees vary in size drastically). I decided that the added complexity is not worth it. We could probably revisit this – in the newer version we're working on, the trees have way shorter life time, and re-allocs become more noticeable (up to 0.1% of CPU total):

image

But what's more important is that we could eliminate the hash map growth and rehashing, which could give us another 0.1%.


It's on my list, but there are much bigger fish to fry. If you're interested in contributing, I'd love to share some ideas with you! :)

@@ -220,9 +222,48 @@ func Benchmark_stacktrace_tree_insert(b *testing.B) {
b.ReportAllocs()

for i := 0; i < b.N; i++ {
x := newStacktraceTree(0)
x := newStacktraceTree(defaultStacktraceTreeSize)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note: defaultStacktraceTreeSize is 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I just change it so if we ever adjust it, this original benchmark will be based on that value and not a hard coded 0. Would you like me to revert this change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I don't think so. I think Anton just wanted to point out that this line is a no-op

inkel added 3 commits March 21, 2025 10:09
We could revisit if we want to always show these values, and if we
want to add/remove columns.
Copy link
Contributor

@simonswine simonswine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sharing this ❤️

@simonswine simonswine merged commit bcb4af2 into grafana:main Mar 24, 2025
20 checks passed
@inkel inkel deleted the inkel/benchmark-stacktraces-tree-default-size-insert branch March 25, 2025 11:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants