Compare `*stacktraceTree.insert` against different initial sizes #4033
Conversation
Refactor the benchmark for `*stacktraceTree.insert` to use the constant for default tree size instead of hard-coding 0. This will become useful in future changes.
This benchmark can be slow, so it won't run unless the `COMPARE_STACKTRACETREE_INSERT_DEFAULT_SIZES` environment variable is set to `true`. In addition, if the `-v` flag is passed to `go test`, it will print the initial and max tree size after inserting all the samples from the test profile.
Thanks for looking into this @inkel! I believe adding the benchmark will be helpful!
Your assertions are correct: the initial tree size is important. However, I decided to initialize it with zero size after exhaustive testing on real data. For some time, the default size was ~10K nodes, but I noticed that it was inefficient (as were other sizes): in practice, growing trees when they become large is preferable to preallocating on initialization. The reason is that there are many thousands of such trees and most of them never reach the default size, so we would spend time preallocating space that goes unused.

Here you can see that even if we eliminated re-allocation of the tree altogether, the benefit would be quite moderate.
A more efficient approach would be to reuse the allocated trees. This, however, would require building a smart pool that maintains multiple sub-pools of various size classes (because trees vary drastically in size). I decided that the added complexity is not worth it. We could revisit this: in the newer version we're working on, the trees have a much shorter lifetime, and re-allocations become more noticeable (up to 0.1% of total CPU).

But what's more important is that we could eliminate the hash map growth and rehashing, which could give us another 0.1%.
It's on my list, but there are much bigger fish to fry. If you're interested in contributing, I'd love to share some ideas with you! :)
@@ -220,9 +222,48 @@ func Benchmark_stacktrace_tree_insert(b *testing.B) {
 	b.ReportAllocs()

 	for i := 0; i < b.N; i++ {
-		x := newStacktraceTree(0)
+		x := newStacktraceTree(defaultStacktraceTreeSize)
Just a note: `defaultStacktraceTreeSize` is 0.
Yes. I just changed it so that if we ever adjust it, this original benchmark will be based on that value and not a hard-coded 0. Would you like me to revert this change?
No, I don't think so. I think Anton just wanted to point out that this line is a no-op.
We could revisit if we want to always show these values, and if we want to add/remove columns.
Thanks for sharing this ❤️
During the past hackathon I was looking at trying to optimize the performance of different Grafana products, and I found that the `*stacktraceTree.insert` method was consuming lots of CPU and memory. Everything indicated that the calls to `append` were responsible, so I've added a new benchmark to compare how the existing benchmark for this operation behaves against different initial sizes (currently `0`).

These are the results I got on my local development machine:

As we can observe, `2048` yields the best results when running the benchmark against the current test profile samples. These values are of course arbitrary and heavily dependent on the number of samples each profile has; however, they shed some light on how, just by changing the default size, we can improve this method without having to refactor it.
One additional idea that I didn't have the time to explore was to turn the `defaultStacktraceTreeSize` constant into a variable and adjust its value using heuristics, so that the resource consumption of creating and inserting into these stacktrace trees improves over time.

Here you can find the results: bench.stacktracetree-defaultsize.log