Compact symdb #2136
Conversation
Co-authored-by: Anton Kolesnikov <[email protected]>
```go
func (r *stacktraceResolverV1) Load(context.Context) error {
	// FIXME(kolesnikovae): Loading all stacktraces from parquet file
	// into memory is likely a bad choice. Instead we could convert
	// it to symdb first.
	return nil
}
```
Compacting v1 stacktraces might be a bit challenging. It is easy if a source block is compacted entirely, because then we read all of its profiles sequentially. In practice, however, I expect we will need to filter based on time and series, which changes the access pattern to random access.
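A minimal sketch of the two access patterns, just to illustrate the difference (all types and names here are hypothetical stand-ins, not the actual block reader API): converting a whole block is a single ordered pass, while filtering by time/series forces lookups by arbitrary IDs:

```go
package main

// stacktrace is a stand-in for a v1 stack trace record; illustration only.
type stacktrace struct{ locationIDs []uint64 }

// symdbWriter is a stand-in for the symdb destination; illustration only.
type symdbWriter struct{ out []stacktrace }

func (w *symdbWriter) append(s stacktrace) { w.out = append(w.out, s) }

// Whole-block compaction: stream every stack trace in storage order.
// One sequential pass, no lookups.
func convertAll(src []stacktrace, w *symdbWriter) {
	for _, s := range src {
		w.append(s)
	}
}

// Filtered compaction: only the stack traces referenced by the selected
// profiles are needed, so each one is fetched by ID (random access).
func convertSubset(src []stacktrace, ids []uint32, w *symdbWriter) {
	for _, id := range ids {
		w.append(src[id])
	}
}

func main() {
	src := []stacktrace{{[]uint64{1}}, {[]uint64{1, 2}}, {[]uint64{3}}}
	var w symdbWriter
	convertAll(src, &w)
	convertSubset(src, []uint32{2, 0}, &w)
}
```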
@cyriltovena what do you think?
I suggest we don't compact v1.
```go
profile.row.ForStacktraceIDsValues(func(values []parquet.Value) {
	// Decode the source stack trace IDs and translate them into the
	// destination symbols via the per-block rewriter.
	s.loadStacktracesID(values)
	r := s.rewriters[profile.blockReader]
	if err = r.rewriteStacktraces(profile.row.StacktracePartitionID(), s.stacktraces); err != nil {
		return
	}
	s.numSamples += uint64(len(values))
	// Write the translated IDs back into the parquet column values.
	for i, v := range values {
		// FIXME: the original order is not preserved, which will affect encoding.
		values[i] = parquet.Int64Value(int64(s.stacktraces[i])).Level(v.RepetitionLevel(), v.DefinitionLevel(), v.Column())
	}
})
```
When we translate stack trace IDs, the original order is not preserved.

Before compaction:
IDs: 1, 2, 3, 4, 42, 43

After compaction:
IDs: 1, 2, 6, 4, 453, 12

This will deteriorate the delta encoding compression ratio. We clearly should fix this.
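A minimal sketch of one possible fix (a hypothetical helper, not the PR's actual code): assign new IDs to the source IDs in ascending order, so the relative order survives translation and the deltas stay small:

```go
package main

import (
	"fmt"
	"sort"
)

// remapPreservingOrder assigns new IDs to the given stack trace IDs so that
// the relative order is preserved: if a < b, then newID(a) < newID(b).
// nextID is the first unused ID in the destination block.
func remapPreservingOrder(ids []uint32, nextID uint32) map[uint32]uint32 {
	unique := append([]uint32(nil), ids...)
	sort.Slice(unique, func(i, j int) bool { return unique[i] < unique[j] })
	m := make(map[uint32]uint32, len(unique))
	for _, id := range unique {
		if _, ok := m[id]; !ok {
			m[id] = nextID
			nextID++
		}
	}
	return m
}

func main() {
	ids := []uint32{1, 2, 3, 4, 42, 43}
	m := remapPreservingOrder(ids, 1)
	for _, id := range ids {
		fmt.Printf("%d -> %d\n", id, m[id])
	}
	// Prints 1->1 2->2 3->3 4->4 42->5 43->6: the mapping is
	// order-preserving, so delta encoding of the translated column
	// remains effective.
}
```

In the real compactor the mapping would of course have to be consistent across the whole block, not computed per row group; this only illustrates the ordering property we want to keep.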
```go
}
var err error
profile := s.profiles.At()
profile.row.ForStacktraceIDsValues(func(values []parquet.Value) {
```
We may also want to rewrite the Comments field, which is essentially a list of references to strings.
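A minimal sketch of what that rewrite could look like (hypothetical names; the real string table API differs): comment entries are indices into the block's string table, so each index must be redirected into the merged table:

```go
// rewriteComments redirects comment string references from the source
// block's string table into the merged one. stringMap maps a source
// string index to its index in the destination table.
// Hypothetical helper for illustration; not the PR's actual API.
func rewriteComments(comments []int64, stringMap map[int64]int64) {
	for i, c := range comments {
		comments[i] = stringMap[c]
	}
}
```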
The PR contains an implementation of the symbols rewriter that is used in compaction. The design of the implementation is heavily influenced by the following factors:

TODO:

Currently, the compaction flow is incomplete, which complicates further testing. Specifically, I didn't manage to query any data from a storage with a compacted block: it either returns an empty result or fails with `panic: label hash conflict`. Nevertheless, I think we should pull this change into the main compaction branch and continue the work there. Moreover, I'd like to get the compaction branch merged into main (next).