Skip to content

Conversation

@kolesnikovae
Copy link
Collaborator

@kolesnikovae kolesnikovae commented Jul 20, 2023

The PR contains implementation of the symbols rewriter which is used in compaction. The design of the implementation is heavily influenced by the following factors:

  • On compaction, we may want to get only a subrange of profiles from a source block. The range may be limited to a specific series set, time range, or both.
  • Symbolic information of a source block must be loaded into memory entirely, because there is no efficient way to load it in parts.

TODO:

  • Validity verification.
  • Performance tests.

Currently, the complete compaction flow is incomplete which complicates further testing. Specifically, I didn't manage to query any data from a storage with a compacted block: it either returns empty result, or fails with the error panic: label hash conflict.

Nevertheless, I think that we should pull this change into the main compaction branch and continue work there. Moreover, I'd like to get the compaction branch merged into main (next).

@kolesnikovae kolesnikovae marked this pull request as draft July 20, 2023 04:52
@kolesnikovae kolesnikovae requested a review from cyriltovena July 24, 2023 06:49
@kolesnikovae kolesnikovae marked this pull request as ready for review July 24, 2023 06:49
@kolesnikovae kolesnikovae changed the title WIP: Compact symdb Compact symdb Jul 24, 2023
@kolesnikovae kolesnikovae added enhancement New feature or request storage Low level storage matters labels Jul 24, 2023
Comment on lines +350 to +355
func (r *stacktraceResolverV1) Load(context.Context) error {
// FIXME(kolesnikovae): Loading all stacktraces from parquet file
// into memory is likely a bad choice. Instead we could convert
// it to symdb first.
return nil
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compacting v1 stacktraces might be a bit challenging. It is easy if a source block is compacted entirely, when we read all of its profiles sequentially. However, in practice I guess we will need to filter based on time and series which changes the access pattern to random access.

@cyriltovena what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we don't compact v1.

Comment on lines +516 to +527
profile.row.ForStacktraceIDsValues(func(values []parquet.Value) {
s.loadStacktracesID(values)
r := s.rewriters[profile.blockReader]
if err = r.rewriteStacktraces(profile.row.StacktracePartitionID(), s.stacktraces); err != nil {
return
}
s.numSamples += uint64(len(values))
for i, v := range values {
// FIXME: the original order is not preserved, which will affect encoding.
values[i] = parquet.Int64Value(int64(s.stacktraces[i])).Level(v.RepetitionLevel(), v.DefinitionLevel(), v.Column())
}
})
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we translate stack trace IDs, the original order will be invalidated:

Before compaction:

IDs: 1, 2, 3, 4, 42, 43

After compaction:

IDs: 1, 2, 6, 4, 453, 12

It will deteriorate delta encoding compression ratio. We clearly should fix this

}
var err error
profile := s.profiles.At()
profile.row.ForStacktraceIDsValues(func(values []parquet.Value) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may also want to rewrite Comments field, which is basically a reference to strings

@kolesnikovae kolesnikovae mentioned this pull request Jul 26, 2023
@cyriltovena cyriltovena merged commit 233c723 into feat/compact Jul 26, 2023
@cyriltovena cyriltovena deleted the feat/compact-symdb branch July 26, 2023 06:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request storage Low level storage matters

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants