Compact symdb #2136
Conversation
Co-authored-by: Anton Kolesnikov <[email protected]>
```go
func (r *stacktraceResolverV1) Load(context.Context) error {
	// FIXME(kolesnikovae): Loading all stacktraces from parquet file
	// into memory is likely a bad choice. Instead we could convert
	// it to symdb first.
	return nil
}
```
Compacting v1 stacktraces might be a bit challenging. It is easy if a source block is compacted entirely, because then we read all of its profiles sequentially. In practice, however, I expect we will need to filter based on time and series, which changes the access pattern to random access.
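A minimal sketch of the two access patterns, just to illustrate the difference (all types and names here are hypothetical stand-ins, not the actual block reader API): converting a whole block is a single ordered pass, while filtering by time/series forces lookups by arbitrary IDs:

```go
package main

// stacktrace is a stand-in for a v1 stack trace record; illustration only.
type stacktrace struct{ locationIDs []uint64 }

// symdbWriter is a stand-in for the symdb destination; illustration only.
type symdbWriter struct{ out []stacktrace }

func (w *symdbWriter) append(s stacktrace) { w.out = append(w.out, s) }

// Whole-block compaction: stream every stack trace in storage order.
// One sequential pass, no lookups.
func convertAll(src []stacktrace, w *symdbWriter) {
	for _, s := range src {
		w.append(s)
	}
}

// Filtered compaction: only the stack traces referenced by the selected
// profiles are needed, so each one is fetched by ID (random access).
func convertSubset(src []stacktrace, ids []uint32, w *symdbWriter) {
	for _, id := range ids {
		w.append(src[id])
	}
}

func main() {
	src := []stacktrace{{[]uint64{1}}, {[]uint64{1, 2}}, {[]uint64{3}}}
	var w symdbWriter
	convertAll(src, &w)
	convertSubset(src, []uint32{2, 0}, &w)
}
```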
@cyriltovena what do you think?
I suggest we don't compact v1.
```go
profile.row.ForStacktraceIDsValues(func(values []parquet.Value) {
	// Decode the source stack trace IDs and translate them into the
	// destination symbols via the per-block rewriter.
	s.loadStacktracesID(values)
	r := s.rewriters[profile.blockReader]
	if err = r.rewriteStacktraces(profile.row.StacktracePartitionID(), s.stacktraces); err != nil {
		return
	}
	s.numSamples += uint64(len(values))
	// Write the translated IDs back into the parquet column values.
	for i, v := range values {
		// FIXME: the original order is not preserved, which will affect encoding.
		values[i] = parquet.Int64Value(int64(s.stacktraces[i])).Level(v.RepetitionLevel(), v.DefinitionLevel(), v.Column())
	}
})
```
When we translate stack trace IDs, the original order is not preserved.

Before compaction:
IDs: 1, 2, 3, 4, 42, 43

After compaction:
IDs: 1, 2, 6, 4, 453, 12

This will deteriorate the delta encoding compression ratio. We clearly should fix this.
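A minimal sketch of one possible fix (a hypothetical helper, not the PR's actual code): assign new IDs to the source IDs in ascending order, so the relative order survives translation and the deltas stay small:

```go
package main

import (
	"fmt"
	"sort"
)

// remapPreservingOrder assigns new IDs to the given stack trace IDs so that
// the relative order is preserved: if a < b, then newID(a) < newID(b).
// nextID is the first unused ID in the destination block.
func remapPreservingOrder(ids []uint32, nextID uint32) map[uint32]uint32 {
	unique := append([]uint32(nil), ids...)
	sort.Slice(unique, func(i, j int) bool { return unique[i] < unique[j] })
	m := make(map[uint32]uint32, len(unique))
	for _, id := range unique {
		if _, ok := m[id]; !ok {
			m[id] = nextID
			nextID++
		}
	}
	return m
}

func main() {
	ids := []uint32{1, 2, 3, 4, 42, 43}
	m := remapPreservingOrder(ids, 1)
	for _, id := range ids {
		fmt.Printf("%d -> %d\n", id, m[id])
	}
	// Prints 1->1 2->2 3->3 4->4 42->5 43->6: the mapping is
	// order-preserving, so delta encoding of the translated column
	// remains effective.
}
```

In the real compactor the mapping would of course have to be consistent across the whole block, not computed per row group; this only illustrates the ordering property we want to keep.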
```go
}
var err error
profile := s.profiles.At()
profile.row.ForStacktraceIDsValues(func(values []parquet.Value) {
```
We may also want to rewrite the Comments field, which is essentially a list of references to strings.
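A minimal sketch of what that rewrite could look like (hypothetical names; the real string table API differs): comment entries are indices into the block's string table, so each index must be redirected into the merged table:

```go
// rewriteComments redirects comment string references from the source
// block's string table into the merged one. stringMap maps a source
// string index to its index in the destination table.
// Hypothetical helper for illustration; not the PR's actual API.
func rewriteComments(comments []int64, stringMap map[int64]int64) {
	for i, c := range comments {
		comments[i] = stringMap[c]
	}
}
```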
The PR contains an implementation of the symbols rewriter that is used in compaction. The design of the implementation is heavily influenced by the following factors:

TODO:

Currently, the compaction flow is incomplete, which complicates further testing. Specifically, I didn't manage to query any data from a storage with a compacted block: it either returns an empty result or fails with `panic: label hash conflict`. Nevertheless, I think we should pull this change into the main compaction branch and continue the work there. Moreover, I'd like to get the compaction branch merged into main (next).