
@iambriccardo (Contributor) commented Sep 3, 2024

This PR improves the spooler by reducing the number of disk reads. It infers when the disk has to be read and skips the read when we know for a fact that there is no data on disk.

The whole implementation relies on the assumption that the data in the database for a given project key pair is modified exclusively by the EnvelopeStack for that pair.
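A minimal sketch of the idea under a simplified model: each stack keeps a `check_disk` flag that is set whenever it spools to disk and cleared once a disk read comes back empty, so later pops skip the disk entirely. `DiskStore`, `EnvelopeStack`, `spool_to_disk`, `check_disk`, and the batch size of 10 are hypothetical stand-ins for illustration, not the types or API from this PR.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a project key pair.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ProjectKeyPair(u64, u64);

/// Hypothetical disk store; counts reads so the saving is observable.
#[derive(Default)]
struct DiskStore {
    data: HashMap<ProjectKeyPair, Vec<String>>,
    reads: usize,
}

impl DiskStore {
    fn spool(&mut self, key: ProjectKeyPair, envelopes: Vec<String>) {
        self.data.entry(key).or_default().extend(envelopes);
    }

    /// Removes and returns up to `n` of the most recent envelopes for `key`.
    /// Every call counts as one disk read, even if nothing is found.
    fn unspool(&mut self, key: ProjectKeyPair, n: usize) -> Vec<String> {
        self.reads += 1;
        let stored = self.data.entry(key).or_default();
        let split = stored.len().saturating_sub(n);
        stored.split_off(split)
    }
}

/// Hypothetical envelope stack that remembers whether disk may hold its data.
struct EnvelopeStack {
    key: ProjectKeyPair,
    buffer: Vec<String>,
    /// `true` whenever the disk *might* contain envelopes for `key`.
    check_disk: bool,
}

impl EnvelopeStack {
    fn new(key: ProjectKeyPair, check_disk: bool) -> Self {
        Self { key, buffer: Vec::new(), check_disk }
    }

    fn push(&mut self, envelope: String) {
        self.buffer.push(envelope);
    }

    /// Moves the in-memory buffer to disk; from now on the disk must be checked.
    fn spool_to_disk(&mut self, disk: &mut DiskStore) {
        disk.spool(self.key, std::mem::take(&mut self.buffer));
        self.check_disk = true;
    }

    fn pop(&mut self, disk: &mut DiskStore) -> Option<String> {
        if self.buffer.is_empty() && self.check_disk {
            self.buffer = disk.unspool(self.key, 10);
            // An empty batch proves nothing is left on disk for this key. This only
            // holds under the assumption that no one else writes rows for this key.
            if self.buffer.is_empty() {
                self.check_disk = false;
            }
        }
        self.buffer.pop()
    }
}
```

With this model, a stack that never spooled pops without touching the disk at all, and a stack that did spool stops reading once one read comes back empty.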

Closes: https://github.com/getsentry/team-ingest/issues/532

#skip-changelog

@iambriccardo iambriccardo marked this pull request as ready for review September 3, 2024 13:21
@iambriccardo iambriccardo requested a review from a team as a code owner September 3, 2024 13:21
// On the other hand, if we are recreating a stack, it means that we popped it because
// it was empty, or we never had data on disk for that stack, so we assume by default
// that there is no need to check disk until some data is spooled.
matches!(stack_creation_type, StackCreationType::Initialization),
Member

nit: Can we move this to a helper function assume_data_on_disk(stack_creation_type) to clarify intent? The code comment can then be a doc comment on that function.
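One way the suggested helper could look, with the existing code comment turned into a doc comment. Only `StackCreationType::Initialization` appears in the diff; the `New` variant and the wording about the initialization case are assumptions for illustration.

```rust
/// Context in which an envelope stack is created.
/// Only `Initialization` comes from the diff; `New` is a hypothetical name.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StackCreationType {
    /// Assumed: stack created at startup for a key pair that may have data on disk.
    Initialization,
    /// Stack (re)created on demand.
    New,
}

/// Returns whether a freshly created stack should assume there may be data on disk.
///
/// If we are recreating a stack, it means that we popped it because it was empty,
/// or we never had data on disk for that stack, so we assume by default that there
/// is no need to check the disk until some data is spooled.
fn assume_data_on_disk(stack_creation_type: StackCreationType) -> bool {
    matches!(stack_creation_type, StackCreationType::Initialization)
}
```

The call site would then read `assume_data_on_disk(stack_creation_type)` instead of the bare `matches!`.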

Contributor Author

Good idea, will do!

/// Pushes a new [`EnvelopeStack`] with the given [`Envelope`] inserted.
async fn push_stack(
&mut self,
stack_creation_type: StackCreationType,
Member

Let's make StackCreationType an enum with data and remove the project_key_pair and envelope parameters, something like:

enum StackOrigin {
    Existing(ProjectKeyPair),
    NewEnvelope(Box<Envelope>),
}
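A rough sketch of how a `push_stack` could consume the proposed `StackOrigin`, so the key pair and envelope travel inside the enum instead of as separate parameters. `ProjectKeyPair`, `Envelope`, and `EnvelopeStack` are simplified hypothetical stand-ins, and mapping `Existing` to "check disk" is an assumption modeled on the `Initialization` case in the diff.

```rust
/// Hypothetical stand-ins for the real types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ProjectKeyPair(u64, u64);

#[derive(Debug)]
struct Envelope {
    key: ProjectKeyPair,
}

/// The proposed enum: each origin carries the data the new stack needs.
enum StackOrigin {
    Existing(ProjectKeyPair),
    NewEnvelope(Box<Envelope>),
}

struct EnvelopeStack {
    key: ProjectKeyPair,
    envelopes: Vec<Box<Envelope>>,
    check_disk: bool,
}

/// Builds a stack from its origin; no separate key or envelope parameters.
fn push_stack(origin: StackOrigin) -> EnvelopeStack {
    match origin {
        // Assumed: an existing key pair may already have spooled data.
        StackOrigin::Existing(key) => EnvelopeStack {
            key,
            envelopes: Vec::new(),
            check_disk: true,
        },
        // A stack created from a new envelope has nothing on disk yet.
        StackOrigin::NewEnvelope(envelope) => EnvelopeStack {
            key: envelope.key,
            envelopes: vec![envelope],
            check_disk: false,
        },
    }
}
```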

Contributor Author

I think this enum is confusing since it doesn't convey what is going on. I would rather always create an empty stack and specify the context in which it is created.
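A sketch of that alternative, assuming a hypothetical `Buffer` that owns stacks keyed by project key pair: every stack starts empty, `StackCreationType` only decides whether the disk is checked, and the envelope is pushed as a separate step. All names other than `StackCreationType::Initialization` are illustrative.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the real project key pair.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ProjectKeyPair(u64, u64);

/// Creation context; `New` is a hypothetical name for the on-demand case.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StackCreationType {
    Initialization,
    New,
}

struct EnvelopeStack {
    envelopes: Vec<String>,
    /// Whether the stack may still have envelopes spooled on disk.
    check_disk: bool,
}

impl EnvelopeStack {
    /// Always creates an empty stack; the context only sets the disk flag.
    fn new(creation: StackCreationType) -> Self {
        Self {
            envelopes: Vec::new(),
            check_disk: matches!(creation, StackCreationType::Initialization),
        }
    }

    fn push(&mut self, envelope: String) {
        self.envelopes.push(envelope);
    }
}

#[derive(Default)]
struct Buffer {
    stacks: HashMap<ProjectKeyPair, EnvelopeStack>,
}

impl Buffer {
    /// Registers empty stacks for key pairs known at startup.
    fn initialize(&mut self, keys: &[ProjectKeyPair]) {
        for &key in keys {
            self.stacks
                .insert(key, EnvelopeStack::new(StackCreationType::Initialization));
        }
    }

    /// Creates an empty stack on demand, then pushes the envelope separately.
    fn push(&mut self, key: ProjectKeyPair, envelope: String) {
        self.stacks
            .entry(key)
            .or_insert_with(|| EnvelopeStack::new(StackCreationType::New))
            .push(envelope);
    }
}
```

The trade-off is a single, uniform creation path for stacks, at the cost of the caller pushing the first envelope itself.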

@iambriccardo iambriccardo merged commit a766d92 into master Sep 4, 2024
@iambriccardo iambriccardo deleted the riccardo/fix/improve-disk-check branch September 4, 2024 11:46
jan-auer added a commit that referenced this pull request Sep 6, 2024
* master: (27 commits)
  build: Update dialoguer and hostname (#4009)
  build: Update opentelemetry-proto to 0.7.0 (#4000)
  build: Update lru to 0.12.4 (#4008)
  build: Update cookie to 0.18.1 (#4007)
  feat(spans): Extract standalone CLS span metrics and performance score (#3988)
  build: Update cadence to 1.4.0 and statsdproxy to 0.2.0 (#4005)
  build: Update maxminddb to 0.24.0 (#4003)
  build: Update multer to 3.1.0 (#4002)
  build: Update regex and aho-corasick (#4001)
  build: Update sentry-kafka-schemas to 1.0.107 (#3999)
  build: Update dev-dependencies (#3998)
  build: Update itertools to 0.13.0 (#3993)
  build: Update brotli, zstd, flate2 (#3996)
  build: Update rdkafka to 0.36.2 (#3995)
  build: Update tikv-jemallocator to 0.6.0 (#3994)
  build: Update minidump to 0.22.0 (#3992)
  build: Update bindgen to 0.70.1 (#3991)
  build: Update chrono to 0.4.38 (#3990)
  feat(spans): initial MongoDB description scrubbing support (#3912)
  fix(spooler): Reduce number of disk reads (#3983)
  ...