kvserver: use Pebble snapshot for catchup scans #156303
Conversation
@arulajmani @stevendanna This is the same code as in #154412, which was rolled back; it is being re-landed now that the Pebble fix for the snapshot and excise interaction is in. I noticed there has been some refactoring in this area since that change, so you may want to take a quick look to ensure I haven't messed with the spirit of that refactor.
Thanks for following up here.
pkg/kv/kvserver/rangefeed/catchup_scan.go line 258 at r1 (raw file):
    // iterator, and using the cpu time to amortize that cost seems
    // reasonable.
    if (readmitted && i.iterRecreateDuration > 0) || util.RaceEnabled {
@stevendanna @wenyihu6
If we have a range key [a, d) and point keys a, b, c, it is now allowable for us to recreate the iterator at b and c, so the first, second, and third iterators will all see part of that range key. That will result in us emitting three kvpb.RangeFeedDeleteRange events with spans [a, d), [b, d), and [c, d) respectively.
Two questions:
- Is that ok behavior from a correctness perspective?
- Is there an existing rangefeed/changefeed/... test that has a range key with multiple overlapping point keys? If yes, which test? Given we always recreate under RaceEnabled, I want to see how that test behaves with this change.
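
A minimal, self-contained sketch of the behavior described above (an editorial illustration, not code from this PR; clipStart is a hypothetical helper): each recreated iterator is positioned at its resume key, so the portion of the range key it observes, and hence the emitted span, starts no earlier than that key.

```go
package main

import "fmt"

// clipStart is a hypothetical helper: a recreated iterator starts at the
// resume key, so the visible start of an overlapping range key is the later
// of the range key's start and the resume key.
func clipStart(rangeStart, resumeKey string) string {
	if resumeKey > rangeStart {
		return resumeKey
	}
	return rangeStart
}

func main() {
	const rangeStart, rangeEnd = "a", "d"
	// Iterators (re)created at a, b, and c each see a suffix of the range key
	// [a, d), producing three RangeFeedDeleteRange events.
	for _, resume := range []string{"a", "b", "c"} {
		fmt.Printf("RangeFeedDeleteRange: [%s, %s)\n", clipStart(rangeStart, resume), rangeEnd)
	}
}
```

Running this prints [a, d), [b, d), and [c, d), matching the three events described above.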
pkg/kv/kvserver/rangefeed/catchup_scan.go line 258 at r1 (raw file):
Previously, sumeerbhola wrote…
@stevendanna @wenyihu6
If we have a range key [a, d) and point keys a, b, c, it is now allowable for us to recreate the iterator at b and c, so the first, second, and third iterators will all see part of that range key. That will result in us emitting three kvpb.RangeFeedDeleteRange events with spans [a, d), [b, d), and [c, d) respectively.
Two questions:
- Is that ok behavior from a correctness perspective?
- Is there an existing rangefeed/changefeed/... test that has a range key with multiple overlapping point keys? If yes, which test? Given we always recreate under RaceEnabled, I want to see how that test behaves with this change.
Hrmmmm. This is a good callout.
- I don't see why it wouldn't be OK from a correctness perspective. But my understanding is that it would have been valid for us to see either set of range keys anyway; at least, we've assumed this is the case in export requests. I'll keep thinking about this.
- At the moment changefeeds never see range keys because the only operations that write range keys are operations that first take the relevant table offline. However, we do expect to see these in PCR. I think the closest test to what you describe is TestWithOnDeleteRange in pkg/kv/kvclient/rangefeed/rangefeed_external_test.go. We might need to modify it to have more overlaps.
Force-pushed from 464ea50 to 5168de8.
pkg/kv/kvserver/rangefeed/catchup_scan.go line 258 at r1 (raw file):
Previously, stevendanna (Steven Danna) wrote…
Hrmmmm. This is a good callout.
- I don't see why it wouldn't be OK from a correctness perspective. But my understanding is that it would have been valid for us to see either set of range keys anyway; at least, we've assumed this is the case in export requests. I'll keep thinking about this.
- At the moment changefeeds never see range keys because the only operations that write range keys are operations that first take the relevant table offline. However, we do expect to see these in PCR. I think the closest test to what you describe is TestWithOnDeleteRange in pkg/kv/kvclient/rangefeed/rangefeed_external_test.go. We might need to modify it to have more overlaps.
Thanks for the pointer to TestWithOnDeleteRange. I verified that it is behaving as expected with and without iterator recreation, and added a couple of comments.
Force-pushed from 5168de8 to 264bacc.
TFTR!

bors r=stevendanna
The Pebble snapshot is used to create an iterator, which is recreated based on the storage.snapshot.recreate_iter_duration cluster setting, which defaults to 20s.

These are mostly plumbing changes, except for catchup_scan.go.

Fixes #133851

Epic: none

Release note (ops change): The cluster setting storage.snapshot.recreate_iter_duration (default 20s) controls how frequently a long-lived engine iterator, backed by an engine snapshot, will be closed and recreated. Currently, it is only used for the iterators used in rangefeed catchup scans.
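
For reference, a rough, hedged sketch of how such a duration setting is typically registered and consulted in this codebase. The setting class, the maybeRecreate helper, and the use of wall-clock elapsed time instead of the CPU-time amortization discussed in the review are assumptions, not the PR's actual code.

```go
package rangefeedsketch

import (
	"time"

	"github.com/cockroachdb/cockroach/pkg/settings"
)

// recreateIterDuration mirrors the setting named in the release note; the
// settings.SystemOnly class is an assumption.
var recreateIterDuration = settings.RegisterDurationSetting(
	settings.SystemOnly,
	"storage.snapshot.recreate_iter_duration",
	"how often a long-lived, snapshot-backed iterator is closed and recreated",
	20*time.Second,
)

// maybeRecreate is a hypothetical helper showing how a catch-up scan loop
// could decide whether to close and recreate its iterator. A zero setting
// disables recreation; wall-clock elapsed time is used here as a
// simplification of the CPU-time-based amortization mentioned in the review.
func maybeRecreate(sv *settings.Values, lastRecreate time.Time) bool {
	d := recreateIterDuration.Get(sv)
	return d > 0 && time.Since(lastRecreate) >= d
}
```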