kvserver: use Pebble snapshot for catchup scans #154412
Conversation
sumeerbhola left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @jeffswenson, @log-head, @rytaft, @stevendanna, and @xinhaoz)
pkg/kv/kvserver/rangefeed/catchup_scan.go line 251 at r1 (raw file):
```go
// iterator, and using the cpu time to amortize that cost seems
// reasonable.
if (readmitted && i.iterRecreateDuration > 0) || util.RaceEnabled {
```
Always recreating (now gated behind util.RaceEnabled) ironed out some correctness issues via the existing tests.
There isn't a unit test checking that we recreate at the appropriate interval, and it seems hard to test that without injecting testing alternatives for the Pacer, the time functions, etc.
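A sketch of one way the interval check could be made unit-testable, assuming the recreation logic were factored behind a small helper with an injectable time source (all names here are hypothetical, not the PR's actual code):

```go
package main

import (
	"fmt"
	"time"
)

// iterRecreator decides when a long-lived iterator should be closed and
// recreated. nowFn is injectable so a test can drive the clock instead
// of sleeping.
type iterRecreator struct {
	recreateDuration time.Duration
	lastRecreated    time.Time
	nowFn            func() time.Time // defaults to time.Now in production
}

// shouldRecreate reports whether the iterator has been open longer than
// the configured recreation interval, and resets the timer if so.
func (r *iterRecreator) shouldRecreate() bool {
	if r.recreateDuration <= 0 {
		return false
	}
	now := r.nowFn()
	if now.Sub(r.lastRecreated) < r.recreateDuration {
		return false
	}
	r.lastRecreated = now
	return true
}

func main() {
	fake := time.Unix(0, 0)
	r := &iterRecreator{
		recreateDuration: 20 * time.Second,
		lastRecreated:    fake,
		nowFn:            func() time.Time { return fake },
	}
	fmt.Println(r.shouldRecreate()) // false: interval has not elapsed
	fake = fake.Add(21 * time.Second)
	fmt.Println(r.shouldRecreate()) // true: 21s > 20s
}
```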
stevendanna left a comment
@stevendanna reviewed 8 of 26 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @jeffswenson, @log-head, and @xinhaoz)
pkg/kv/kvserver/rangefeed/catchup_scan.go line 272 at r1 (raw file):
```go
// on a key > lastKey. Which is why we need to potentially replace
// lastKey below and to consider the case theat the iterator is
// exhausted.
```
Just to confirm, this is OK even with withDiff because of the sameKey check above, so if the NextIgnoringTime landed us on a key outside the time bound but which is needed to populate the diff, we won't end up in this block. Does that seem right to you?
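For readers following the thread, the shape of the logic under discussion is roughly the following (a hypothetical sketch with stand-in names, not the real catchup_scan.go identifiers):

```go
package catchup

import "bytes"

// timeBoundIter stands in for an MVCC incremental iterator whose seeks
// respect the configured time bounds.
type timeBoundIter interface {
	SeekGE(key []byte)
	Valid() bool
	Key() []byte
}

func sameKey(a, b []byte) bool { return bytes.Equal(a, b) }

// reseekAfterRecreate re-positions a freshly recreated iterator. The
// previous iterator may have reached lastKey via NextIgnoringTime, so
// lastKey can lie outside the time bounds; the seek below respects the
// bounds and may therefore land on a key > lastKey, or exhaust the
// iterator. With withDiff, the sameKey check keeps a key visited only
// to populate the diff from being treated as a new position.
func reseekAfterRecreate(it timeBoundIter, lastKey []byte) ([]byte, bool) {
	it.SeekGE(lastKey)
	if !it.Valid() {
		return nil, false // exhausted: nothing left in the time window
	}
	if !sameKey(it.Key(), lastKey) {
		// Landed past lastKey: adopt the new key as the scan position.
		lastKey = append(lastKey[:0], it.Key()...)
	}
	return lastKey, true
}
```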
Force-pushed from 4fff1aa to 0dfc257
sumeerbhola left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @jeffswenson, @log-head, @stevendanna, and @xinhaoz)
pkg/kv/kvserver/rangefeed/catchup_scan.go line 272 at r1 (raw file):
Previously, stevendanna (Steven Danna) wrote…
Just to confirm, this is OK even with withDiff because of the sameKey check above, so if the NextIgnoringTime landed us on a key outside the time bound but which is needed to populate the diff, we won't end up in this block. Does that seem right to you?
Correct. I've added a comment.
Force-pushed from 06fc1cb to f7c28ac
arulajmani left a comment
@arulajmani reviewed 18 of 26 files at r1, 8 of 8 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jeffswenson, @log-head, @stevendanna, and @xinhaoz)
pkg/kv/kvserver/rangefeed/catchup_scan.go line 278 at r2 (raw file):
```go
// So the following seek (which does respect the time window) may land
// on a key > lastKey. Which is why we need to potentially replace
// lastKey below and to consider the case theat the iterator is
```
nit: "that"
pkg/kv/kvserver/rangefeed/buffered_registration.go line 59 at r2 (raw file):
```go
disconnected bool

// catchUpSnap is created by replica under raftMu lock when registration is
```
nit: while we're here, could you add some "a" and "the"'s in this comment? e.g. "is created by a replica", "a registration", "the output loop" etc.
jbowens left a comment
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jeffswenson, @log-head, @stevendanna, @sumeerbhola, and @xinhaoz)
-- commits line 6 at r2:
Are we worried about the snapshots continuing to pin sstables for the duration of the catchup scan? I'm wondering if our removal of old-style LSM snapshots was premature, or if catchup scans could possibly acquire a new LSM version.
RaduBerinde left a comment
pkg/storage/pebble.go line 181 at r2 (raw file):
[nit] Can you add to this comment explaining why we want to recreate the iterator and what is the trade-off for this setting?
Force-pushed from f7c28ac to 6e59022
sumeerbhola left a comment
TFTRs!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jeffswenson, @log-head, @stevendanna, and @xinhaoz)
Are we worried about the snapshots continuing to pin sstables for the duration of the catchup scan?
Slightly.
I'm wondering if our removal of old-style LSM snapshots was premature
I think not. This is an example where we avoided using old-style snapshots because of concern about their expense in compactions and write-amp, and so we stuck with an iterator. Replacing an iterator with a new-style snapshot has no such tradeoff, so I think it is strictly better.
or if catchup scans could possibly acquire a new LSM version.
I'd asked about this on that doc, wrt whether we hold a protected timestamp. @stevendanna
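For background on the tradeoff being discussed, here is a minimal illustration of taking an eventually-file-only snapshot ("new-style") in Pebble and reading through it; the key range, bounds, and directory are made up for the example:

```go
package main

import (
	"fmt"

	"github.com/cockroachdb/pebble"
	"github.com/cockroachdb/pebble/vfs"
)

func main() {
	db, err := pebble.Open("demo", &pebble.Options{FS: vfs.NewMem()})
	if err != nil {
		panic(err)
	}
	defer db.Close()

	if err := db.Set([]byte("a"), []byte("1"), pebble.Sync); err != nil {
		panic(err)
	}

	// Unlike an old-style snapshot, which pins sequence numbers and
	// thereby inflates compaction and write-amp costs, an EFOS stops
	// depending on memtables once the covered range is flushed and then
	// only holds a reference to the LSM version (i.e. the files).
	efos := db.NewEventuallyFileOnlySnapshot([]pebble.KeyRange{
		{Start: []byte("a"), End: []byte("z")},
	})
	defer efos.Close()

	it, err := efos.NewIter(&pebble.IterOptions{
		LowerBound: []byte("a"),
		UpperBound: []byte("z"),
	})
	if err != nil {
		panic(err)
	}
	defer it.Close()

	for valid := it.First(); valid; valid = it.Next() {
		fmt.Printf("%s => %s\n", it.Key(), it.Value())
	}
}
```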
pkg/kv/kvserver/rangefeed/buffered_registration.go line 59 at r2 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: while we're here, could you add some "a" and "the"'s in this comment? e.g. "is created by a replica", "a registration", "the output loop" etc.
Done. Also in unbuffered_registration.go.
pkg/kv/kvserver/rangefeed/catchup_scan.go line 278 at r2 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: "that"
Done
pkg/storage/pebble.go line 181 at r2 (raw file):
Previously, RaduBerinde wrote…
[nit] Can you add to this comment explaining why we want to recreate the iterator and what is the trade-off for this setting?
Done
bors r+
bors retry
The snapshot is used to create an iterator, which is periodically recreated based on the storage.snapshot.recreate_iter_duration cluster setting (default 20s).
The changes are mostly plumbing, except for catchup_scan.go.
Fixes #133851
Epic: none
Release note (ops change): The cluster setting
storage.snapshot.recreate_iter_duration (default 20s) controls how frequently a long-lived engine iterator, backed by an engine snapshot, will be closed and recreated. Currently, it is only used for iterators used in rangefeed catchup scans.
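For reference, a duration setting like the one in this release note is registered through CockroachDB's settings package; a sketch of what the registration could look like (the class constant and description wording are assumptions, not necessarily the PR's exact code):

```go
package storage

import (
	"time"

	"github.com/cockroachdb/cockroach/pkg/settings"
)

// Hypothetical registration; the PR's actual class, description, and
// options may differ.
var recreateIterDuration = settings.RegisterDurationSetting(
	settings.SystemOnly,
	"storage.snapshot.recreate_iter_duration",
	"interval after which a long-lived engine iterator backed by an "+
		"engine snapshot is closed and recreated (0 disables recreation)",
	20*time.Second,
)
```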