[nexus] garbage collect orphaned FM sitreps #9335
    "failed to get fetch metadata for sitrep {} (v{}): {e}",
    v.id, v.version
);
Would this be the expected case if we did a partial insert of sitrep data, but the "final sitrep" was not inserted?
Or are we saying: "if any records exist correlated with a sitrep, it is guaranteed to have an entry in the fm_sitrep table"?
Edit: Seems like you're relying on that ordering, as I'm reading below.
let mut paginator =
    Paginator::new(SQL_BATCH_SIZE, PaginationOrder::Descending);

while let Some(p) = paginator.next() {
My read on this is:
- We only consider orphans "relative to a version"
- We only consider versions that are returned from fm_sitrep_version_list
Does this mean that if any rows are removed from fm_sitrep_history, we won't be able to clean up the corresponding orphans which might exist for that version?
Do we have any protection against this leak? If not: are we documenting that this is a critical constraint we need to satisfy before deleting any fm_sitrep_history rows?
I was about to add: "As long as we check for no orphans before removing an old row from fm_sitrep_history, this will be safe", but I think that could still cause orphaned sitreps to be leaked if:
- We performed history cleanup for an old row, and removed all orphans
- Concurrently, a slow-old-nexus inserted a sitrep referencing that very old parent (obviously it would not be able to insert into fm_sitrep_history)
In this case, a sitrep would be inserted, referencing an old parent, and we'd never scan the necessary version to delete such a sitrep
Also: If we "just don't delete versions", then the background task will linearly increase the number of queries it needs to perform forever, which would be bad.
paginator = p.found_batch(&versions, &|v| SqlU32::from(v.version));

for v in versions {
(Related to my earlier comment) I think there's some complexity in identifying "orphans by version", since it makes it unsafe to delete any part of the version history while orphans might remain.
It might be possible to do orphan deletion in a more version-independent way:
- Look up all sitreps that are not referenced by fm_sitrep_history
- Delete all these sitreps, unless parent_sitrep_id is the current active sitrep
Basically:
WITH current_sitrep_id AS (
SELECT sitrep_id
FROM omicron.public.fm_sitrep_history
ORDER BY version DESC
LIMIT 1
),
-- Page through `fm_sitrep` in batches to avoid a full table scan
batch AS (
SELECT s.id, s.parent_sitrep_id
FROM omicron.public.fm_sitrep s
WHERE s.id > $1
ORDER BY s.id
LIMIT $2
)
DELETE FROM omicron.public.fm_sitrep
WHERE id IN (
SELECT b.id
FROM batch b
-- Lookup the sitrep in the history
LEFT JOIN omicron.public.fm_sitrep_history h ON h.sitrep_id = b.id
-- Find sitreps missing from history
WHERE h.sitrep_id IS NULL
-- ... where the sitrep cannot be made active
AND (b.parent_sitrep_id IS NULL
OR b.parent_sitrep_id != (SELECT sitrep_id FROM current_sitrep_id))
)
RETURNING id;
Ah, yeah... I had done it based on entries in the version table as a way to avoid doing a full table scan, but I think this approach would also work.
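For reference, the `batch` CTE in the proposal sidesteps the full table scan by paging on `id` (`WHERE s.id > $1 ORDER BY s.id LIMIT $2`). A minimal sketch of that loop follows. It assumes integer IDs in place of real sitrep IDs, and `next_batch` and `scan_all` are hypothetical names that do not exist in the PR.

```rust
/// Returns up to `limit` IDs strictly greater than `after`, in ascending
/// order. Stands in for `WHERE s.id > $1 ORDER BY s.id LIMIT $2`.
/// Assumption: `sorted_ids` is already sorted ascending, as an index would be.
fn next_batch(sorted_ids: &[u32], after: Option<u32>, limit: usize) -> Vec<u32> {
    sorted_ids
        .iter()
        .copied()
        .filter(|id| after.map_or(true, |a| *id > a))
        .take(limit)
        .collect()
}

/// Visits every ID exactly once by feeding the last ID of each batch back in
/// as the marker for the next batch.
fn scan_all(sorted_ids: &[u32], limit: usize) -> Vec<Vec<u32>> {
    let mut batches = Vec::new();
    let mut marker = None;
    loop {
        let batch = next_batch(sorted_ids, marker, limit);
        // An empty batch means the marker is already past the last row.
        if batch.is_empty() {
            break;
        }
        marker = batch.last().copied();
        batches.push(batch);
    }
    batches
}
```

Each iteration costs one bounded query, and the total number of queries depends on the size of `fm_sitrep` rather than on the length of the version history.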
Hmm, so, I was originally thinking I would do this using the current sitrep ID from the sitrep_loader watch channel, rather than as a CTE, but I realize that actually doesn't work. The current sitrep in the database may have moved, and if a "stale" current sitrep ID was used for this query, it would return children of the current sitrep which are in progress. I'm not sure if the CTE is safe either: would this ensure that the current sitrep is locked while we are executing the SELECT, or could it have moved forwards in the midst of this query?
Unless I wrote this proposed CTE incorrectly, my intent was that:
- Every sitrep in history
- Every sitrep which could be added to history
Are never going to be returned from the SELECT. In fact, we'd only return the opposite: sitreps which are not in history, and which cannot be in history.
As a result: this should be safe from concurrent insertions - current / possible future sitreps shouldn't be returnable from it.
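The intended predicate can be modeled in memory as a rough sketch. The `Sitrep` struct, the integer IDs, and `gc_candidates` are hypothetical stand-ins for illustration, not code from this branch; the filter mirrors the `WHERE` clause of the proposed query.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory stand-in for a row of `fm_sitrep`.
#[derive(Debug)]
struct Sitrep {
    id: u32,
    parent_sitrep_id: Option<u32>,
}

/// Returns the IDs of sitreps that are not in history and cannot be added to
/// it, given the set of sitrep IDs in history and the current sitrep ID.
fn gc_candidates(sitreps: &[Sitrep], history: &HashSet<u32>, current: u32) -> Vec<u32> {
    sitreps
        .iter()
        // `h.sitrep_id IS NULL`: the sitrep has no history entry.
        .filter(|s| !history.contains(&s.id))
        // A child of the current sitrep could still be committed, so keep it.
        .filter(|s| match s.parent_sitrep_id {
            None => true,
            Some(parent) => parent != current,
        })
        .map(|s| s.id)
        .collect()
}
```

With history `{1, 2}` and current sitrep 2, a sitrep 3 whose parent is 1 is a candidate, while a sitrep 4 whose parent is 2 is not. Passing a stale `current` of 1 flips that result and flags sitrep 4, which is the hazard raised above for the watch-channel approach.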
What I wasn't sure about was whether the fm_sitrep_history table would be locked as long as the query is executing. If it is, then this should be fine. But if a new current sitrep could be added after we observe the current ID, then I think we would incorrectly observe children of the new current sitrep as eligible for GC.
As currently written, this is a CTE, so it should lock fm_sitrep_history (basically, whatever value it thinks is "latest" should remain valid for the duration of the statement).
If we run this CTE while...
- ... other Nexuses are adding new sitreps, which could become active, but are not yet: they won't be eligible for removal, because the CTE explicitly ignores them
- ... other Nexuses are concurrently adding old sitreps (using an old parent), those would be eligible for removal. But that's okay, it's sorta indistinguishable from them adding old sitreps "a long time ago".
- ... other Nexuses are adding a new row to fm_sitrep_history, this CTE will run either before that row addition (and explicitly ignore the possible-new sitrep) or after that row addition (and it will explicitly ignore an fm_sitrep with an entry in fm_sitrep_history).
To answer your question more generally: I believe that a single-statement CTE is "implicitly a transaction" from the perspective of CockroachDB. So, from an ACID point-of-view:
WITH
FOO AS (...),
BAR AS (...)
SELECT ...
Should roughly have the same semantics as:
BEGIN;
let foo = SELECT (...);
let bar = SELECT(...);
SELECT ...;
COMMIT;
Okay, cool, thank you. I wasn't totally sure what the locking semantics were here, so thanks for explaining. In that case, this seems like the right thing.
Ta-dah: 642854f (man, writing CTEs in Diesel really sucks)
When a Nexus attempts to commit a new fault management situation report to the sitrep history but fails to do so because another sitrep with the same parent has already been inserted, that sitrep is said to be orphaned. Records pertaining to it are left behind in the database, but it will not be accessed by the rest of the system. Thus, we must occasionally garbage-collect such sitreps. This branch adds a background task for doing so.
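As an illustration of how a sitrep ends up orphaned, here is a minimal in-memory sketch of the commit race. `History`, `CommitError`, and `try_commit` are hypothetical names, and integer IDs stand in for real sitrep IDs; the actual insert happens in the database, not in Rust.

```rust
/// Hypothetical in-memory stand-in for `fm_sitrep_history`, ordered by version.
#[derive(Default)]
struct History {
    sitrep_ids: Vec<u32>,
}

#[derive(Debug, PartialEq)]
enum CommitError {
    /// Another sitrep with the same parent was already committed.
    ParentNotCurrent,
}

impl History {
    /// The most recently committed sitrep, if any.
    fn current(&self) -> Option<u32> {
        self.sitrep_ids.last().copied()
    }

    /// Commits `id` as the next version only if `parent` is still the current
    /// sitrep. Returns the new 1-based version number on success.
    fn try_commit(&mut self, id: u32, parent: Option<u32>) -> Result<usize, CommitError> {
        if parent != self.current() {
            // The sitrep's records already exist, but it never enters history.
            return Err(CommitError::ParentNotCurrent);
        }
        self.sitrep_ids.push(id);
        Ok(self.sitrep_ids.len())
    }
}
```

If two Nexuses both build a sitrep on parent 1, the first `try_commit` succeeds and the second fails with `ParentNotCurrent`. The losing sitrep is the orphan that the background task later collects.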
Depends on #9320