Skip to content

Conversation

fenos
Copy link
Contributor

@fenos fenos commented Sep 18, 2025

Prefix GC: Statement-level triggers + per-subtree advisory locking (race-free under concurrency)

This PR replaces our prefix GC logic with a transaction-safe, concurrency-robust design based on PostgreSQL statement-level triggers and transition tables. It removes the race condition that previously left dangling prefixes during concurrent deletes/moves—without introducing any external GC tables or background jobs.


Why

Current issue: under concurrent deletes/moves, row-level triggers can evaluate leaf-ness before sibling transactions commit, clear their own work, and fail to restage parents → dangling prefixes.

Goals:

  • Deterministic cleanup under concurrency
  • No global serialization; allow parallelism across independent subtrees
  • Keep the migration small & maintainable (no queue table / workers)

What’s in this PR

New design (high level)

  • Statement-level triggers with REFERENCING OLD/NEW TABLE to batch the rows touched by a statement.
  • Per (bucket, top-level prefix) transaction-scoped advisory locks to serialize GC only where needed.
  • Bottom-up leaf deletion of ancestor prefixes for the affected paths, repeated until nothing remains.
  • On UPDATE (move/rename): create destination prefixes first, then prune the source side.
    This avoids removing shared roots prematurely.

New functions

  • storage.lock_top_prefixes(bucket_ids text[], names text[])
    Takes per-(bucket, top) advisory locks in stable order:
    pg_advisory_xact_lock(hashtextextended(bucket || '/' || top, 0))

  • storage.delete_leaf_prefixes(bucket_ids text[], names text[])
    Builds the unique ancestor set in memory and deletes leaf prefixes (no immediate child objects or subprefixes), bottom-up until stable.

  • storage.objects_delete_cleanup() (AFTER DELETE on storage.objects)
    Locks touched subtrees and prunes ancestors of deleted objects.

  • storage.objects_update_cleanup() (AFTER UPDATE on storage.objects)
    Derives NEW−OLD (destinations to ensure prefixes exist) and OLD−NEW (sources to prune).
    Locks sources first, then destinations; creates dest prefixes (idempotent) and then prunes source.

  • storage.prefixes_delete_cleanup() (AFTER DELETE on storage.prefixes)
    Prunes ancestors when prefixes are deleted directly.

All functions guard re-entrancy with the GUC storage.gc.prefixes so deletes performed by GC don’t recursively retrigger GC.

Triggers

  • objects_delete_cleanupAFTER DELETE … REFERENCING OLD TABLE AS deleted FOR EACH STATEMENT
  • objects_update_cleanupAFTER UPDATE … REFERENCING OLD/NEW TABLES … FOR EACH STATEMENT
  • prefixes_delete_cleanupAFTER DELETE on storage.prefixes … REFERENCING OLD TABLE … FOR EACH STATEMENT

Conflicting legacy triggers are dropped at the start of the migration.


Performance notes

  • Statement-level execution avoids per-row trigger thrash.
  • Advisory locks are localized to (bucket, top) so unrelated trees proceed in parallel.

Recommended indexes (unchanged schema):

Test coverage (behaviors now correct)

  • Delete last file in a folder → parent prefixes removed.
  • Deep move/rename (e.g., a/b/c/file → a/x/y/file) → old chain pruned, new chain created.
  • Move to root → source chain fully cleaned; no root prefixes created.
  • Concurrent moves/deletes from the same source folder → no dangling src/ prefixes.
  • Stress across overlapping trees → deterministic cleanup.

Operational notes

  • Under extreme contention on the same (bucket, top), GC is serialized by design, but only for that subtree; other trees proceed concurrently.
  • Everything is idempotent and safe on retries.

@fenos fenos force-pushed the fix/prefixes-concurrency-cleanup branch 3 times, most recently from b5deccb to 038dbd5 Compare September 18, 2025 11:35
@coveralls
Copy link

coveralls commented Sep 18, 2025

Pull Request Test Coverage Report for Build 17855279836

Details

  • 1 of 1 (100.0%) changed or added relevant line in 1 file are covered.
  • 8 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.02%) to 76.493%

Files with Coverage Reduction New Missed Lines %
src/storage/storage.ts 8 71.91%
Totals Coverage Status
Change from base Build 17761266519: 0.02%
Covered Lines: 21142
Relevant Lines: 27404

💛 - Coveralls

@fenos fenos force-pushed the fix/prefixes-concurrency-cleanup branch from 038dbd5 to 708394b Compare September 18, 2025 11:49
@itslenny
Copy link
Contributor

itslenny commented Sep 18, 2025

Nicely done. This is AWESOME.

I found a deadlock scenario and pushed a commit with a test that hits it 100% of the time and a small change to the migration that fixes it. You can revert the SQL change and run the tests if you want to see it in action. Without the fix the test causes a 500 with the following output:

{
  "statusCode": "500",
  "code": "DatabaseError",
  "error": "DatabaseError",
  "message": "
      update \"objects\" set \"name\" = $1, \"bucket_id\" = $2, \"metadata\" = $3, \"user_metadata\" = $4, \"version\" = $5
      where \"bucket_id\" = $6 and \"name\" = $7 returning * - deadlock detected
  "
}

@fenos fenos force-pushed the fix/prefixes-concurrency-cleanup branch from 1892554 to 5227520 Compare September 19, 2025 10:09
@fenos fenos merged commit 3d9254a into master Sep 19, 2025
2 checks passed
@fenos fenos deleted the fix/prefixes-concurrency-cleanup branch September 19, 2025 10:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants