
Conversation

@hubert2dev

Context

  • Task: Improve PostgreSQL bucket compaction query efficiency (ticket “400: [Postgres Storage] Inefficient index usage when compacting”).
  • File touched: modules/module-postgres-storage/src/storage/PostgresCompactor.ts.
  • Branch: feature/pg-compaction-bounds.

Current Issue

  • The compaction loop queries bucket_data in reverse primary-key order (ORDER BY bucket_name DESC, op_id DESC) with this filter:

    bucket_name >= :bucketLower
    AND (
      (bucket_name = :bucketUpper AND op_id < :upperOpId)
      OR bucket_name < :bucketUpper COLLATE "C"
    )
    
  • Because of the OR, PostgreSQL cannot traverse the (group_id, bucket_name, op_id) PRIMARY KEY efficiently and instead runs an Index Scan Backward using unique_id while filtering millions of rows (an example plan reports ~4.3M rows scanned for a 10k LIMIT).

  • Result: high latency and IO for compaction jobs, especially when buckets contain large histories.

Change Overview

  1. Row-value comparison for pagination
    • Compute parameter objects (groupParam, bucketLowerParam, bucketUpperParam, upperOpParam, limitParam).
    • First batch (upperOpIdLimit === BIGINT_MAX): keep the binary-collation comparison bucket_name < bucketUpper COLLATE "C".
    • Subsequent batches: ROW(bucket_name COLLATE "C", op_id) < ROW(bucketUpper, upperOpIdLimit); this follows the composite index ordering exactly (a sketch of the rewritten query follows this list).
  2. Preserve behavior
    • bucketLower handling and MAX_CHAR sentinel unchanged.
    • Pagination still updates bucketUpper/upperOpIdLimit from the last row in each batch.
    • Decoding, MOVE/CLEAR logic, and batching limits untouched.
  3. Type-safe params
    • Explicit as const tagging on parameter objects prevents lint errors and clarifies Postgres types.
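
A minimal sketch of the rewritten query for subsequent batches, using the parameter names above (the column list and the group_id equality are assumptions; the exact text in PostgresCompactor.ts may differ):

    SELECT ... FROM bucket_data
    WHERE group_id = :group
      AND bucket_name >= :bucketLower
      AND ROW(bucket_name COLLATE "C", op_id) < ROW(:bucketUpper, :upperOpId)
    ORDER BY bucket_name DESC, op_id DESC
    LIMIT :limit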

Why This Works

  • PostgreSQL can perform key-ordered scans when the predicate forms a prefix of the index ordering.
  • Replacing the OR with a row comparison removes the planner barrier; the backward scan can stop as soon as the tuple falls below the (bucketUpper, upperOpIdLimit) cursor, so the number of scanned rows is roughly the requested LIMIT (see the equivalence note below).
  • Uses binary collation (COLLATE "C") consistently to match the original comparison semantics.
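
For illustration, standard row-wise comparison semantics make the new predicate logically equivalent to the old OR form; the difference is that the planner can turn the row form into a single index boundary condition instead of a per-row filter:

    ROW(bucket_name, op_id) < ROW(:bucketUpper, :upperOpId)
    -- is true exactly when
    --   bucket_name < :bucketUpper
    --   OR (bucket_name = :bucketUpper AND op_id < :upperOpId)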

Testing / Verification Plan

  1. Explain before/after
    • Capture EXPLAIN (ANALYZE, BUFFERS) for the old query to document the ~4.3M scanned rows.
    • Re-run after the change; expect heap tuples fetched and "Rows Removed by Filter" to drop to near zero (an example capture command follows this list).
  2. Result equivalence
    • Run compactor on a staging DB and confirm bucket batches (first batch, mid-bucket, cross-bucket) match historical output.
  3. Manual smoke
    • Trigger compaction via CLI with verbose logging, verify batches proceed bucket-by-bucket without errors.
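
For the capture step in item 1, a command along these lines should work (the column list, the group_id equality, and the parameter values are placeholders, not the exact compactor query):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT ... FROM bucket_data
    WHERE group_id = :group
      AND bucket_name >= :bucketLower
      AND ROW(bucket_name COLLATE "C", op_id) < ROW(:bucketUpper, :upperOpId)
    ORDER BY bucket_name DESC, op_id DESC
    LIMIT 10000;

Comparing "Rows Removed by Filter" and shared buffer counts before and after should make the improvement visible directly in the plan.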

@changeset-bot

changeset-bot bot commented Nov 18, 2025

⚠️ No Changeset found

Latest commit: 7b3c289

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@rkistner (Contributor) left a comment

  1. Can the upperOpIdLimit === BIGINT_MAX special case be removed? It seems like the second query should be able to cover that case as well. I do like the ROW (...) < ROW (...) comparison - that does appear better than the OR clause we had.
  2. There are still two comparisons, one using the default collation, and the other using COLLATE "C". It feels like at least one of those is wrong - these should be using the same collation - which would be the better one for index usage?
  3. The PR description mentions a test plan to confirm that (1) this fixes the performance issue, and (2) gives the same results. Have you done any of those tests?
