
DNM: analyze immediate gc lease deadline impact #12411

Closed
wants to merge 1 commit into from

Conversation

problame
Contributor

@problame problame commented Jul 1, 2025


github-actions bot commented Jul 1, 2025

8580 tests run: 7585 passed, 383 failed, 612 skipped (full report)


Failures on Postgres 17

Failures on Postgres 16

Failures on Postgres 15

Failures on Postgres 14

# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_branch_and_gc[release-pg14] or test_branch_and_gc[release-pg14] or test_hot_standby_gc[release-pg14-True] or test_hot_standby_gc[release-pg14-True] or test_hot_standby_gc[release-pg14-False] or test_hot_standby_gc[release-pg14-False] or test_import_from_pageserver_small[release-pg14] or test_import_from_pageserver_small[release-pg14] or test_gc_aggressive[release-pg14] or test_gc_aggressive[release-pg14] or test_gc_index_upload[release-pg14] or test_gc_index_upload[release-pg14] or test_ondemand_download_pg_xact[release-pg14-4] or test_ondemand_download_pg_xact[release-pg14-4] or test_ondemand_download_pg_xact[release-pg14-None] or test_ondemand_download_pg_xact[release-pg14-None] or test_old_request_lsn[release-pg14] or test_old_request_lsn[release-pg14] or test_metric_collection[release-pg14] or test_metric_collection[release-pg14] or test_sql_regress[release-pg14-v1-None] or test_sql_regress[release-pg14-v1-None] or test_sql_regress[release-pg14-v2-4] or test_sql_regress[release-pg14-v2-4] or test_sql_regress[release-pg14-v2-None] or test_sql_regress[release-pg14-v2-None] or test_isolation[release-pg14-v2-4] or test_isolation[release-pg14-v2-4] or test_pg_regress[release-pg14-v2-None] or test_pg_regress[release-pg14-v2-None] or test_pg_regress[release-pg14-v1-None] or test_pg_regress[release-pg14-v1-None] or test_pg_regress[release-pg14-v1-4] or test_pg_regress[release-pg14-v1-4] or test_pg_regress[release-pg14-v2-4] or test_pg_regress[release-pg14-v2-4] or test_isolation[release-pg14-v1-4] or test_isolation[release-pg14-v1-4] or test_isolation[release-pg14-v2-None] or test_isolation[release-pg14-v2-None] or test_isolation[release-pg14-v1-None] or test_isolation[release-pg14-v1-None] or test_sql_regress[release-pg14-v1-4] or test_sql_regress[release-pg14-v1-4] or test_pitr_gc[release-pg14] or test_pitr_gc[release-pg14] or test_readonly_node_gc[release-pg14] or test_readonly_node_gc[release-pg14] or 
test_sharding_split_smoke[release-pg14] or test_sharding_split_smoke[release-pg14] or test_timeline_physical_size_post_gc[release-pg14] or test_timeline_physical_size_post_gc[release-pg14] or test_vm_bit_clear_on_heap_lock_blackbox[release-pg14] or test_vm_bit_clear_on_heap_lock_blackbox[release-pg14] or test_explicit_timeline_creation[release-pg14] or test_s3_eviction[release-pg14-0.2-True] or test_compute_restarts[release-pg14] or test_concurrent_computes[release-pg14] or test_unavailability[release-pg14] or test_wal_truncation[release-pg14-2] or test_wal_truncation[release-pg14-3] or test_quorum_sanity[release-pg14] or test_race_conditions[release-pg14] or test_wal_lagging[release-pg14] or test_pageserver_lsn_wait_error_safekeeper_stop[release-pg14] or test_wal_restore_initdb[release-pg14] or test_wal_restore_http[release-pg14-True] or test_walredo_not_left_behind_on_detach[release-pg14] or test_branch_and_gc[release-pg15] or test_branch_and_gc[release-pg15] or test_hot_standby_gc[release-pg15-True] or test_hot_standby_gc[release-pg15-True] or test_hot_standby_gc[release-pg15-False] or test_hot_standby_gc[release-pg15-False] or test_import_from_pageserver_small[release-pg15] or test_import_from_pageserver_small[release-pg15] or test_gc_index_upload[release-pg15] or test_gc_index_upload[release-pg15] or test_gc_aggressive[release-pg15] or test_gc_aggressive[release-pg15] or test_ondemand_download_pg_xact[release-pg15-4] or test_ondemand_download_pg_xact[release-pg15-4] or test_ondemand_download_pg_xact[release-pg15-None] or test_ondemand_download_pg_xact[release-pg15-None] or test_old_request_lsn[release-pg15] or test_old_request_lsn[release-pg15] or test_metric_collection[release-pg15] or test_metric_collection[release-pg15] or test_sql_regress[release-pg15-v2-4] or test_sql_regress[release-pg15-v2-4] or test_sql_regress[release-pg15-v1-4] or test_sql_regress[release-pg15-v1-4] or test_isolation[release-pg15-v1-None] or test_isolation[release-pg15-v1-None] or 
test_sql_regress[release-pg15-v2-None] or test_sql_regress[release-pg15-v2-None] or test_isolation[release-pg15-v2-None] or test_isolation[release-pg15-v2-None] or test_pg_regress[release-pg15-v1-None] or test_pg_regress[release-pg15-v1-None] or test_pg_regress[release-pg15-v1-4] or test_pg_regress[release-pg15-v1-4] or test_sql_regress[release-pg15-v1-None] or test_sql_regress[release-pg15-v1-None] or test_isolation[release-pg15-v2-4] or test_isolation[release-pg15-v2-4] or test_isolation[release-pg15-v1-4] or test_isolation[release-pg15-v1-4] or test_pg_regress[release-pg15-v2-4] or test_pg_regress[release-pg15-v2-4] or test_pg_regress[release-pg15-v2-None] or test_pg_regress[release-pg15-v2-None] or test_pitr_gc[release-pg15] or test_pitr_gc[release-pg15] or test_readonly_node_gc[release-pg15] or test_readonly_node_gc[release-pg15] or test_sharding_split_smoke[release-pg15] or test_sharding_split_smoke[release-pg15] or test_timeline_physical_size_post_gc[release-pg15] or test_timeline_physical_size_post_gc[release-pg15] or test_vm_bit_clear_on_heap_lock_blackbox[release-pg15] or test_vm_bit_clear_on_heap_lock_blackbox[release-pg15] or test_branch_and_gc[release-pg16] or test_branch_and_gc[release-pg16] or test_hot_standby_gc[release-pg16-True] or test_hot_standby_gc[release-pg16-True] or test_hot_standby_gc[release-pg16-False] or test_hot_standby_gc[release-pg16-False] or test_import_from_pageserver_small[release-pg16] or test_import_from_pageserver_small[release-pg16] or test_gc_aggressive[release-pg16] or test_gc_aggressive[release-pg16] or test_gc_index_upload[release-pg16] or test_gc_index_upload[release-pg16] or test_ondemand_download_pg_xact[release-pg16-None] or test_ondemand_download_pg_xact[release-pg16-None] or test_ondemand_download_pg_xact[release-pg16-4] or test_ondemand_download_pg_xact[release-pg16-4] or test_old_request_lsn[release-pg16] or test_old_request_lsn[release-pg16] or test_metric_collection[release-pg16] or 
test_metric_collection[release-pg16] or test_sql_regress[release-pg16-v2-None] or test_sql_regress[release-pg16-v2-None] or test_sql_regress[release-pg16-v1-4] or test_sql_regress[release-pg16-v1-4] or test_sql_regress[release-pg16-v2-4] or test_sql_regress[release-pg16-v2-4] or test_pg_regress[release-pg16-v1-None] or test_pg_regress[release-pg16-v1-None] or test_pg_regress[release-pg16-v1-4] or test_pg_regress[release-pg16-v1-4] or test_isolation[release-pg16-v1-4] or test_isolation[release-pg16-v1-4] or test_pg_regress[release-pg16-v2-4] or test_pg_regress[release-pg16-v2-4] or test_isolation[release-pg16-v2-4] or test_isolation[release-pg16-v2-4] or test_isolation[release-pg16-v1-None] or test_isolation[release-pg16-v1-None] or test_pg_regress[release-pg16-v2-None] or test_pg_regress[release-pg16-v2-None] or test_isolation[release-pg16-v2-None] or test_isolation[release-pg16-v2-None] or test_sql_regress[release-pg16-v1-None] or test_sql_regress[release-pg16-v1-None] or test_pitr_gc[release-pg16] or test_pitr_gc[release-pg16] or test_readonly_node_gc[release-pg16] or test_readonly_node_gc[release-pg16] or test_sharding_split_smoke[release-pg16] or test_sharding_split_smoke[release-pg16] or test_timeline_physical_size_post_gc[release-pg16] or test_timeline_physical_size_post_gc[release-pg16] or test_vm_bit_clear_on_heap_lock_blackbox[release-pg16] or test_vm_bit_clear_on_heap_lock_blackbox[release-pg16] or test_quorum_sanity[release-pg16] or test_pageserver_lsn_wait_error_start[release-pg16] or test_branch_and_gc[release-pg17] or test_branch_and_gc[release-pg17] or test_branch_and_gc[release-pg17] or test_branch_and_gc[release-pg17] or test_hot_standby_gc[release-pg17-False] or test_hot_standby_gc[release-pg17-False] or test_hot_standby_gc[debug-pg17-False] or test_hot_standby_gc[release-pg17-False] or test_hot_standby_gc[release-pg17-False] or test_hot_standby_gc[release-pg17-True] or test_hot_standby_gc[release-pg17-True] or test_hot_standby_gc[debug-pg17-True] 
or test_hot_standby_gc[release-pg17-True] or test_hot_standby_gc[release-pg17-True] or test_import_from_pageserver_small[release-pg17] or test_import_from_pageserver_small[release-pg17] or test_import_from_pageserver_small[debug-pg17] or test_import_from_pageserver_small[release-pg17] or test_import_from_pageserver_small[release-pg17] or test_gc_aggressive[release-pg17] or test_gc_aggressive[release-pg17] or test_gc_aggressive[debug-pg17] or test_gc_aggressive[release-pg17] or test_gc_aggressive[release-pg17] or test_gc_index_upload[release-pg17] or test_gc_index_upload[release-pg17] or test_gc_index_upload[debug-pg17] or test_gc_index_upload[release-pg17] or test_gc_index_upload[release-pg17] or test_ondemand_download_pg_xact[release-pg17-4] or test_ondemand_download_pg_xact[release-pg17-4] or test_ondemand_download_pg_xact[debug-pg17-4] or test_ondemand_download_pg_xact[release-pg17-4] or test_ondemand_download_pg_xact[release-pg17-4] or test_ondemand_download_pg_xact[release-pg17-None] or test_ondemand_download_pg_xact[release-pg17-None] or test_ondemand_download_pg_xact[debug-pg17-None] or test_ondemand_download_pg_xact[release-pg17-None] or test_ondemand_download_pg_xact[release-pg17-None] or test_old_request_lsn[release-pg17] or test_old_request_lsn[release-pg17] or test_old_request_lsn[debug-pg17] or test_old_request_lsn[release-pg17] or test_old_request_lsn[release-pg17] or test_metric_collection[release-pg17] or test_metric_collection[release-pg17] or test_metric_collection[debug-pg17] or test_metric_collection[release-pg17] or test_metric_collection[release-pg17] or test_sql_regress[release-pg17-v1-None] or test_sql_regress[release-pg17-v1-None] or test_sql_regress[release-pg17-v1-None] or test_sql_regress[release-pg17-v1-None] or test_sql_regress[debug-pg17-v1-None] or test_sql_regress[release-pg17-v2-None] or test_sql_regress[release-pg17-v2-None] or test_sql_regress[release-pg17-v2-None] or test_sql_regress[debug-pg17-v2-None] or 
test_sql_regress[release-pg17-v2-None] or test_isolation[release-pg17-v1-None] or test_isolation[release-pg17-v1-None] or test_isolation[debug-pg17-v1-None] or test_isolation[release-pg17-v1-None] or test_isolation[release-pg17-v1-None] or test_isolation[release-pg17-v1-4] or test_isolation[release-pg17-v1-4] or test_isolation[release-pg17-v1-4] or test_isolation[release-pg17-v1-4] or test_isolation[debug-pg17-v1-4] or test_pg_regress[release-pg17-v1-None] or test_pg_regress[release-pg17-v1-None] or test_pg_regress[debug-pg17-v1-None] or test_pg_regress[release-pg17-v1-None] or test_pg_regress[release-pg17-v1-None] or test_pg_regress[release-pg17-v2-None] or test_pg_regress[release-pg17-v2-None] or test_pg_regress[debug-pg17-v2-None] or test_pg_regress[release-pg17-v2-None] or test_pg_regress[release-pg17-v2-None] or test_isolation[release-pg17-v2-4] or test_isolation[release-pg17-v2-4] or test_isolation[debug-pg17-v2-4] or test_isolation[release-pg17-v2-4] or test_isolation[release-pg17-v2-4] or test_isolation[release-pg17-v2-None] or test_isolation[release-pg17-v2-None] or test_isolation[release-pg17-v2-None] or test_isolation[debug-pg17-v2-None] or test_isolation[release-pg17-v2-None] or test_pg_regress[release-pg17-v1-4] or test_pg_regress[release-pg17-v1-4] or test_pg_regress[release-pg17-v1-4] or test_pg_regress[release-pg17-v1-4] or test_pg_regress[debug-pg17-v1-4] or test_sql_regress[release-pg17-v1-4] or test_sql_regress[release-pg17-v1-4] or test_sql_regress[debug-pg17-v1-4] or test_sql_regress[release-pg17-v1-4] or test_sql_regress[release-pg17-v1-4] or test_sql_regress[release-pg17-v2-4] or test_sql_regress[release-pg17-v2-4] or test_sql_regress[release-pg17-v2-4] or test_sql_regress[release-pg17-v2-4] or test_sql_regress[debug-pg17-v2-4] or test_pg_regress[release-pg17-v2-4] or test_pg_regress[release-pg17-v2-4] or test_pg_regress[debug-pg17-v2-4] or test_pg_regress[release-pg17-v2-4] or test_pg_regress[release-pg17-v2-4] or test_pitr_gc[release-pg17] 
or test_pitr_gc[release-pg17] or test_pitr_gc[debug-pg17] or test_pitr_gc[release-pg17] or test_pitr_gc[release-pg17] or test_readonly_node_gc[release-pg17] or test_readonly_node_gc[release-pg17] or test_readonly_node_gc[debug-pg17] or test_readonly_node_gc[release-pg17] or test_readonly_node_gc[release-pg17] or test_sharding_split_smoke[release-pg17] or test_sharding_split_smoke[release-pg17] or test_sharding_split_smoke[debug-pg17] or test_sharding_split_smoke[release-pg17] or test_sharding_split_smoke[release-pg17] or test_timeline_physical_size_post_gc[release-pg17] or test_timeline_physical_size_post_gc[release-pg17] or test_timeline_physical_size_post_gc[release-pg17] or test_timeline_physical_size_post_gc[release-pg17] or test_timeline_physical_size_post_gc[debug-pg17] or test_vm_bit_clear_on_heap_lock_blackbox[release-pg17] or test_vm_bit_clear_on_heap_lock_blackbox[release-pg17] or test_vm_bit_clear_on_heap_lock_blackbox[release-pg17] or test_vm_bit_clear_on_heap_lock_blackbox[release-pg17] or test_vm_bit_clear_on_heap_lock_blackbox[debug-pg17] or test_peer_recovery[release-pg17] or test_pull_timeline[release-pg17-True] or test_pull_timeline_gc[release-pg17] or test_pull_timeline_while_evicted[release-pg17] or test_pull_timeline_while_evicted[release-pg17] or test_explicit_timeline_creation[release-pg17] or test_explicit_timeline_creation[release-pg17] or test_timeline_copy[release-pg17-0] or test_timeline_copy[release-pg17-0] or test_timeline_copy[release-pg17-100000] or test_patch_control_file[release-pg17] or test_patch_control_file[release-pg17] or test_term_bump[release-pg17] or test_term_bump[release-pg17] or test_late_init[release-pg17] or test_broker_discovery[release-pg17] or test_broker_discovery[release-pg17] or test_pull_timeline_term_change[release-pg17] or test_s3_eviction[release-pg17-0.0-False] or test_s3_eviction[release-pg17-0.0-False] or test_s3_eviction[release-pg17-0.2-False] or test_backup_partial_reset[release-pg17] or 
test_backup_partial_reset[release-pg17] or test_pull_timeline_partial_segment_integrity[release-pg17] or test_pull_timeline_partial_segment_integrity[release-pg17] or test_pull_timeline_partial_segment_integrity[debug-pg17] or test_replace_safekeeper[release-pg17] or test_membership_api[release-pg17] or test_idle_reconnections[release-pg17] or test_timeline_copy[release-pg17-100] or test_s3_eviction[release-pg17-0.0-True] or test_restarts_under_load[release-pg17] or test_restarts_under_load[release-pg17] or test_restarts_frequent_checkpoints[release-pg17] or test_restarts_frequent_checkpoints[release-pg17] or test_compute_restarts[release-pg17] or test_concurrent_computes[release-pg17] or test_concurrent_computes[release-pg17] or test_concurrent_computes[debug-pg17] or test_unavailability[release-pg17] or test_unavailability[release-pg17] or test_recovery_uncommitted[release-pg17] or test_recovery_uncommitted[release-pg17] or test_wal_truncation[release-pg17-2] or test_wal_truncation[release-pg17-3] or test_wal_truncation[release-pg17-3] or test_race_conditions[release-pg17] or test_race_conditions[release-pg17] or test_race_conditions[debug-pg17] or test_wal_lagging[release-pg17] or test_wal_lagging[release-pg17] or test_quorum_sanity[release-pg17] or test_quorum_sanity[debug-pg17] or test_segment_init_failure[release-pg17] or test_pageserver_lsn_wait_error_start[release-pg17] or test_pageserver_lsn_wait_error_start[release-pg17] or test_pageserver_lsn_wait_error_start[debug-pg17] or test_pageserver_lsn_wait_error_safekeeper_stop[release-pg17] or test_pageserver_lsn_wait_error_safekeeper_stop[debug-pg17] or test_wal_restore[release-pg17] or test_wal_restore[release-pg17] or test_wal_restore_initdb[release-pg17] or test_wal_restore_initdb[debug-pg17] or test_wal_restore_http[release-pg17-True] or test_wal_restore_http[release-pg17-True] or test_wal_restore_http[debug-pg17-True] or test_wal_restore_http[release-pg17-False] or 
test_wal_restore_http[release-pg17-False] or test_wal_restore_http[debug-pg17-False] or test_walredo_not_left_behind_on_detach[release-pg17] or test_walredo_not_left_behind_on_detach[debug-pg17]"
Flaky tests (10)

Postgres 17

Postgres 16

Postgres 15

Postgres 14

Test coverage report is not available

The comment gets automatically updated with the latest test results
03bbce3 at 2025-07-01T11:12:25.902Z :recycle:

problame pushed a commit that referenced this pull request Jul 2, 2025
Since the default lease deadline[^1] feature was introduced,
`.timeline_gc()` calls in Python tests are no-ops for the first
10 minutes after a pageserver (PS) restart: `gc_iteration()` bails with
`Skipping GC because lsn lease deadline is not reached`.
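
The guard described above can be pictured with a minimal Python sketch. This is not the pageserver's Rust code: `GcState`, `lease_deadline`, and the `"gc ran"` return value are hypothetical names chosen for illustration, and only the skip message is taken from the text.

```python
from dataclasses import dataclass


@dataclass
class GcState:
    """Hypothetical stand-in for the state that a GC iteration consults."""
    # Timestamp in seconds before which GC must not run.
    lease_deadline: float


def gc_iteration(state: GcState, now: float) -> str:
    """Return a message describing what this GC iteration did."""
    if now < state.lease_deadline:
        # Message quoted from the PR description.
        return "Skipping GC because lsn lease deadline is not reached"
    # Hypothetical placeholder for the real GC work.
    return "gc ran"


startup = 1000.0
ten_minutes = 10 * 60
state = GcState(lease_deadline=startup + ten_minutes)

# One second after startup the call is a no-op; at the deadline it runs.
print(gc_iteration(state, now=startup + 1))
print(gc_iteration(state, now=startup + ten_minutes))
```

Under these assumptions, a test that restarts the pageserver and calls GC right away never reaches the GC work unless the deadline is moved or time is advanced.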

I did some impact analysis in the following PR. About 30 Python tests
are affected:
- #12411

Rust tests that don't explicitly enable periodic GC or invoke GC manually
are unaffected because we disable periodic GC by default in
the `TenantHarness`'s tenant config.
Two tests explicitly use `start_paused=true` + `tokio::time::advance()`,
but taking that route everywhere would add cognitive overhead and code
bloat to every existing and future test case that uses `TenantHarness`.

So, this PR disables the default lease deadline feature in all tests.

refs
- [^1]: PR that introduced default lease deadline: https://github.com/neondatabase/neon/pull/9055/files
- fixes https://databricks.atlassian.net/browse/LKB-92
@problame
Contributor Author

problame commented Jul 3, 2025

This PR was never intended to be merged. Action on the analysis that its CI results provided was taken here:

@problame problame closed this Jul 3, 2025
github-merge-queue bot pushed a commit that referenced this pull request Jul 8, 2025
…#12431)

Since the default lease deadline feature was introduced 9 months ago,
`.timeline_gc()` calls in Python tests are no-ops for the first
10 minutes after a pageserver (PS) restart: `gc_iteration()` bails with
`Skipping GC because lsn lease deadline is not reached`.

I did some impact analysis in the following PR. About 30 Python tests
are affected:
- #12411

Rust tests that don't explicitly enable periodic GC or invoke GC
manually are unaffected because we disable periodic GC by default in
the `TenantHarness`'s tenant config.
Two tests explicitly use `start_paused=true` + `tokio::time::advance()`,
but taking that route everywhere would add cognitive overhead and code
bloat to every existing and future test case that uses `TenantHarness`.

So, this PR sets the default lease deadline to zero in both Python
and Rust tests. The tests that exercise the feature itself were
identified because they then failed:
- Python tests `test_readonly_node_gc` and `test_lsn_lease_size`
- Rust test `test_lsn_lease`

To accomplish the above, I changed the code that computes the initial
lease deadline to respect the default tenant config in `pageserver.toml`,
which it didn't do before (I would consider that a bug). The Python test
harness and the Rust `TenantHarness` then simply set the default tenant
config field to zero.
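
As a rough illustration of that change, here is a Python sketch of the lookup order, assuming the initial deadline is the startup time plus a configured duration. The function names and the `tenant_override` / `pageserver_default` parameters are hypothetical and are not the actual pageserver identifiers. The 10-minute fallback is the value mentioned above.

```python
from datetime import timedelta
from typing import Optional

# Built-in fallback; the 10-minute value comes from the PR description.
BUILTIN_LEASE_LENGTH = timedelta(minutes=10)


def effective_lease_length(
    tenant_override: Optional[timedelta],
    pageserver_default: Optional[timedelta],
) -> timedelta:
    """Pick the most specific configured value (hypothetical lookup order)."""
    if tenant_override is not None:
        return tenant_override
    if pageserver_default is not None:
        # The step that, per the description, was previously not respected.
        return pageserver_default
    return BUILTIN_LEASE_LENGTH


def initial_lease_deadline(
    startup: float,
    tenant_override: Optional[timedelta],
    pageserver_default: Optional[timedelta],
) -> float:
    """Startup timestamp (seconds) plus the effective lease length."""
    length = effective_lease_length(tenant_override, pageserver_default)
    return startup + length.total_seconds()


# A test harness that sets the default to zero gets a deadline equal to startup.
print(initial_lease_deadline(100.0, None, timedelta(0)))
# With nothing configured, the built-in 10 minutes applies.
print(initial_lease_deadline(100.0, None, None))
```

The explicit `is not None` checks matter in this sketch because `timedelta(0)` is falsy in Python. A plain truthiness test would silently skip a configured zero and fall back to 10 minutes.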

Drive-by:
- `test_lsn_lease_size` was writing a lot of data unnecessarily; reduce
the amount and speed up the test

refs
- PR that introduced default lease deadline:
https://github.com/neondatabase/neon/pull/9055/files
- fixes https://databricks.atlassian.net/browse/LKB-92

---------

Co-authored-by: Christian Schwarz <Christian Schwarz>