Draft: Communicator rewrite #12692

hlinnaka · 2025-07-22T21:01:51Z

There's still a lot of work to be done here, but I'm opening a PR now so that we get CI coverage on this, as well as compute image builds that we can try out in staging.

When the LFC is shrunk, we punch holes in the underlying file to release the disk space to the OS. We used to track the holes in the same hash table as the in-use entries, because that was convenient. However, I'm working on being able to shrink the hash table too, and once we do that, we'll need some other place to track the holes. Implement a simple scheme of an in-memory array and a chain of on-disk blocks for that.
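The "in-memory array plus chain of on-disk blocks" scheme could look roughly like the sketch below. This is only an illustration, not the code in this PR. The names (`HoleTracker`, `SpillBlock`, `IN_MEMORY_CAPACITY`) are hypothetical, the "disk" is simulated with a `HashMap`, and the choice to store each spilled batch inside one of the free blocks is an assumption.

```rust
use std::collections::HashMap;

/// How many hole numbers fit in the in-memory array (hypothetical value).
const IN_MEMORY_CAPACITY: usize = 4;

/// One spilled block: a batch of hole numbers plus a link to the next
/// spilled block in the chain.
struct SpillBlock {
    holes: Vec<u32>,
    next: Option<u32>,
}

/// Hypothetical tracker for punched-out blocks of the cache file.
struct HoleTracker {
    in_memory: Vec<u32>,
    chain_head: Option<u32>,
    /// Stand-in for the cache file: block number -> spilled batch stored there.
    disk: HashMap<u32, SpillBlock>,
}

impl HoleTracker {
    fn new() -> Self {
        HoleTracker {
            in_memory: Vec::with_capacity(IN_MEMORY_CAPACITY),
            chain_head: None,
            disk: HashMap::new(),
        }
    }

    /// Record that `block` is now free.
    fn push_hole(&mut self, block: u32) {
        if self.in_memory.len() == IN_MEMORY_CAPACITY {
            // The array is full: use the newly freed block itself as storage
            // for the current batch (an assumption) and link it into the chain.
            let batch = std::mem::take(&mut self.in_memory);
            self.disk.insert(block, SpillBlock { holes: batch, next: self.chain_head });
            self.chain_head = Some(block);
        } else {
            self.in_memory.push(block);
        }
    }

    /// Take a free block for reuse, if there is one.
    fn pop_hole(&mut self) -> Option<u32> {
        if let Some(block) = self.in_memory.pop() {
            return Some(block);
        }
        // The array is empty: reload a batch from the head of the chain.
        // The block that stored the batch becomes reusable itself.
        let head = self.chain_head?;
        let spill = self.disk.remove(&head)?;
        self.chain_head = spill.next;
        self.in_memory = spill.holes;
        Some(head)
    }

    /// Total number of free blocks being tracked.
    fn len(&self) -> usize {
        let spilled: usize = self.disk.values().map(|b| 1 + b.holes.len()).sum();
        self.in_memory.len() + spilled
    }
}
```

With this layout the memory footprint stays bounded by `IN_MEMORY_CAPACITY` no matter how many holes exist, at the cost of one block of storage per spilled batch.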

Pass back a suitable 'errno' from the communicator process to the originating backend in all cases. Usually it's just EIO because we don't have a good way to map from tonic StatusCodes to libc error numbers. That's probably good enough; from the original backend's perspective all errors are IO errors. In the C code, set libc errno variable before calling ereport(), so that errcode_for_file_access() works. And once we do that, we can replace pg_strerror() calls with %m.
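As a rough illustration of the "usually it's just EIO" rule, the sketch below maps a failed request to an errno value. It makes several assumptions: `RpcCode` and `RequestError` are hypothetical stand-ins (the real code deals with tonic's status type), `EIO` is hard-coded to its Linux value so that no `libc` crate is needed, and passing through the OS error number of a local I/O error is a guess at what the "not EIO" cases might be.

```rust
use std::io;

/// Linux value of EIO, hard-coded so the sketch needs no `libc` crate.
const EIO: i32 = 5;

/// Hypothetical stand-in for a gRPC status code.
#[derive(Debug, Clone, Copy)]
enum RpcCode {
    NotFound,
    Unavailable,
    Internal,
}

/// Hypothetical error type for a request that failed in the communicator.
#[derive(Debug)]
enum RequestError {
    /// The pageserver call failed with a gRPC status.
    Rpc(RpcCode),
    /// A local I/O operation failed (assumed case, for illustration).
    Local(io::Error),
}

/// Pick the errno to hand back to the originating backend.
fn errno_for_backend(err: &RequestError) -> i32 {
    match err {
        // No good mapping from gRPC status codes to errno exists,
        // so every RPC failure is reported as a generic I/O error.
        RequestError::Rpc(_) => EIO,
        // A local error may carry a real OS error number; fall back to EIO.
        RequestError::Local(e) => e.raw_os_error().unwrap_or(EIO),
    }
}
```

On the C side, the backend would then store the returned value in `errno` before calling `ereport()`, which is what lets `errcode_for_file_access()` and `%m` work as described above.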

The spawned thread didn't have the tokio runtime active, which led to this error:

ERROR lsn_lease_bg_task{tenant_id=1bb647cb7d3974b52e74f7442fa7d059 timeline_id=cf41456d3202e1c3940cb8f372d160ab lsn=0/1576000}:panic{thread=<unnamed> location=compute_tools/src/lsn_lease.rs:201:5}: there is no reactor running, must be called from the context of a Tokio 1.x runtime

Fixes `test_readonly_node_gc`

bizwark and others added 30 commits May 19, 2025 06:33

Add first iteration of simulating a flakey network with a custom TCP.

3acb263

Add back whitespace that was removed.

0dddb1e

Return info message that was used for debugging.

ac464c5

Remove unnecessary info include now that the info message is gone.

31fa7a5

Set default max consumers per connection to a high number.

60a0bec

Merge remote-tracking branch 'origin/main' into communicator-rewrite-with-integrated-cache

bb28109

There were conflicts because of the differences in the page_api protocol that was merged to main vs what was on the branch. I adapted the code for the protocol in main.

Use a semaphore to gate access to connections. Add metrics for testing.

af9379c

Add a new iteration of a new client pool with some updates.

014823b

Fix compile warnings, minor cleanup.

7c9bd54

Add placeholder shmem hashmap implementation

009168d

Use that instead of the half-baked Adaptive Radix Tree implementation. ART would probably be better in the long run, but more complicated to implement.

use separate hash tables for relsize cache and block mappings

33549ba

Add metrics to track memory usage of the rust communicator

b3c2541

Implement growing the hash table. Fix unit tests.

f06bb2b

Merge branch 'main' into communicator-rewrite

745b750

Fix Linux build failures

b36f880

pageserver: remove gRPC compute service prototype

69a47d7

Merge branch 'main' into communicator-rewrite

8202c61

Merge branch 'main' into communicator-rewrite

4b6f02e

Merge branch 'main' into communicator-rewrite

37c5852

Make LFC chunk size a compile-time constant

96b4de1

A runtime setting is nicer, but the next commit will replace the hash table with a different implementation that requires the value size to be a compile-time constant.

Remove generated communicator_bindings.h

6d45165

Move neon-shmem facility to separate module within the crate

6145cfd

Fix RelTag fields

9583805

impl Default for SlabBlockHeader

328f28d

Misc build fixes

2fb6164

Use a custom Rust implementation to replace the LFC hash table

10b936b

The new implementation lives in a separately allocated shared memory area, which could be resized. Resizing it isn't actually implemented yet, though. It would require some co-operation from the LFC code.

Mangle gRPC connstrings to use port 51051

28a6174

Ignore communicator_bindings.h

8b494f6

avoid hitting assertion failure in MarkPostmasterChildWalSender()

255537d

hlinnaka added 30 commits July 30, 2025 17:31

Don't update the legacy last-written LSN cache with new communicator

fca52af

The new communicator has its own tracking

Fix updating last-written LSN when WAL redo skips updating a block

af5e3da

This makes the test_replica_query_race test pass, and probably some other read replica tests too.

Crank down the logging

688990e

More logging is useful during debugging, but it's time to crank it down a notch...

Evict and retry if the block hash map is full

c036064

I made this change to the is_write==true case earlier already, but the is_write==false codepath needs the same treatment.
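The evict-and-retry pattern from this commit can be sketched as follows. This is a minimal illustration under stated assumptions, not the communicator's actual code: `BlockMap`, `Full`, and `insert_with_eviction` are hypothetical names, and the victim is chosen in simple insertion (FIFO) order, which may differ from the real eviction policy.

```rust
use std::collections::{HashMap, VecDeque};

/// Returned when the fixed-size map has no room for a new key.
struct Full;

/// Hypothetical fixed-capacity map from block number to cache slot.
struct BlockMap {
    capacity: usize,
    entries: HashMap<u64, u64>,
    /// Insertion order, used here to pick an eviction victim (assumed policy).
    order: VecDeque<u64>,
}

impl BlockMap {
    fn new(capacity: usize) -> Self {
        BlockMap { capacity, entries: HashMap::new(), order: VecDeque::new() }
    }

    /// Insert or update an entry; fails if a new key would exceed capacity.
    fn try_insert(&mut self, key: u64, value: u64) -> Result<(), Full> {
        if !self.entries.contains_key(&key) && self.entries.len() >= self.capacity {
            return Err(Full);
        }
        if self.entries.insert(key, value).is_none() {
            self.order.push_back(key);
        }
        Ok(())
    }

    /// Evict the oldest entry. Returns false if there was nothing to evict.
    fn evict_one(&mut self) -> bool {
        match self.order.pop_front() {
            Some(victim) => {
                self.entries.remove(&victim);
                true
            }
            None => false,
        }
    }
}

/// Shared by the read and write paths: on `Full`, evict one entry and retry.
fn insert_with_eviction(map: &mut BlockMap, key: u64, value: u64) -> bool {
    loop {
        match map.try_insert(key, value) {
            Ok(()) => return true,
            Err(Full) => {
                // Give up only if nothing can be evicted (for example, capacity 0).
                if !map.evict_one() {
                    return false;
                }
            }
        }
    }
}
```

Routing both the `is_write==true` and `is_write==false` paths through one helper like `insert_with_eviction` is one way to keep the two codepaths from diverging again.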

don't try to update the legacy last-written LSN cache with new communicator

49204b6

Fix relsize caching in hot standby mode

3dfa2fc

Fixes remaining test_hot_standby.py failures

Merge remote-tracking branch 'origin/main' into communicator-rewrite

768fc10

Merge remote-tracking branch 'origin/main' into communicator-rewrite

c8b875c

Handle get_raw_page_at_lsn() debugging function properly

4016808

This adds a new request type between backend and communicator, to make a getpage request at a given LSN, bypassing the LFC. Only used by the get_raw_page_at_lsn() debugging/testing function.

Run pgindent on the new communicator C code

c8042f9

Fix LFC stats exposed by the built-in prometheus endpoint

0428164

More work on metrics

8a4f16a

Switch to the 'measured' crate everywhere in the communicator. Connect the allocator metrics to the metrics endpoint.

cargo fmt

5e2a19c

Set neon.use_communicator_worker GUC based on prefer_protocol

b4808a4

fix test scripts to not set neon.use_communicator_worker anymore

84f4dcd

compute_ctl does it based on prefer_protocol now

fix clippy warnings

c509d53

Merge remote-tracking branch 'origin/main' into communicator-rewrite

17cd611

Merge remote-tracking branch 'origin/main' into communicator-rewrite

e1df054

Set num_shards in shared memory.

bb1f50b

The get_num_shards() function, called from the WAL proposer, requires it. Fixes test_timeline_size_quota_on_startup

dial down smgr trace logging to same level as on 'main'

e1c7d79

cargo fmt

b72f410

revert unintentional changes to submodules

ede37c5

Silence test failure with gRPC

5030249

The error message is just a little different with gRPC.

Fix test_lfc_prewarm.py test failure

b78cdfe

reformat

26bd994

Fix LFC prewarm cancellation

4a031b9

fix prewarm test with grpc

e466cd1

I added a fixture to run these tests with and without grpc, but missed passing the option to one endpoint creation.

Make LFC prewarming test case less sensitive to LFC chunk size

8ed56de

Namely, this makes it pass with the new communicator, which doesn't do chunking at all.
