Skip to content

Conversation

@erikjohnston
Copy link
Member

@erikjohnston erikjohnston commented Jun 12, 2025

Fixes #17370

@erikjohnston erikjohnston force-pushed the erikj/cleanup_federation_inbox branch from 15d3027 to dcf32ac Compare June 13, 2025 08:46
@erikjohnston erikjohnston force-pushed the erikj/cleanup_federation_inbox branch from dcf32ac to 83870d2 Compare June 13, 2025 09:44
@erikjohnston erikjohnston force-pushed the erikj/cleanup_federation_inbox branch from 4b46247 to 03bd0bd Compare June 13, 2025 11:15
Copy link
Contributor

@MatMaul MatMaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's... never deleted currently ?? 🫣 🫣

I think the corresponding issue would more be #17370.

I am trying to grasp the potential consequences of ingesting the to_device EDUs several times. From my understanding of the code the to_devices would be delivered several times to the devices.

Is it fine ? If not, going from forever to a week seems a bit of a gap. If it is, we could perhaps just deliver several times, since it is covering an edge case and shouldn't happen that often (request would have go through but not the response).

We could also ditch device_federation_inbox altogether and keep idempotency if we add message_id to device_inbox.

@erikjohnston
Copy link
Member Author

I think the corresponding issue would more be #17370.

Ah indeed, that was the issue I meant to link to 🤦

I am trying to grasp the potential consequences of ingesting the to_device EDUs several times. From my understanding of the code the to_devices would be delivered several times to the devices.

I think that can happen anyway? Though would also be happy increasing the delay to a month or whatever, this table isn't particularly large.

We could also ditch device_federation_inbox altogether and keep idempotency if we add message_id to device_inbox.

Unfortunately we delete things from device_inbox once the message has been delivered to clients, so that wouldn't work alas.

@MatMaul
Copy link
Contributor

MatMaul commented Jun 13, 2025

I think that can happen anyway?

Then perhaps ditching the table instead is better? better in the less code is better way.

@erikjohnston
Copy link
Member Author

FTR I've checked internally with crypto team, and at least e2ee use cases will handle retransmission fine.

@erikjohnston erikjohnston marked this pull request as ready for review June 16, 2025 13:15
@erikjohnston erikjohnston requested a review from a team as a code owner June 16, 2025 13:15
@devonh
Copy link
Member

devonh commented Jun 16, 2025

For future reference:
If we wanted to consider removing this table altogether in the future, we could add some logging/metrics here to track how often we are actually hitting this case. I imagine it's probably worth the overhead of tracking processed events versus the overhead of processing duplicates & ensuring there are no other side effects.

That work should fall under a different issue if it's deemed worth the effort.



# How long to keep messages in the device federation inbox before deleting them.
DEVICE_FEDERATION_INBOX_CLEANUP_DELAY_MS = 7 * 24 * 60 * 60 * 1000 # 7 days
Copy link
Contributor

@MadLittleMods MadLittleMods Jun 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth building these up to avoid mistakes and make it more clear using the constants

# Time-based constants.
#
# Laying these out incrementally, even if only some are required, helps with
# readability and catching bugs.
ONE_MINUTE_SECONDS = 60
ONE_HOUR_SECONDS = 60 * ONE_MINUTE_SECONDS

We will need a few more:

ONE_MINUTE_SECONDS = 60
ONE_HOUR_SECONDS = 60 * ONE_MINUTE_SECONDS
ONE_DAY_SECONDS = 24 * ONE_HOUR_SECONDS

MILLISECONDS_PER_SECOND = 1000
Suggested change
DEVICE_FEDERATION_INBOX_CLEANUP_DELAY_MS = 7 * 24 * 60 * 60 * 1000 # 7 days
DEVICE_FEDERATION_INBOX_CLEANUP_DELAY_MS = 7 * ONE_DAY_SECONDS * MILLISECONDS_PER_SECOND

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like a new Duration class was added.

Are we going to remove the other constants?

# Time-based constants.
#
# Laying these out incrementally, even if only some are required, helps with
# readability and catching bugs.
ONE_MINUTE_SECONDS = 60
ONE_HOUR_SECONDS = 60 * ONE_MINUTE_SECONDS

How are we going to handle the seconds variation (vs MS)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erikjohnston Still relevant ^

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to remove the other constants?

Yes, we probably should, but didn't want to do it here

How are we going to handle the seconds variation (vs MS)?

Generally we rarely operate in seconds, except at twisted boundaries. TBH I think its fine in those cases to just / 1000.

@erikjohnston erikjohnston enabled auto-merge (squash) June 17, 2025 14:45
@erikjohnston erikjohnston merged commit 33e0c25 into develop Jun 18, 2025
103 of 109 checks passed
@erikjohnston erikjohnston deleted the erikj/cleanup_federation_inbox branch June 18, 2025 11:58
@amandahla
Copy link
Contributor

amandahla commented Jul 2, 2025

Hi, during my tests with 1.132.0 I found these errors logs, is this related?

2025-07-01T21:14:03.0173989Z 2025-07-01T21:10:43.169Z [synapse] 2025-07-01 21:10:43,169 - synapse.storage.background_updates - 621 - INFO - background_updates-0 - Running background update 'fixup_max_depth_cap'. Processed 0 items in 2ms. (total_rate=None/ms, current_rate=None/ms, total_updated=0, batch_size=100)
2025-07-01T21:14:03.0174484Z 2025-07-01T21:10:43.170Z [synapse] 2025-07-01 21:10:43,170 - synapse.storage.background_updates - 419 - ERROR - background_updates-0 - Error doing update: 'device_federation_inbox_received_ts_index'
2025-07-01T21:14:03.0174641Z 2025-07-01T21:10:43.170Z [synapse] Traceback (most recent call last):
2025-07-01T21:14:03.0175094Z 2025-07-01T21:10:43.170Z [synapse]   File "/usr/local/lib/python3.12/dist-packages/synapse/storage/background_updates.py", line 416, in run_background_updates
2025-07-01T21:14:03.0175297Z 2025-07-01T21:10:43.170Z [synapse]     result = await self.do_next_background_update(sleep)
2025-07-01T21:14:03.0175457Z 2025-07-01T21:10:43.170Z [synapse]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-01T21:14:03.0175895Z 2025-07-01T21:10:43.170Z [synapse]   File "/usr/local/lib/python3.12/dist-packages/synapse/storage/background_updates.py", line 563, in do_next_background_update
2025-07-01T21:14:03.0176181Z 2025-07-01T21:10:43.170Z [synapse]     update_info = self._background_update_handlers[self._current_background_update]
2025-07-01T21:14:03.0176369Z 2025-07-01T21:10:43.170Z [synapse]                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-01T21:14:03.0176572Z 2025-07-01T21:10:43.170Z [synapse] KeyError: 'device_federation_inbox_received_ts_index'
2025-07-01T21:14:03.0177066Z 2025-07-01T21:10:43.171Z [synapse] 2025-07-01 21:10:43,170 - synapse.storage.background_updates - 419 - ERROR - background_updates-0 - Error doing update: 'device_federation_inbox_received_ts_index'
2025-07-01T21:14:03.0177222Z 2025-07-01T21:10:43.171Z [synapse] Traceback (most recent call last):
2025-07-01T21:14:03.0177655Z 2025-07-01T21:10:43.171Z [synapse]   File "/usr/local/lib/python3.12/dist-packages/synapse/storage/background_updates.py", line 416, in run_background_updates
2025-07-01T21:14:03.0177851Z 2025-07-01T21:10:43.171Z [synapse]     result = await self.do_next_background_update(sleep)
2025-07-01T21:14:03.0178000Z 2025-07-01T21:10:43.171Z [synapse]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-01T21:14:03.0178599Z 2025-07-01T21:10:43.171Z [synapse]   File "/usr/local/lib/python3.12/dist-packages/synapse/storage/background_updates.py", line 563, in do_next_background_update
2025-07-01T21:14:03.0179097Z 2025-07-01T21:10:43.171Z [synapse]     update_info = self._background_update_handlers[self._current_background_update]
2025-07-01T21:14:03.0179294Z 2025-07-01T21:10:43.171Z [synapse]                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-01T21:14:03.0179501Z 2025-07-01T21:10:43.171Z [synapse] KeyError: 'device_federation_inbox_received_ts_index'
2025-07-01T21:14:03.0180009Z 2025-07-01T21:10:43.171Z [synapse] 2025-07-01 21:10:43,171 - synapse.storage.background_updates - 419 - ERROR - background_updates-0 - Error doing update: 'device_federation_inbox_received_ts_index'
2025-07-01T21:14:03.0180169Z 2025-07-01T21:10:43.171Z [synapse] Traceback (most recent call last):
2025-07-01T21:14:03.0180727Z 2025-07-01T21:10:43.171Z [synapse]   File "/usr/local/lib/python3.12/dist-packages/synapse/storage/background_updates.py", line 416, in run_background_updates
2025-07-01T21:14:03.0180936Z 2025-07-01T21:10:43.171Z [synapse]     result = await self.do_next_background_update(sleep)
2025-07-01T21:14:03.0181098Z 2025-07-01T21:10:43.171Z [synapse]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-01T21:14:03.0181542Z 2025-07-01T21:10:43.171Z [synapse]   File "/usr/local/lib/python3.12/dist-packages/synapse/storage/background_updates.py", line 563, in do_next_background_update
2025-07-01T21:14:03.0181820Z 2025-07-01T21:10:43.171Z [synapse]     update_info = self._background_update_handlers[self._current_background_update]
2025-07-01T21:14:03.0182009Z 2025-07-01T21:10:43.171Z [synapse]                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-01T21:14:03.0182206Z 2025-07-01T21:10:43.171Z [synapse] KeyError: 'device_federation_inbox_received_ts_index'
2025-07-01T21:14:03.0182883Z 2025-07-01T21:10:43.171Z [synapse] 2025-07-01 21:10:43,171 - synapse.storage.background_updates - 419 - ERROR - background_updates-0 - Error doing update: 'device_federation_inbox_received_ts_index'
2025-07-01T21:14:03.0183042Z 2025-07-01T21:10:43.171Z [synapse] Traceback (most recent call last):
2025-07-01T21:14:03.0183480Z 2025-07-01T21:10:43.171Z [synapse]   File "/usr/local/lib/python3.12/dist-packages/synapse/storage/background_updates.py", line 416, in run_background_updates
2025-07-01T21:14:03.0183679Z 2025-07-01T21:10:43.171Z [synapse]     result = await self.do_next_background_update(sleep)
2025-07-01T21:14:03.0183837Z 2025-07-01T21:10:43.171Z [synapse]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-01T21:14:03.0184268Z 2025-07-01T21:10:43.171Z [synapse]   File "/usr/local/lib/python3.12/dist-packages/synapse/storage/background_updates.py", line 563, in do_next_background_update
2025-07-01T21:14:03.0184542Z 2025-07-01T21:10:43.171Z [synapse]     update_info = self._background_update_handlers[self._current_background_update]
2025-07-01T21:14:03.0184726Z 2025-07-01T21:10:43.171Z [synapse]                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-01T21:14:03.0184922Z 2025-07-01T21:10:43.171Z [synapse] KeyError: 'device_federation_inbox_received_ts_index'
2025-07-01T21:14:03.0185410Z 2025-07-01T21:10:43.172Z [synapse] 2025-07-01 21:10:43,171 - synapse.storage.background_updates - 419 - ERROR - background_updates-0 - Error doing update: 'device_federation_inbox_received_ts_index'
2025-07-01T21:14:03.0185555Z 2025-07-01T21:10:43.172Z [synapse] Traceback (most recent call last):
2025-07-01T21:14:03.0185986Z 2025-07-01T21:10:43.172Z [synapse]   File "/usr/local/lib/python3.12/dist-packages/synapse/storage/background_updates.py", line 416, in run_background_updates
2025-07-01T21:14:03.0186172Z 2025-07-01T21:10:43.172Z [synapse]     result = await self.do_next_background_update(sleep)
2025-07-01T21:14:03.0186327Z 2025-07-01T21:10:43.172Z [synapse]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-01T21:14:03.0186902Z 2025-07-01T21:10:43.172Z [synapse]   File "/usr/local/lib/python3.12/dist-packages/synapse/storage/background_updates.py", line 563, in do_next_background_update
2025-07-01T21:14:03.0187278Z 2025-07-01T21:10:43.172Z [synapse]     update_info = self._background_update_handlers[self._current_background_update]
2025-07-01T21:14:03.0187457Z 2025-07-01T21:10:43.172Z [synapse]                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-01T21:14:03.0187657Z 2025-07-01T21:10:43.172Z [synapse] KeyError: 'device_federation_inbox_received_ts_index'
2025-07-01T21:14:03.0188163Z 2025-07-01T21:10:43.172Z [synapse] 2025-07-01 21:10:43,172 - synapse.metrics.background_process_metrics - 253 - ERROR - background_updates-0 - Background process 'background_updates' threw an exception
2025-07-01T21:14:03.0188312Z 2025-07-01T21:10:43.172Z [synapse] Traceback (most recent call last):
2025-07-01T21:14:03.0188735Z 2025-07-01T21:10:43.172Z [synapse]   File "/usr/local/lib/python3.12/dist-packages/synapse/storage/background_updates.py", line 416, in run_background_updates
2025-07-01T21:14:03.0188928Z 2025-07-01T21:10:43.172Z [synapse]     result = await self.do_next_background_update(sleep)
2025-07-01T21:14:03.0189081Z 2025-07-01T21:10:43.172Z [synapse]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-01T21:14:03.0189514Z 2025-07-01T21:10:43.172Z [synapse]   File "/usr/local/lib/python3.12/dist-packages/synapse/storage/background_updates.py", line 563, in do_next_background_update
2025-07-01T21:14:03.0189779Z 2025-07-01T21:10:43.172Z [synapse]     update_info = self._background_update_handlers[self._current_background_update]
2025-07-01T21:14:03.0189960Z 2025-07-01T21:10:43.172Z [synapse]                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-01T21:14:03.0190148Z 2025-07-01T21:10:43.172Z [synapse] KeyError: 'device_federation_inbox_received_ts_index'
2025-07-01T21:14:03.0190230Z 2025-07-01T21:10:43.172Z [synapse] 
2025-07-01T21:14:03.0190596Z 2025-07-01T21:10:43.172Z [synapse] During handling of the above exception, another exception occurred:
2025-07-01T21:14:03.0190675Z 2025-07-01T21:10:43.172Z [synapse] 
2025-07-01T21:14:03.0190825Z 2025-07-01T21:10:43.172Z [synapse] Traceback (most recent call last):
2025-07-01T21:14:03.0191221Z 2025-07-01T21:10:43.172Z [synapse]   File "/usr/local/lib/python3.12/dist-packages/synapse/metrics/background_process_metrics.py", line 251, in run
2025-07-01T21:14:03.0191383Z 2025-07-01T21:10:43.172Z [synapse]     return await func(*args, **kwargs)
2025-07-01T21:14:03.0191518Z 2025-07-01T21:10:43.172Z [synapse]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-01T21:14:03.0191947Z 2025-07-01T21:10:43.172Z [synapse]   File "/usr/local/lib/python3.12/dist-packages/synapse/storage/background_updates.py", line 423, in run_background_updates
2025-07-01T21:14:03.0192072Z 2025-07-01T21:10:43.172Z [synapse]     raise RuntimeError(
2025-07-01T21:14:03.0192315Z 2025-07-01T21:10:43.172Z [synapse] RuntimeError: 5 back-to-back background update failures; aborting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

device_federation_inbox table is never emptied

6 participants