User directory is performing state resolution, which results in unnecessary CPU usage #9797
Description
On April 11th, 2021 at ~12:00 UTC we saw matrix.org's user directory worker start using 100% CPU consistently, and it kept doing so until it was restarted on April 12th at 16:10 UTC.
It turns out that it was stuck doing state resolution for an IRC room with 123,000+ state events.
It's somewhat surprising that the user directory is doing state resolution at all, though, as it should only need to listen for membership changes on the current_state_deltas_stream and update the tables used for user directory search accordingly.
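As a rough sketch of the expected behaviour: consuming current-state deltas should only require filtering for m.room.member entries, because each delta already describes the resolved current state, so no state resolution is needed. The StateDelta type, its fields (including a pre-joined membership value) and apply_membership_deltas are hypothetical simplifications for illustration, not Synapse's actual schema or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


# Hypothetical shape of one current-state delta row (not Synapse's schema).
@dataclass(frozen=True)
class StateDelta:
    stream_id: int
    room_id: str
    event_type: str
    state_key: str
    # Hypothetical: membership of the new state event, or None if removed.
    membership: Optional[str]


def apply_membership_deltas(
    deltas: List[StateDelta],
    room_members: Dict[str, Set[str]],
) -> None:
    """Update an in-memory room_id -> joined user IDs index from deltas.

    Only m.room.member deltas matter; all other state types are skipped.
    Each delta is applied as-is, so no state resolution happens here.
    """
    for delta in deltas:
        if delta.event_type != "m.room.member":
            continue  # e.g. topic or name changes are irrelevant here
        members = room_members.setdefault(delta.room_id, set())
        if delta.membership == "join":
            members.add(delta.state_key)
        else:
            # leave, ban, kick, or removed state: drop the user if present
            members.discard(delta.state_key)
```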
In the logs, we see the following repeated multiple times per second:
2021-04-12 00:00:44,506 - synapse.replication.tcp.handler - 496 - INFO - replication_command_handler@7f0b5b2e2268 - Handling 'POSITION events event_persister-2 1939721421 1939721422'
2021-04-12 00:00:44,506 - synapse.replication.tcp.handler - 549 - INFO - process-replication-data-48623630 - Caught up with stream 'events' to 1939721422
2021-04-12 00:00:44,507 - synapse.replication.tcp.handler - 496 - INFO - replication_command_handler@7f0b5b2e2268 - Handling 'POSITION events event_persister-2 1939721422 1939721423'
2021-04-12 00:00:44,507 - synapse.replication.tcp.handler - 549 - INFO - process-replication-data-48623632 - Caught up with stream 'events' to 1939721423
2021-04-12 00:00:44,610 - synapse.state - 576 - INFO - Measure[resolve_state_groups_for_events]@7f09dc222840 - Resolving state for !xxx:domain with groups [596595428, 596513551]
2021-04-12 00:00:44,714 - synapse.state.v1 - 84 - INFO - Measure[state._resolve_events]@7f09dc222d68 - Asking for 104/104 conflicted events
2021-04-12 00:00:44,715 - synapse.state.v1 - 118 - INFO - Measure[state._resolve_events]@7f09dc222d68 - Asking for 3/3 auth events
(Note that we are using Redis replication, even though the log lines above come from tcp/handler.py.)
So it seems (I think) that the user directory is listening to the events stream in addition to the current_state_deltas stream:
synapse/synapse/handlers/user_directory.py
Lines 160 to 162 in b7748d3
```python
max_pos, deltas = await self.store.get_current_state_deltas(
    self.pos, room_max_stream_ordering
)
```
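The snippet follows a catch-up pattern: fetch the deltas between the last processed position (self.pos) and the current maximum stream ordering, process them, then advance the position. Below is a minimal, self-contained sketch of that loop. InMemoryDeltaStore, DeltaConsumer, the tuple layout and the batch size are hypothetical stand-ins, and the sketch assumes every stream ID appears at most once.

```python
import asyncio
from typing import List, Tuple

# Hypothetical row layout: (stream_id, room_id, event_type, state_key)
Delta = Tuple[int, str, str, str]


class InMemoryDeltaStore:
    """Hypothetical stand-in for the store; not Synapse's real class."""

    def __init__(self, deltas: List[Delta], batch_size: int = 100) -> None:
        self._deltas = sorted(deltas)
        self._batch_size = batch_size

    async def get_current_state_deltas(
        self, prev_stream_id: int, max_stream_id: int
    ) -> Tuple[int, List[Delta]]:
        # Deltas in (prev_stream_id, max_stream_id], plus the position reached.
        rows = [d for d in self._deltas if prev_stream_id < d[0] <= max_stream_id]
        if len(rows) > self._batch_size:
            # Truncated batch: only caught up to the last returned row.
            rows = rows[: self._batch_size]
            return rows[-1][0], rows
        return max_stream_id, rows


class DeltaConsumer:
    """Hypothetical consumer that tracks how far it has processed."""

    def __init__(self, store: InMemoryDeltaStore, pos: int = 0) -> None:
        self.store = store
        self.pos = pos
        self.seen_members: List[str] = []

    async def catch_up(self, room_max_stream_ordering: int) -> None:
        # Loop until we have processed everything up to the target position.
        while self.pos < room_max_stream_ordering:
            max_pos, deltas = await self.store.get_current_state_deltas(
                self.pos, room_max_stream_ordering
            )
            for _stream_id, _room_id, event_type, state_key in deltas:
                if event_type == "m.room.member":
                    self.seen_members.append(state_key)
            self.pos = max_pos
```

With a batch size of 2 and deltas at stream IDs 1, 2 and 3, catch_up(4) takes two iterations: the first returns rows 1 and 2 and advances self.pos to 2, the second returns row 3 and advances it to 4.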
Ideally the user directory would just accept membership updates from other worker processes without needing to perform state resolution itself in the meantime.