Optimize how we calculate likely_domains during backfill #13626
Description
Mentioned in internal doc. Part of #13356
Optimize how we calculate likely_domains during backfill. I've seen it take 17s in production just to run get_current_state, whose result is then passed to get_domains_from_state (see case 2, "Loading tons of events", in the /messages investigation issue).
There are 3 ways we currently calculate hosts that are in the room:
1. get_current_state -> get_domains_from_state
   - Used in backfill to calculate likely_domains, and in /timestamp_to_event because it was cargo-culted from backfill
2. get_current_hosts_in_room
   - Used for other federation things like sending read receipts and typing indicators
3. get_hosts_in_room_at_events
   - Used when pushing out events over federation to other servers in the _process_event_queue_loop
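To make the cost of the first approach concrete, here is a minimal sketch of deriving domains from a full in-memory state map. It is not Synapse's implementation: the function name domains_from_state, the dict-based event shape, and the ordering by joined-member count are all assumptions made for illustration. The point is only that every state event in the room has to be loaded before any filtering happens.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical, simplified stand-in for a room's current state:
# (event type, state_key) -> event content. Not Synapse's real types.
StateMap = Dict[Tuple[str, str], dict]


def domains_from_state(state: StateMap) -> List[Tuple[str, int]]:
    """Return (domain, joined_member_count) pairs, most populous first.

    Assumption: ordering by member count is chosen for illustration only.
    """
    counts: Counter = Counter()
    for (event_type, state_key), content in state.items():
        # Only joined membership events tell us which hosts are in the room.
        if event_type != "m.room.member":
            continue
        if content.get("membership") != "join":
            continue
        # A user ID looks like "@localpart:domain". Split on the first
        # colon only, because the domain part may itself contain a port.
        _, sep, domain = state_key.partition(":")
        if sep and domain:
            counts[domain] += 1
    return counts.most_common()
```

Note that the loop walks the whole map even though only m.room.member events matter, which mirrors why fetching all current state just to compute domains is wasteful.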
Query performance
The query from get_current_state is slow simply because we have to fetch all 80k events. We see almost exactly the same performance locally when fetching all of these events (16s vs 17s):
synapse=# SELECT type, state_key, event_id FROM current_state_events WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
Time: 16035.612 ms (00:16.036)
synapse=# SELECT type, state_key, event_id FROM current_state_events WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
Time: 4243.237 ms (00:04.243)
But what about get_current_hosts_in_room? When there are 8M rows in the current_state_events table, the query in get_current_hosts_in_room takes 13s from complete freshness (when the events were first added), but only 930ms after a Postgres restart, or 390ms when run back to back.
$ psql synapse
synapse=# \timing on
synapse=# SELECT COUNT(DISTINCT substring(state_key FROM '@[^:]*:(.*)$'))
FROM current_state_events
WHERE
type = 'm.room.member'
AND membership = 'join'
AND room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
count
-------
4130
(1 row)
Time: 13181.598 ms (00:13.182)
synapse=# SELECT COUNT(*) from current_state_events where room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
count
-------
80814
synapse=# SELECT COUNT(*) from current_state_events;
count
---------
8162847
synapse=# SELECT pg_size_pretty( pg_total_relation_size('current_state_events') );
pg_size_pretty
----------------
4702 MB
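For readers less familiar with the substring(... FROM pattern) form used in the query above: with a POSIX regular expression containing parentheses, Postgres returns the text matched by the first parenthesized group. The sketch below mirrors that logic in Python with the same pattern. The helper names host_of and count_distinct_hosts and the tuple row shape are hypothetical and exist only for this illustration.

```python
import re
from typing import Iterable, Optional, Tuple

# Same pattern as in the SQL query: skip "@localpart:" and capture the rest.
HOST_PATTERN = re.compile(r"@[^:]*:(.*)$")

# Hypothetical row shape for illustration: (type, membership, state_key).
Row = Tuple[str, str, str]


def host_of(state_key: str) -> Optional[str]:
    """Return the captured host, or None when the pattern does not match."""
    match = HOST_PATTERN.search(state_key)
    return match.group(1) if match else None


def count_distinct_hosts(rows: Iterable[Row]) -> int:
    """Mirror COUNT(DISTINCT substring(...)) over joined member rows."""
    hosts = set()
    for event_type, membership, state_key in rows:
        if event_type != "m.room.member" or membership != "join":
            continue
        host = host_of(state_key)
        # COUNT(DISTINCT ...) ignores NULLs, so skip non-matching keys.
        if host is not None:
            hosts.add(host)
    return len(hosts)
```

Because [^:]* stops at the first colon, a state_key such as @bob:example.org:8448 yields example.org:8448, keeping the port as part of the host.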
See the in-flight PR #13575 for more details