This repository was archived by the owner on Apr 26, 2024. It is now read-only.

Optimize how we calculate likely_domains during backfill #13626

@MadLittleMods

Description

Mentioned in internal doc. Part of #13356


Optimize how we calculate likely_domains during backfill. I've seen it take 17s in production just to run get_current_state, whose result is then fed to get_domains_from_state (see "case 2. Loading tons of events" in the /messages investigation issue).

There are 3 ways we currently calculate hosts that are in the room:

  1. get_current_state -> get_domains_from_state
    • Used in backfill to calculate likely_domains, and in /timestamp_to_event, where it was cargo-culted from backfill
  2. get_current_hosts_in_room
    • Used for other federation things like sending read receipts and typing indicators
  3. get_hosts_in_room_at_events
    • Used when pushing out events over federation to other servers in the _process_event_queue_loop
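
To make the cost of approach 1 concrete, here is a minimal sketch of deriving joined domains from a fully loaded state map. It is not Synapse's actual code: the `StateMap` shape and the `domains_from_state` name are hypothetical simplifications. The point is that every state entry for the room has to be loaded and walked, even though only joined `m.room.member` entries contribute to the result.

```python
from typing import Dict, Set, Tuple

# Hypothetical, simplified stand-in for a room's current state:
# (event type, state_key) -> membership ("" for non-member events).
StateMap = Dict[Tuple[str, str], str]


def domains_from_state(state: StateMap) -> Set[str]:
    """Collect the domains of all joined users from a full state map."""
    domains: Set[str] = set()
    # Every entry is visited, although only joined members matter.
    for (event_type, state_key), membership in state.items():
        if event_type != "m.room.member" or membership != "join":
            continue
        # A user ID looks like @localpart:domain; split on the first ':'
        # only, since the domain itself may contain a port.
        _, sep, domain = state_key.partition(":")
        if sep:
            domains.add(domain)
    return domains


example_state: StateMap = {
    ("m.room.name", ""): "",
    ("m.room.member", "@alice:matrix.org"): "join",
    ("m.room.member", "@bob:example.com:8448"): "join",
    ("m.room.member", "@carol:left.example"): "leave",
}
print(domains_from_state(example_state))
```

In a room with 80k state events, the loop itself is cheap; the expense is fetching all of those rows first, which is what the timings in the next section measure.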

Query performance

The query from get_current_state is slow simply because we have to fetch all 80k state events for the room. We see almost exactly the same performance locally when fetching all of these events (16s locally vs 17s in production):

synapse=# SELECT type, state_key, event_id FROM current_state_events WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
Time: 16035.612 ms (00:16.036)

synapse=# SELECT type, state_key, event_id FROM current_state_events WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
Time: 4243.237 ms (00:04.243)

But what about get_current_hosts_in_room? With 8M rows in the current_state_events table, the query in get_current_hosts_in_room takes 13s when completely fresh (right after the events were first added), 930ms after a Postgres restart, and 390ms when run back to back.

$ psql synapse
synapse=# \timing on
synapse=# SELECT COUNT(DISTINCT substring(state_key FROM '@[^:]*:(.*)$'))
FROM current_state_events
WHERE
    type = 'm.room.member'
    AND membership = 'join'
    AND room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
 count
-------
  4130
(1 row)

Time: 13181.598 ms (00:13.182)

synapse=# SELECT COUNT(*) from current_state_events where room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
 count
-------
 80814

synapse=# SELECT COUNT(*) from current_state_events;
  count
---------
 8162847

synapse=# SELECT pg_size_pretty( pg_total_relation_size('current_state_events') );
 pg_size_pretty
----------------
 4702 MB
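
For reference, the `substring(state_key FROM '@[^:]*:(.*)$')` expression in the query above can be mirrored in Python as below. This is only an illustrative sketch: `HOST_PATTERN` and `distinct_hosts` are hypothetical names, and the input is assumed to be the state_key values of joined `m.room.member` rows. Keys that do not match are skipped, just as `COUNT(DISTINCT ...)` ignores the NULL that `substring` returns when there is no match.

```python
import re
from typing import Iterable, Set

# Same pattern as the SQL substring() call: skip the localpart up to the
# first ':' and capture everything after it as the host.
HOST_PATTERN = re.compile(r"@[^:]*:(.*)$")


def distinct_hosts(state_keys: Iterable[str]) -> Set[str]:
    """Return the distinct hosts found in a collection of user IDs."""
    hosts: Set[str] = set()
    for key in state_keys:
        match = HOST_PATTERN.search(key)
        if match:  # non-matching keys are ignored
            hosts.add(match.group(1))
    return hosts


sample_keys = [
    "@alice:matrix.org",
    "@bob:matrix.org",
    "@carol:example.com:8448",
    "not-a-user-id",
]
print(len(distinct_hosts(sample_keys)))
```

Because the capture group starts after the first colon, a host with an explicit port such as `example.com:8448` is kept whole.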

See the in-flight PR #13575 for more details

Metadata

Labels

  • A-Messages-Endpoint: /messages client API endpoint (`RoomMessageListRestServlet`) (which also triggers /backfill)
  • A-Performance: Performance, both client-facing and admin-facing
  • O-Uncommon: Most users are unlikely to come across this or unexpected workflow
  • S-Minor: Blocks non-critical functionality, workarounds exist.
  • T-Enhancement: New features, changes in functionality, improvements in performance, or user-facing enhancements.
