Add test to determine that the public room list is correctly updated after lazy load finishes syncing state #585
...I've only just now seen that this is a draft. Oh well, I hope the thoughts are useful anyway.
```go
// check the number of joined users, it should now be 3
res2 := terry.MustDoFunc(t, "GET", []string{"_matrix", "client", "v3", "publicRooms"})
body2, err2 := ioutil.ReadAll(res2.Body)
if err2 != nil {
	t.Fatalf("something broke: %v", err2)
}
numJoinedMembers2 := gjson.GetBytes(body2, "chunk.0.num_joined_members")
if numJoinedMembers2.Int() != 3 {
	t.Fatalf("Expected 3 users, returned %s", numJoinedMembers2)
}
```
I ran this locally and it flakes randomly. I think it's because the stats updater is its own background loop in synapse and isn't guaranteed to be done when the partial state join completes.
For things like this, I think we do a loop with a timeout in other parts of Complement, though I can't find examples of it at the moment.
I also ran the test with workers (took a bit of fiddling) and it passed, which was surprising because I thought synapse was missing a bit to poke the notifier when updating the current state for a partial state room.
Is it a problem that the test passes with workers (i.e. does that violate any assumptions that we should look deeper into) or is it fine?
A little. I'm looking into why it passes with workers at the moment.
I think I've gotten to the bottom of this.
In the Complement worker setup, we have 2 event persisters. `event_persister2` updates the current room state and sends out an `EventsStreamCurrentStateRow` over replication, with stream id 3 for example. This doesn't wake the stats updater.
```
master           | 2023-01-18 17:39:31,478 - synapse.handlers.federation - 1797 - INFO - sync_partial_state_room-0 - Updating current state for !0-rPjczVynKDaEaE3CDT:host.docker.internal:44815
...
event_persister2 | 2023-01-18 17:39:31,533 - synapse.access.http.18010 - 460 - INFO - POST-16 - ::ffff:127.0.0.1 - 18010 - {None} Processed request: 0.011sec/0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 2B 200 "POST /_synapse/replication/update_current_state/%210-rPjczVynKDaEaE3CDT%3Ahost.docker.internal%3A44815/yJPxeSawfh HTTP/1.1" "Synapse/1.75.0rc2" [0 dbevts]
master           | 2023-01-18 17:39:31,533 - synapse.http.client - 460 - INFO - sync_partial_state_room-0 - Received response to POST http://localhost:18010/_synapse/replication/update_current_state/%210-rPjczVynKDaEaE3CDT%3Ahost.docker.internal%3A44815/yJPxeSawfh: 200
event_persister2 | 2023-01-18 17:39:31,653 - synapse.replication.tcp.resource - 193 - INFO - replication_notifier-5 - Streaming: events -> 3
```
Event persisters periodically send updated events stream positions over replication, so `event_persister1` comes along and sends a position update shortly after:

```
event_persister1 | 2023-01-18 17:39:31,808 - synapse.replication.tcp.resource - 216 - INFO - replication_notifier-4 - Sending position: events -> 3
```
Position updates do wake the stats updater, and this makes things appear to work. So we would expect the test to fall over if we removed the extra event persister, and it does.
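The failure mode above can be boiled down to a toy model: a consumer that only wakes on position updates, not on row updates, looks healthy whenever a second producer happens to advertise its position. The type and method names here are illustrative, not Synapse's actual classes:

```go
package main

import "fmt"

// statsUpdater models a background loop with two possible wake paths.
type statsUpdater struct {
	woken bool
}

// onCurrentStateRow models receiving an EventsStreamCurrentStateRow.
// In the scenario under discussion, this does not wake the updater.
func (s *statsUpdater) onCurrentStateRow() {}

// onPositionUpdate models a periodic "Sending position" advertisement
// from any event persister, which does wake the updater.
func (s *statsUpdater) onPositionUpdate() {
	s.woken = true
}

func main() {
	// One persister: only the row arrives, so the updater never wakes.
	one := &statsUpdater{}
	one.onCurrentStateRow()
	fmt.Println("one persister, woken:", one.woken) // false

	// Two persisters: the second one's periodic position update masks
	// the missing wake-up, so the test appears to pass.
	two := &statsUpdater{}
	two.onCurrentStateRow()
	two.onPositionUpdate()
	fmt.Println("two persisters, woken:", two.woken) // true
}
```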
I had to make these changes to test in worker mode and coax the failure out:
Synapse changes
```diff
diff --git a/docker/complement/conf/start_for_complement.sh b/docker/complement/conf/start_for_complement.sh
index 49d79745b0..ef69e8b330 100755
--- a/docker/complement/conf/start_for_complement.sh
+++ b/docker/complement/conf/start_for_complement.sh
@@ -51,7 +51,6 @@ if [[ -n "$SYNAPSE_COMPLEMENT_USE_WORKERS" ]]; then
   # -z True if the length of string is zero.
   if [[ -z "$SYNAPSE_WORKER_TYPES" ]]; then
     export SYNAPSE_WORKER_TYPES="\
-      event_persister, \
       event_persister, \
       background_worker, \
       frontend_proxy, \
diff --git a/docker/complement/conf/workers-shared-extra.yaml.j2 b/docker/complement/conf/workers-shared-extra.yaml.j2
index 1170694df5..62f74d9333 100644
--- a/docker/complement/conf/workers-shared-extra.yaml.j2
+++ b/docker/complement/conf/workers-shared-extra.yaml.j2
@@ -96,10 +96,8 @@ experimental_features:
   msc2716_enabled: true
   # server-side support for partial state in /send_join responses
   msc3706_enabled: true
-  {% if not workers_in_use %}
   # client-side support for partial state in /send_join responses
   faster_joins: true
-  {% endif %}
   # Filtering /messages by relation type.
   msc3874_enabled: true
   # Enable deleting device-specific notification settings stored in account data
diff --git a/scripts-dev/complement.sh b/scripts-dev/complement.sh
index 7c48d8bccb..324269a2fd 100755
--- a/scripts-dev/complement.sh
+++ b/scripts-dev/complement.sh
@@ -215,6 +215,7 @@ if [[ -n "$WORKERS" ]]; then
   # time (the main problem is that we start 14 python processes for each test,
   # and complement likes to do two of them in parallel).
   export COMPLEMENT_SPAWN_HS_TIMEOUT_SECS=120
+  test_tags="$test_tags,faster_joins,msc2716"
 else
   export PASS_SYNAPSE_COMPLEMENT_USE_WORKERS=
   if [[ -n "$POSTGRES" ]]; then
diff --git a/synapse/app/generic_worker.py b/synapse/app/generic_worker.py
index 8108b1e98f..c406aba7cd 100644
--- a/synapse/app/generic_worker.py
+++ b/synapse/app/generic_worker.py
@@ -282,12 +282,12 @@ def start(config_options: List[str]) -> None:
         "synapse.app.user_dir",
     )
-    if config.experimental.faster_joins_enabled:
-        raise ConfigError(
-            "You have enabled the experimental `faster_joins` config option, but it is "
-            "not compatible with worker deployments yet. Please disable `faster_joins` "
-            "or run Synapse as a single process deployment instead."
-        )
+    #if config.experimental.faster_joins_enabled:
+    #    raise ConfigError(
+    #        "You have enabled the experimental `faster_joins` config option, but it is "
+    #        "not compatible with worker deployments yet. Please disable `faster_joins` "
+    #        "or run Synapse as a single process deployment instead."
+    #    )
     synapse.events.USE_FROZEN_DICTS = config.server.use_frozen_dicts
     synapse.util.caches.TRACK_MEMORY_USAGE = config.caches.track_memory_usage
```
and I ran the test using:

```
WORKERS=1 COMPLEMENT_DIR=../complement scripts-dev/complement.sh -run TestPartialStateJoin/Room_stats* -p 1 | less
```
Thanks for resolving the flake!
Related: matrix-org/synapse#12814, although I don't think this closes that issue, as we still need to deal with it under a worker-ized setup.
Looks good! Big thank you for getting the tests written so quickly.
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
Merged, thank you.
As the title states.