Skip to content

Conversation

H-Shay
Copy link
Contributor

@H-Shay H-Shay commented Jan 18, 2023

As the title states.

@H-Shay H-Shay requested review from a team as code owners January 18, 2023 03:31
@H-Shay H-Shay removed request for a team January 18, 2023 03:31
@H-Shay H-Shay marked this pull request as draft January 18, 2023 03:31
Copy link
Contributor

@DMRobertson DMRobertson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...I've only just now seen that this is a draft. Oh well. I hope the thoughts are useful anyway

Comment on lines 3340 to 3349
// check the number of joined users, it should now be 3
res2 := terry.MustDoFunc(t, "GET", []string{"_matrix", "client", "v3", "publicRooms"})
body2, err2 := ioutil.ReadAll(res2.Body)
if err2 != nil {
t.Fatalf("something broke: %v", err2)
}
numJoinedMembers2 := gjson.GetBytes(body2, "chunk.0.num_joined_members")
if numJoinedMembers2.Int() != 3 {
t.Fatalf("Expected 3 users, returned %s", numJoinedMembers2)
}
Copy link
Contributor

@squahtx squahtx Jan 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran this locally and it flakes randomly. I think it's because the stats updater is its own background loop in synapse and isn't guaranteed to be done when the partial state join completes.
For things like this, I think we do a loop with a timeout in other parts of complement, which I can't find examples of at the moment.

I also ran the test with workers (took a bit of fiddling) and it passed, which was surprising because I thought synapse was missing a bit to poke the notifier when updating the current state for a partial state room.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a problem that the test passes with workers (i.e. does that violate any assumptions that we should look deeper into) or is it fine?

Copy link
Contributor

@squahtx squahtx Jan 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little. I'm looking into why it passes with workers at the moment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've gotten to the bottom of this.

In the complement worker setup, we have 2 event persisters.
event_persister2 updates the current room state and sends out an EventsStreamCurrentStateRow over replication, with stream id 3 for example. This doesn't wake the stats updater.

master | 2023-01-18 17:39:31,478 - synapse.handlers.federation - 1797 - INFO - sync_partial_state_room-0 - Updating current state for !0-rPjczVynKDaEaE3CDT:host.docker.internal:44815
...
event_persister2 | 2023-01-18 17:39:31,533 - synapse.access.http.18010 - 460 - INFO - POST-16 - ::ffff:127.0.0.1 - 18010 - {None} Processed request: 0.011sec/0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 2B 200 "POST /_synapse/repl
ication/update_current_state/%210-rPjczVynKDaEaE3CDT%3Ahost.docker.internal%3A44815/yJPxeSawfh HTTP/1.1" "Synapse/1.75.0rc2" [0 dbevts]
master | 2023-01-18 17:39:31,533 - synapse.http.client - 460 - INFO - sync_partial_state_room-0 - Received response to POST http://localhost:18010/_synapse/replication/update_current_state/%210-rPjczVynKDaEaE3CDT%3Ahost.docker.internal%3
A44815/yJPxeSawfh: 200
event_persister2 | 2023-01-18 17:39:31,653 - synapse.replication.tcp.resource - 193 - INFO - replication_notifier-5 - Streaming: events -> 3

Event persisters periodically send updated events stream positions over replication. So event_persister1 comes along and sends a position update shortly after.

event_persister1 | 2023-01-18 17:39:31,808 - synapse.replication.tcp.resource - 216 - INFO - replication_notifier-4 - Sending position: events -> 3

Position updates do wake the stats updater and this makes things appear to work.

So we would expect the test to fall over if we removed the extra event persister - and it does.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to make these changes to test in worker mode and coax the failure out:

Synapse changes
diff --git a/docker/complement/conf/start_for_complement.sh b/docker/complement/conf/start_for_complement.sh
index 49d79745b0..ef69e8b330 100755
--- a/docker/complement/conf/start_for_complement.sh
+++ b/docker/complement/conf/start_for_complement.sh
@@ -51,7 +51,6 @@ if [[ -n "$SYNAPSE_COMPLEMENT_USE_WORKERS" ]]; then
   # -z True if the length of string is zero.
   if [[ -z "$SYNAPSE_WORKER_TYPES" ]]; then
     export SYNAPSE_WORKER_TYPES="\
-      event_persister, \
       event_persister, \
       background_worker, \
       frontend_proxy, \
diff --git a/docker/complement/conf/workers-shared-extra.yaml.j2 b/docker/complement/conf/workers-shared-extra.yaml.j2
index 1170694df5..62f74d9333 100644
--- a/docker/complement/conf/workers-shared-extra.yaml.j2
+++ b/docker/complement/conf/workers-shared-extra.yaml.j2
@@ -96,10 +96,8 @@ experimental_features:
   msc2716_enabled: true
   # server-side support for partial state in /send_join responses
   msc3706_enabled: true
-  {% if not workers_in_use %}
   # client-side support for partial state in /send_join responses
   faster_joins: true
-  {% endif %}
   # Filtering /messages by relation type.
   msc3874_enabled: true
   # Enable deleting device-specific notification settings stored in account data
diff --git a/scripts-dev/complement.sh b/scripts-dev/complement.sh
index 7c48d8bccb..324269a2fd 100755
--- a/scripts-dev/complement.sh
+++ b/scripts-dev/complement.sh
@@ -215,6 +215,7 @@ if [[ -n "$WORKERS" ]]; then
   # time (the main problem is that we start 14 python processes for each test,
   # and complement likes to do two of them in parallel).
   export COMPLEMENT_SPAWN_HS_TIMEOUT_SECS=120
+  test_tags="$test_tags,faster_joins,msc2716"
 else
   export PASS_SYNAPSE_COMPLEMENT_USE_WORKERS=
   if [[ -n "$POSTGRES" ]]; then
diff --git a/synapse/app/generic_worker.py b/synapse/app/generic_worker.py
index 8108b1e98f..c406aba7cd 100644
--- a/synapse/app/generic_worker.py
+++ b/synapse/app/generic_worker.py
@@ -282,12 +282,12 @@ def start(config_options: List[str]) -> None:
         "synapse.app.user_dir",
     )

-    if config.experimental.faster_joins_enabled:
-        raise ConfigError(
-            "You have enabled the experimental `faster_joins` config option, but it is "
-            "not compatible with worker deployments yet. Please disable `faster_joins` "
-            "or run Synapse as a single process deployment instead."
-        )
+    #if config.experimental.faster_joins_enabled:
+    #    raise ConfigError(
+    #        "You have enabled the experimental `faster_joins` config option, but it is "
+    #        "not compatible with worker deployments yet. Please disable `faster_joins` "
+    #        "or run Synapse as a single process deployment instead."
+    #    )

     synapse.events.USE_FROZEN_DICTS = config.server.use_frozen_dicts
     synapse.util.caches.TRACK_MEMORY_USAGE = config.caches.track_memory_usage

and I ran the test using

WORKERS=1 COMPLEMENT_DIR=../complement scripts-dev/complement.sh -run TestPartialStateJoin/Room_stats* -p 1 | less

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for resolving the flake!

@H-Shay H-Shay marked this pull request as ready for review January 18, 2023 20:15
@H-Shay H-Shay requested a review from squahtx January 19, 2023 02:12
@H-Shay
Copy link
Contributor Author

H-Shay commented Jan 19, 2023

Related: matrix-org/synapse#12814, although I don't think this closes that issue as we still need to deal with this issue under a worker-ized setup.

@H-Shay H-Shay requested a review from a team January 19, 2023 02:15
Copy link
Contributor

@squahtx squahtx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Big thank you for getting the tests written so quickly.

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
@H-Shay H-Shay merged commit d524b86 into main Jan 19, 2023
@H-Shay H-Shay deleted the shay/test_stats branch January 19, 2023 17:54
@H-Shay
Copy link
Contributor Author

H-Shay commented Jan 19, 2023

Merged, thank you.

@squahtx squahtx linked an issue Jan 19, 2023 that may be closed by this pull request
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Faster joins: Update room stats once state re-sync completes
3 participants