workers stop working after elevated traffic #2738
Description
Nothing in the logs points to a specific problem, but there is circumstantial evidence that when synapse receives higher-than-normal traffic, the federation_sender can stop working (no activity at all) and therefore stop federating with remote servers. The federation_sender logs show nothing out of the ordinary: it just stops sending requests. The main synapse process complains about the events stream falling behind, but this doesn't seem to cause problems until about 12 minutes later.
This has happened about 10 times in the past to t2bot.io, and each time the rate of events being persisted was elevated (double its normal rate) before the federation_sender stopped working. For t2bot.io, "normal" is defined as 2-3Hz. Each time the federation_sender has stopped, events were being persisted at >6Hz (this latest incident being ~6-10Hz).
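For anyone wanting to check their own server against the same threshold, here is a minimal sketch of flagging elevated windows. It assumes you already have per-minute counts of persisted events from your own metrics; the function names and the sample counts are made up for illustration and are not part of synapse.

```python
def rate_hz(event_count, window_seconds):
    """Average events per second over one window."""
    return event_count / window_seconds


def elevated_windows(counts, window_seconds=60, threshold_hz=6.0):
    """Return the indexes of windows whose average rate exceeds threshold_hz.

    `counts` is a list of persisted-event counts, one per window.
    The 6Hz default mirrors the threshold described above for t2bot.io.
    """
    return [
        i
        for i, count in enumerate(counts)
        if rate_hz(count, window_seconds) > threshold_hz
    ]


if __name__ == "__main__":
    # Illustrative per-minute counts only, not real t2bot.io data.
    sample_counts = [150, 360, 420, 600]
    print(elevated_windows(sample_counts))
```

The comparison is strict, so a window sitting exactly at 6Hz is not flagged; adjust the threshold to whatever "normal" means for your deployment.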
Here's the timeline for the problem (in UTC):
- 01:56:09 Synapse crosses the 6Hz persisted events line
- 03:07:28 The main synapse process started complaining that the events stream was falling behind
- 03:10:03 Synapse falls below the 6Hz persisted events line
- 03:19:56 The federation_sender officially stopped working
- 04:27:22 The entire stack was restarted, restoring federation
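To make the gaps in the timeline above easier to read, here is a small sketch that computes the time between consecutive events. The timestamps are copied from the list; the date is assumed to be 2017-12-17 (taken from the log line in the traceback), and the labels and helper names are only for illustration.

```python
from datetime import datetime

# Timestamps (UTC) copied from the timeline above.
TIMELINE = [
    ("crossed 6Hz", "01:56:09"),
    ("events stream falling behind", "03:07:28"),
    ("fell below 6Hz", "03:10:03"),
    ("federation_sender stopped", "03:19:56"),
    ("stack restarted", "04:27:22"),
]


def parse(time_str):
    # Assumption: every event happened on 2017-12-17, the date in the log line.
    return datetime.strptime("2017-12-17 " + time_str, "%Y-%m-%d %H:%M:%S")


def gaps(timeline):
    """Return (from_label, to_label, timedelta) for each pair of consecutive events."""
    result = []
    for (a_label, a_time), (b_label, b_time) in zip(timeline, timeline[1:]):
        result.append((a_label, b_label, parse(b_time) - parse(a_time)))
    return result


def gap_between(timeline, from_label, to_label):
    """Return the timedelta between two labelled events."""
    times = dict(timeline)
    return parse(times[to_label]) - parse(times[from_label])


if __name__ == "__main__":
    for a_label, b_label, delta in gaps(TIMELINE):
        print("%s -> %s: %s" % (a_label, b_label, delta))
    print(gap_between(TIMELINE, "events stream falling behind", "federation_sender stopped"))
```

The last call gives the delay between the first "falling behind" complaint and the federation_sender stopping, which is the roughly 12 minute gap mentioned in the description.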
During this time the only error spat out was (repeated every few seconds):
homeserver - 2017-12-17 03:13:04,014 - twisted - 131 - CRITICAL - -
Traceback (most recent call last):
  File "/home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1384, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/python/failure.py", line 408, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/matrix/.synapse/local/lib/python2.7/site-packages/synapse/replication/tcp/resource.py", line 164, in on_notifier_poke
    updates, current_token = yield stream.get_updates()
  File "/home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1384, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/python/failure.py", line 408, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/matrix/.synapse/local/lib/python2.7/site-packages/synapse/replication/tcp/streams.py", line 169, in get_updates
    updates, current_token = yield self.get_updates_since(self.last_token)
  File "/home/matrix/.synapse/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
    result = g.send(result)
  File "/home/matrix/.synapse/local/lib/python2.7/site-packages/synapse/replication/tcp/streams.py", line 200, in get_updates_since
    raise Exception("stream %s has fallen behined" % (self.NAME))
Exception: stream current_state_deltas has fallen behined
Further, during this time incoming federation was unaffected. Synapse was still processing events and passing them along to appservices. Only outbound federation was affected.
More in-depth logs are available upon request.
Version information
- Homeserver: t2bot.io
- Version: 0.26.0-rc1
- Install method: pip
- Platform: container, ubuntu host.