Admin visibility into federation status #7982
Description
As the admin of a large Synapse server (3000 users), I frequently end up in situations where users report problems sending messages to, or receiving messages from, other homeservers. (Often this is matrix.org, but sometimes it is another large homeserver such as kde.org or pine64.org.) I currently have little visibility into what could be causing these issues. Is the problem with my homeserver or with the remote one? If it is on our end, where should I be looking?
In particular, there are a few key questions I don't currently have a way to answer:
- Is my homeserver returning errors to remote homeservers? (If so, which homeservers and in what rooms?)
- Is a particular remote homeserver returning errors, or is it even online (from the perspective of my homeserver)?
- When a message arrives late, what caused the delay?
I would love it if this information were exposed in some kind of dashboard; failing that, an addition to the admin API would be acceptable. (Note, though, that I don't really know what the admin API currently includes or how I would access it.) Looking around the docs folder of the repo, I found an unfinished-looking document on room statistics and some information on Prometheus metrics. Neither helps me, since I don't use Prometheus (maybe I should, but it isn't mentioned anywhere in the README or the setup instructions).