This repository was archived by the owner on Apr 26, 2024. It is now read-only.

Make sure synapse_rate_limit_reject_affected_hosts does what it says it does #13670

@MadLittleMods

Description

Spawning from #13541 (comment)


The synapse_rate_limit_reject_affected_hosts gauge is always evaluating to 0. The raw data in Prometheus also shows 0 for reference.

https://grafana.matrix.org/d/dYoRgTgVz/messages-timing?orgId=1&from=1661855196368&to=1661876796368&viewPanel=220

synapse_rate_limit_reject_affected_hosts all 0 in the graph

This is despite individual requests being rejected (synapse_rate_limit_reject_total), which should mean at least 1 host is affected:

synapse_rate_limit_reject_total with some activity in the graph

But this could be a mismatch in how the gauges were being reported, because we were accidentally registering them twice (#13641).


Now that we've fixed the duplicate metric registration issue in #13649 and the fix was deployed to matrix.org this morning, both metrics are at 0. This could mean that the rejections we saw previously all came from the UsernameAvailabilityRestServlet, which we no longer track, and that we're not rejecting any requests in the federation servlets.

It is a bit suspicious though.

synapse_rate_limit_reject_affected_hosts and synapse_rate_limit_reject_total being 0 in the graph after matrix.org deploy

How can we know if it's right?

In order to confirm that synapse_rate_limit_reject_affected_hosts is working, it would be nice to see a non-zero value.
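One way to get that confirmation without waiting on production traffic might be a small local test that pushes a single host over the limit and then reads the gauge. Below is a minimal, stand-alone sketch of the idea. `HostLimiter` and `count_affected_hosts` are hypothetical stand-ins rather than Synapse's actual classes, and the rule "reject once a host has more than `reject_limit` requests in flight" is an assumption made purely for illustration.

```python
from collections import defaultdict


class HostLimiter:
    """Hypothetical per-host limiter; NOT Synapse's implementation."""

    def __init__(self, reject_limit: int) -> None:
        self.reject_limit = reject_limit
        self.in_flight = 0

    def should_reject(self) -> bool:
        # Assumed rule: reject a new request once this host already has
        # more than `reject_limit` requests in flight.
        return self.in_flight > self.reject_limit

    def start_request(self) -> bool:
        """Return True if the request was accepted, False if rejected."""
        if self.should_reject():
            return False
        self.in_flight += 1
        return True


def count_affected_hosts(limiters: dict) -> int:
    # What an "affected hosts" gauge callback could report: the number of
    # hosts whose next request would currently be rejected.
    return sum(1 for limiter in limiters.values() if limiter.should_reject())


limiters = defaultdict(lambda: HostLimiter(reject_limit=50))
rejected_total = 0

# One noisy host sends 60 requests; a quiet host sends 5 (made-up traffic).
for host, n in [("noisy.example", 60), ("quiet.example", 5)]:
    for _ in range(n):
        if not limiters[host].start_request():
            rejected_total += 1
```

In this toy setup, any rejection implies at least one affected host, so a test could assert that `count_affected_hosts(limiters)` is non-zero whenever `rejected_total` is.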

The reject_limit is 50, which I think means there have to be more than 50 requests within the 1-second federation_rc_window_size before we start rejecting.

We do see the rate of slept requests go above 70 sometimes which I would expect to trigger this 🤔
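One possibility worth ruling out: if the limit is applied per origin host (an assumption here, suggested by the "affected hosts" wording) while the graph sums over all hosts, then an aggregate rate above 50 does not imply that any single host crossed 50. A toy illustration with made-up numbers:

```python
REJECT_LIMIT = 50  # value quoted above

# Hypothetical requests per window, per origin host (made-up numbers).
requests_per_host = {
    "a.example": 30,
    "b.example": 25,
    "c.example": 20,
}

# What an aggregated graph would show for this window.
aggregate = sum(requests_per_host.values())

# Hosts that would individually exceed the limit under the per-host assumption.
hosts_over_limit = [
    host for host, count in requests_per_host.items() if count > REJECT_LIMIT
]
```

Here the aggregate is above the limit while `hosts_over_limit` stays empty, so a spike like this would not be expected to produce rejections under that assumption.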

https://grafana.matrix.org/d/dYoRgTgVz/messages-timing?orgId=1&from=1661873701156&to=1661877301156&viewPanel=223

synapse_rate_limit_sleep_total showing spikes above the 50 reject_limit

Dev notes

The synapse_rate_limit_reject_affected_hosts metric was originally added in #13541 and updated in #13649

Metadata


Assignees

No one assigned

    Labels

    A-Federation
    A-Metrics: metrics, measures, stuff we put in Prometheus
    O-Uncommon: Most users are unlikely to come across this or unexpected workflow
    S-Minor: Blocks non-critical functionality, workarounds exist.
    T-Task: Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.
