-
Notifications
You must be signed in to change notification settings - Fork 5.1k
ratelimit: add support for failure_mode_allow configuration #4073
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ratelimit: add support for failure_mode_allow configuration #4073
Conversation
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
@mattklein123 can you PTAL? |
Assigning to @junr03 for first pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for taking this @ramaraochavali!
@@ -48,8 +48,10 @@ enum ResponseFlag { | |||
RateLimited = 0x800, | |||
// Request was unauthorized by external authorization service. | |||
UnauthorizedExternalService = 0x1000, | |||
// Unable to call Ratelimiting service. | |||
RateLimitingServiceError = 0x1200, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: RateLimitServiceError
@@ -66,6 +68,7 @@ class FilterConfig { | |||
Stats::Scope& scope_; | |||
Runtime::Loader& runtime_; | |||
Upstream::ClusterManager& cm_; | |||
bool failure_mode_allow_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
@@ -57,6 +59,7 @@ class Config { | |||
std::vector<RateLimit::Descriptor> descriptors_; | |||
const InstanceStats stats_; | |||
Runtime::Loader& runtime_; | |||
bool failure_mode_allow_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
.value()); | ||
} | ||
|
||
TEST_F(HttpRateLimitFilterTest, ErrorResponseWithFailureAllowModeOff) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: FailureModeAllow for naming consistency
@@ -53,6 +55,8 @@ class RateLimitFilterTest : public testing::Test { | |||
Json::ObjectSharedPtr json_config = Json::Factory::loadFromString(json); | |||
envoy::config::filter::network::rate_limit::v2::RateLimit proto_config{}; | |||
Envoy::Config::FilterJson::translateTcpRateLimitFilter(*json_config, proto_config); | |||
// TODO(ramaraochavali): move the config to yaml and use a different config for failure_mode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would go ahead and do this, and change these tests to use a SetUpTest(const std::string& yaml) setup function. Prevents the subclass addition here, as you point out in the TODO.
Json::ObjectSharedPtr json_config = Json::Factory::loadFromString(json); | ||
envoy::config::filter::http::rate_limit::v2::RateLimit proto_config{}; | ||
Config::FilterJson::translateHttpRateLimitFilter(*json_config, proto_config); | ||
// TODO(ramaraochavali): move filter_config_ to yaml and use a different config for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would go ahead and do this real quick.
@@ -157,10 +162,19 @@ class RatelimitIntegrationTest : public HttpIntegrationTest, | |||
|
|||
const uint64_t request_size_ = 1024; | |||
const uint64_t response_size_ = 512; | |||
bool failure_mode_ = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the integration test and the unit tests use the same varaible name failure_mode_
in opposite ways. In the unit tests we use it directly for the failure_mode_allow
config value, and here we use it logically opposite, i.e if failure_mode_
is true then failure_mode_allow
is false. Can we make it consistent between the tests?
@@ -31,7 +31,7 @@ void ResponseFlagUtils::appendString(std::string& result, const std::string& app | |||
const std::string ResponseFlagUtils::toShortString(const RequestInfo& request_info) { | |||
std::string result; | |||
|
|||
static_assert(ResponseFlag::LastFlag == 0x1000, "A flag has been added. Fix this code."); | |||
static_assert(ResponseFlag::LastFlag == 0x1200, "A flag has been added. Fix this code."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this, and all the other asserts related to the LastFlag, we not only need to fix the value. There is accompanying code that needs to be fixed. For example here we need to add the new flag to the mapping between the enum and the string.
Any suggestions appreciated on how to make this maintenance more obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. Sorry..did not realize that. Will change
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
@junr03 addressed the comments. PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm mod a few comments.
@@ -834,6 +834,7 @@ name: envoy.file_access_log | |||
RequestInfo::ResponseFlag::FaultInjected, | |||
RequestInfo::ResponseFlag::RateLimited, | |||
RequestInfo::ResponseFlag::UnauthorizedExternalService, | |||
RequestInfo::ResponseFlag::RateLimitServiceError, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to take a look in the morning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great Catch. I see the problem. The hex value I gave for RateLimitServiceError
is incorrect so the intersectResponseFlags
is returning wrong value. I have corrected it.
@@ -162,13 +162,13 @@ class RatelimitIntegrationTest : public HttpIntegrationTest, | |||
|
|||
const uint64_t request_size_ = 1024; | |||
const uint64_t response_size_ = 512; | |||
bool failure_mode_ = false; | |||
bool failure_mode_ = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: sorry I don't think I was clear with my previous comment. Now that you changed the logic of this var, can we change the name to match the config name failure_mode_allow
. I think that adds clarity.
Signed-off-by: Rama <rama.rao@salesforce.com>
@junr03 addressed review comments. PTAL. |
@junr03 can you PTAL when you get time? |
lgtm, @ggreenway can you take a look? |
Signed-off-by: Rama <rama.rao@salesforce.com>
// not respond back. When it is set to true, Envoy will also allow traffic in case of | ||
// communication failure between rate limiting service and the proxy. | ||
// Defaults to true. | ||
google.protobuf.BoolValue failure_mode_allow = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's preferable to use bool instead of g.p.BoolValue. Can you change this to type bool and name failure_mode_deny (so that the default value is false)? Or is there a good reason for making this a tri-state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no other reason except to be consistent with external_authz proto naming. I will change.
// not respond back. When it is set to true, Envoy will also allow traffic in case of | ||
// communication failure between rate limiting service and the proxy. | ||
// Defaults to true. | ||
google.protobuf.BoolValue failure_mode_allow = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
@@ -147,6 +147,15 @@ void Filter::complete(RateLimit::LimitStatus status, Http::HeaderMapPtr&& header | |||
callbacks_->sendLocalReply(Http::Code::TooManyRequests, "", | |||
[this](Http::HeaderMap& headers) { addHeaders(headers); }); | |||
callbacks_->requestInfo().setResponseFlag(RequestInfo::ResponseFlag::RateLimited); | |||
} else if (status == RateLimit::LimitStatus::Error) { | |||
if (config_->failureModeAllow()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not change the method name in config because it seems more readable like this. LMK if you think otherwise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine as is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking pretty good. A few more nits, but nothing major.
@@ -4,6 +4,7 @@ package envoy.config.filter.http.rate_limit.v2; | |||
option go_package = "v2"; | |||
|
|||
import "google/protobuf/duration.proto"; | |||
import "google/protobuf/wrappers.proto"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this isn't needed anymore
@@ -5,6 +5,7 @@ option go_package = "v2"; | |||
|
|||
import "envoy/api/v2/ratelimit/ratelimit.proto"; | |||
import "google/protobuf/duration.proto"; | |||
import "google/protobuf/wrappers.proto"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
@@ -16,6 +16,9 @@ apply to a request. Each configuration results in a descriptor being sent to the | |||
If the rate limit service is called, and the response for any of the descriptors is over limit, a | |||
429 response is returned. | |||
|
|||
If there is an error in calling rate limit service or rate limit service returns an error and :ref:`failure_mode_allow <envoy_api_msg_config.filter.http.rate_limit.v2.RateLimit>` is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
failure_mode_deny
@@ -108,6 +111,8 @@ The buffer filter outputs statistics in the *cluster.<route target cluster>.rate | |||
ok, Counter, Total under limit responses from the rate limit service | |||
error, Counter, Total errors contacting the rate limit service | |||
over_limit, Counter, total over limit responses from the rate limit service | |||
failure_mode_allowed, Counter, "Total requests that were error(s) but were allowed through because | |||
of :ref:`failure_mode_allow <envoy_api_msg_config.filter.http.rate_limit.v2.RateLimit>` set to true." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
failure_mode_deny
@@ -25,6 +25,8 @@ following statistics: | |||
ok, Counter, Total under limit responses from the rate limit service | |||
cx_closed, Counter, Total connections closed due to an over limit response from the rate limit service | |||
active, Gauge, Total active requests to the rate limit service | |||
failure_mode_allowed, Counter, "Total requests that were error(s) but were allowed through because |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
failure_mode_deny: I'm fine with the stat being named either way, but the doc link needs to be updated.
@@ -147,6 +147,15 @@ void Filter::complete(RateLimit::LimitStatus status, Http::HeaderMapPtr&& header | |||
callbacks_->sendLocalReply(Http::Code::TooManyRequests, "", | |||
[this](Http::HeaderMap& headers) { addHeaders(headers); }); | |||
callbacks_->requestInfo().setResponseFlag(RequestInfo::ResponseFlag::RateLimited); | |||
} else if (status == RateLimit::LimitStatus::Error) { | |||
if (config_->failureModeAllow()) { | |||
cluster_->statsScope().counter("ratelimit.failure_mode_allowed").inc(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this new counter at all? We already have a counter for ratelimit service failures, and presumably the user knows which way they have this configured (failure_mode_deny value). In one case this counter will always be zero, in the other it will be equal to ratelimit.error
. So is there value in this new counter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would help in general, while looking at stats user do not have to check back on what his configuration is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'm fine with that.
Signed-off-by: Rama <rama.rao@salesforce.com>
…nvoyproxy#4073)" This reverts commit ac0bd74.
@ramaraochavali @ggreenway, @danielhochman spotted a problem in this patch here https://github.com/envoyproxy/envoy/pull/4073/files#diff-12a885082b482d2c6be37666e04013b0R153. We noticed that in the new flow, we continueDecoding without checking initiating_call_. This is causing crashes at Lyft, so we'd like to revert. |
…nvoyproxy#4073)" This reverts commit ac0bd74.
For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md
Description: Adds supports for specifying
failure_mode_allow
configuration in rate limit filter config. It is defaulted totrue
to be sync with current behaviour.Risk Level: Low
Testing: Automated tests
Docs Changes: Updated
Release Notes: Updated
Fixes #4023