Skip to content

Conversation

ramaraochavali
Copy link
Contributor

For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md

Description: Adds supports for specifying failure_mode_allow configuration in rate limit filter config. It is defaulted to true to be sync with current behaviour.
Risk Level: Low
Testing: Automated tests
Docs Changes: Updated
Release Notes: Updated
Fixes #4023

Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
@ramaraochavali
Copy link
Contributor Author

@mattklein123 can you PTAL?

@mattklein123
Copy link
Member

Assigning to @junr03 for first pass.

junr03
junr03 previously requested changes Aug 10, 2018
Copy link
Member

@junr03 junr03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for taking this @ramaraochavali!

@@ -48,8 +48,10 @@ enum ResponseFlag {
RateLimited = 0x800,
// Request was unauthorized by external authorization service.
UnauthorizedExternalService = 0x1000,
// Unable to call Ratelimiting service.
RateLimitingServiceError = 0x1200,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: RateLimitServiceError

@@ -66,6 +68,7 @@ class FilterConfig {
Stats::Scope& scope_;
Runtime::Loader& runtime_;
Upstream::ClusterManager& cm_;
bool failure_mode_allow_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const

@@ -57,6 +59,7 @@ class Config {
std::vector<RateLimit::Descriptor> descriptors_;
const InstanceStats stats_;
Runtime::Loader& runtime_;
bool failure_mode_allow_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const

.value());
}

TEST_F(HttpRateLimitFilterTest, ErrorResponseWithFailureAllowModeOff) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: FailureModeAllow for naming consistency

@@ -53,6 +55,8 @@ class RateLimitFilterTest : public testing::Test {
Json::ObjectSharedPtr json_config = Json::Factory::loadFromString(json);
envoy::config::filter::network::rate_limit::v2::RateLimit proto_config{};
Envoy::Config::FilterJson::translateTcpRateLimitFilter(*json_config, proto_config);
// TODO(ramaraochavali): move the config to yaml and use a different config for failure_mode.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would go ahead and do this, and change these tests to use a SetUpTest(const std::string& yaml) setup function. Prevents the subclass addition here, as you point out in the TODO.

Json::ObjectSharedPtr json_config = Json::Factory::loadFromString(json);
envoy::config::filter::http::rate_limit::v2::RateLimit proto_config{};
Config::FilterJson::translateHttpRateLimitFilter(*json_config, proto_config);
// TODO(ramaraochavali): move filter_config_ to yaml and use a different config for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would go ahead and do this real quick.

@@ -157,10 +162,19 @@ class RatelimitIntegrationTest : public HttpIntegrationTest,

const uint64_t request_size_ = 1024;
const uint64_t response_size_ = 512;
bool failure_mode_ = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the integration test and the unit tests use the same varaible name failure_mode_ in opposite ways. In the unit tests we use it directly for the failure_mode_allow config value, and here we use it logically opposite, i.e if failure_mode_ is true then failure_mode_allow is false. Can we make it consistent between the tests?

@@ -31,7 +31,7 @@ void ResponseFlagUtils::appendString(std::string& result, const std::string& app
const std::string ResponseFlagUtils::toShortString(const RequestInfo& request_info) {
std::string result;

static_assert(ResponseFlag::LastFlag == 0x1000, "A flag has been added. Fix this code.");
static_assert(ResponseFlag::LastFlag == 0x1200, "A flag has been added. Fix this code.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this, and all the other asserts related to the LastFlag, we not only need to fix the value. There is accompanying code that needs to be fixed. For example here we need to add the new flag to the mapping between the enum and the string.

Any suggestions appreciated on how to make this maintenance more obvious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. Sorry..did not realize that. Will change

Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
@ramaraochavali
Copy link
Contributor Author

@junr03 addressed the comments. PTAL.

Copy link
Member

@junr03 junr03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm mod a few comments.

@@ -834,6 +834,7 @@ name: envoy.file_access_log
RequestInfo::ResponseFlag::FaultInjected,
RequestInfo::ResponseFlag::RateLimited,
RequestInfo::ResponseFlag::UnauthorizedExternalService,
RequestInfo::ResponseFlag::RateLimitServiceError,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. This test should fail due to unmatched expectation here, because the new flag was not added to the filter config here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to take a look in the morning.

Copy link
Contributor Author

@ramaraochavali ramaraochavali Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great Catch. I see the problem. The hex value I gave for RateLimitServiceError is incorrect so the intersectResponseFlags is returning wrong value. I have corrected it.

@@ -162,13 +162,13 @@ class RatelimitIntegrationTest : public HttpIntegrationTest,

const uint64_t request_size_ = 1024;
const uint64_t response_size_ = 512;
bool failure_mode_ = false;
bool failure_mode_ = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: sorry I don't think I was clear with my previous comment. Now that you changed the logic of this var, can we change the name to match the config name failure_mode_allow. I think that adds clarity.

Signed-off-by: Rama <rama.rao@salesforce.com>
@ramaraochavali
Copy link
Contributor Author

@junr03 addressed review comments. PTAL.

@ramaraochavali
Copy link
Contributor Author

@junr03 can you PTAL when you get time?

@junr03
Copy link
Member

junr03 commented Aug 17, 2018

lgtm, @ggreenway can you take a look?

Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
// not respond back. When it is set to true, Envoy will also allow traffic in case of
// communication failure between rate limiting service and the proxy.
// Defaults to true.
google.protobuf.BoolValue failure_mode_allow = 5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's preferable to use bool instead of g.p.BoolValue. Can you change this to type bool and name failure_mode_deny (so that the default value is false)? Or is there a good reason for making this a tri-state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no other reason except to be consistent with external_authz proto naming. I will change.

// not respond back. When it is set to true, Envoy will also allow traffic in case of
// communication failure between rate limiting service and the proxy.
// Defaults to true.
google.protobuf.BoolValue failure_mode_allow = 5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
@@ -147,6 +147,15 @@ void Filter::complete(RateLimit::LimitStatus status, Http::HeaderMapPtr&& header
callbacks_->sendLocalReply(Http::Code::TooManyRequests, "",
[this](Http::HeaderMap& headers) { addHeaders(headers); });
callbacks_->requestInfo().setResponseFlag(RequestInfo::ResponseFlag::RateLimited);
} else if (status == RateLimit::LimitStatus::Error) {
if (config_->failureModeAllow()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not change the method name in config because it seems more readable like this. LMK if you think otherwise

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine as is

Copy link
Member

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking pretty good. A few more nits, but nothing major.

@@ -4,6 +4,7 @@ package envoy.config.filter.http.rate_limit.v2;
option go_package = "v2";

import "google/protobuf/duration.proto";
import "google/protobuf/wrappers.proto";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this isn't needed anymore

@@ -5,6 +5,7 @@ option go_package = "v2";

import "envoy/api/v2/ratelimit/ratelimit.proto";
import "google/protobuf/duration.proto";
import "google/protobuf/wrappers.proto";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -16,6 +16,9 @@ apply to a request. Each configuration results in a descriptor being sent to the
If the rate limit service is called, and the response for any of the descriptors is over limit, a
429 response is returned.

If there is an error in calling rate limit service or rate limit service returns an error and :ref:`failure_mode_allow <envoy_api_msg_config.filter.http.rate_limit.v2.RateLimit>` is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failure_mode_deny

@@ -108,6 +111,8 @@ The buffer filter outputs statistics in the *cluster.<route target cluster>.rate
ok, Counter, Total under limit responses from the rate limit service
error, Counter, Total errors contacting the rate limit service
over_limit, Counter, total over limit responses from the rate limit service
failure_mode_allowed, Counter, "Total requests that were error(s) but were allowed through because
of :ref:`failure_mode_allow <envoy_api_msg_config.filter.http.rate_limit.v2.RateLimit>` set to true."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failure_mode_deny

@@ -25,6 +25,8 @@ following statistics:
ok, Counter, Total under limit responses from the rate limit service
cx_closed, Counter, Total connections closed due to an over limit response from the rate limit service
active, Gauge, Total active requests to the rate limit service
failure_mode_allowed, Counter, "Total requests that were error(s) but were allowed through because
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failure_mode_deny: I'm fine with the stat being named either way, but the doc link needs to be updated.

@@ -147,6 +147,15 @@ void Filter::complete(RateLimit::LimitStatus status, Http::HeaderMapPtr&& header
callbacks_->sendLocalReply(Http::Code::TooManyRequests, "",
[this](Http::HeaderMap& headers) { addHeaders(headers); });
callbacks_->requestInfo().setResponseFlag(RequestInfo::ResponseFlag::RateLimited);
} else if (status == RateLimit::LimitStatus::Error) {
if (config_->failureModeAllow()) {
cluster_->statsScope().counter("ratelimit.failure_mode_allowed").inc();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this new counter at all? We already have a counter for ratelimit service failures, and presumably the user knows which way they have this configured (failure_mode_deny value). In one case this counter will always be zero, in the other it will be equal to ratelimit.error. So is there value in this new counter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would help in general, while looking at stats user do not have to check back on what his configuration is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm fine with that.

Signed-off-by: Rama <rama.rao@salesforce.com>
@ggreenway ggreenway dismissed junr03’s stale review August 21, 2018 21:19

junr03 commented "lgtm" above

@ggreenway ggreenway merged commit ac0bd74 into envoyproxy:master Aug 21, 2018
@ramaraochavali ramaraochavali deleted the feature/failure_mode_rls branch August 22, 2018 02:58
danielhochman added a commit to danielhochman/envoy that referenced this pull request Aug 28, 2018
@junr03
Copy link
Member

junr03 commented Aug 28, 2018

@ramaraochavali @ggreenway, @danielhochman spotted a problem in this patch here https://github.com/envoyproxy/envoy/pull/4073/files#diff-12a885082b482d2c6be37666e04013b0R153. We noticed that in the new flow, we continueDecoding without checking initiating_call_. This is causing crashes at Lyft, so we'd like to revert.

@ramaraochavali
Copy link
Contributor Author

@junr03 sorry for that. I have just submitted a PR to fix that #4287. If that does not address your problem, feel free to revert it. I will get it back in when I get time.

junr03 added a commit that referenced this pull request Aug 29, 2018
This reverts commit ac0bd74. But leaves the API changes as 'not implemented' in order to not scramble the proto field.

#4073 had a bug. The cause has been identified, and a fix PR is forthcoming. However, in the meantime, we want to leave master clean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants