ratelimit: add support for failure_mode_allow configuration #4073

ramaraochavali · 2018-08-07T08:13:45Z

For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md

Description: Adds support for specifying a failure_mode_allow configuration in the rate limit filter config. It defaults to true to stay in sync with the current behaviour.
Risk Level: Low
Testing: Automated tests
Docs Changes: Updated
Release Notes: Updated
Fixes #4023

Signed-off-by: Rama <rama.rao@salesforce.com>
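As a rough illustration of the behaviour this description proposes, here is a minimal C++ sketch of the decision the filter makes once the rate limit call completes. The enums and the decide() helper are hypothetical, not Envoy's API, and RejectServiceError stands in for whatever error response the filter sends; the sketch does not pin down a status code.

```cpp
#include <cassert>

// Hypothetical mirror of the three outcomes a rate limit service call can have.
enum class LimitStatus { OK, OverLimit, Error };

// What the filter does with the request after the rate limit call completes.
enum class Action { Continue, RejectTooManyRequests, RejectServiceError };

// An error from (or failure to reach) the rate limit service lets traffic
// through only when failure_mode_allow is true.
Action decide(LimitStatus status, bool failure_mode_allow) {
  switch (status) {
  case LimitStatus::OK:
    return Action::Continue;
  case LimitStatus::OverLimit:
    return Action::RejectTooManyRequests; // 429
  case LimitStatus::Error:
    return failure_mode_allow ? Action::Continue : Action::RejectServiceError;
  }
  return Action::RejectServiceError; // unreachable; silences compiler warnings
}
```

With the default of true, errors fail open, which matches the behaviour before this change.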

ramaraochavali · 2018-08-07T10:47:42Z

@mattklein123 can you PTAL?

mattklein123 · 2018-08-07T20:11:34Z

Assigning to @junr03 for first pass.

junr03

thanks for taking this @ramaraochavali!

junr03 · 2018-08-10T16:31:04Z

include/envoy/request_info/request_info.h

@@ -48,8 +48,10 @@ enum ResponseFlag {
  RateLimited = 0x800,
  // Request was unauthorized by external authorization service.
  UnauthorizedExternalService = 0x1000,
+  // Unable to call Ratelimiting service.
+  RateLimitingServiceError = 0x1200,


nit: RateLimitServiceError

junr03 · 2018-08-10T16:32:59Z

source/extensions/filters/http/ratelimit/ratelimit.h

@@ -66,6 +68,7 @@ class FilterConfig {
  Stats::Scope& scope_;
  Runtime::Loader& runtime_;
  Upstream::ClusterManager& cm_;
+  bool failure_mode_allow_;


junr03 · 2018-08-10T16:34:35Z

source/extensions/filters/network/ratelimit/ratelimit.h

@@ -57,6 +59,7 @@ class Config {
  std::vector<RateLimit::Descriptor> descriptors_;
  const InstanceStats stats_;
  Runtime::Loader& runtime_;
+  bool failure_mode_allow_;


junr03 · 2018-08-10T16:36:22Z

test/extensions/filters/http/ratelimit/ratelimit_test.cc

+                    .value());
+}
+
+TEST_F(HttpRateLimitFilterTest, ErrorResponseWithFailureAllowModeOff) {


nit: FailureModeAllow for naming consistency

junr03 · 2018-08-10T16:41:55Z

test/extensions/filters/network/ratelimit/ratelimit_test.cc

@@ -53,6 +55,8 @@ class RateLimitFilterTest : public testing::Test {
    Json::ObjectSharedPtr json_config = Json::Factory::loadFromString(json);
    envoy::config::filter::network::rate_limit::v2::RateLimit proto_config{};
    Envoy::Config::FilterJson::translateTcpRateLimitFilter(*json_config, proto_config);
+    // TODO(ramaraochavali): move the config to yaml and use a different config for failure_mode.


I would go ahead and do this, and change these tests to use a SetUpTest(const std::string& yaml) setup function. That avoids adding a subclass here, as you point out in the TODO.

junr03 · 2018-08-10T16:43:23Z

test/extensions/filters/http/ratelimit/ratelimit_test.cc

    Json::ObjectSharedPtr json_config = Json::Factory::loadFromString(json);
    envoy::config::filter::http::rate_limit::v2::RateLimit proto_config{};
    Config::FilterJson::translateHttpRateLimitFilter(*json_config, proto_config);
+    // TODO(ramaraochavali): move filter_config_ to yaml and use a different config for


I would go ahead and do this real quick.

junr03 · 2018-08-10T16:48:42Z

test/integration/ratelimit_integration_test.cc

@@ -157,10 +162,19 @@ class RatelimitIntegrationTest : public HttpIntegrationTest,

  const uint64_t request_size_ = 1024;
  const uint64_t response_size_ = 512;
+  bool failure_mode_ = false;


The integration test and the unit tests use the same variable name, failure_mode_, in opposite ways. In the unit tests it is used directly as the failure_mode_allow config value, while here it has the opposite meaning: if failure_mode_ is true, then failure_mode_allow is false. Can we make it consistent between the tests?

junr03 · 2018-08-10T16:54:21Z

source/common/request_info/utility.cc

@@ -31,7 +31,7 @@ void ResponseFlagUtils::appendString(std::string& result, const std::string& app
 const std::string ResponseFlagUtils::toShortString(const RequestInfo& request_info) {
  std::string result;

-  static_assert(ResponseFlag::LastFlag == 0x1000, "A flag has been added. Fix this code.");
+  static_assert(ResponseFlag::LastFlag == 0x1200, "A flag has been added. Fix this code.");


For this, and all the other asserts related to LastFlag, we need to do more than fix the value: there is accompanying code that also needs to be updated. For example, here we need to add the new flag to the mapping between the enum and the string.

Any suggestions appreciated on how to make this maintenance more obvious.
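One possible way to make this maintenance more obvious, sketched under assumptions: keep every flag and its short string in a single table and static_assert that the table covers every bit up to LastFlag. The enum values, the short strings, and the toShortString() helper below are illustrative, not Envoy's actual code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical, trimmed-down flag set; values are illustrative only.
enum ResponseFlag : uint32_t {
  FaultInjected = 0x1,
  RateLimited = 0x2,
  UnauthorizedExternalService = 0x4,
  RateLimitServiceError = 0x8,
  LastFlag = RateLimitServiceError,
};

// One table holds every flag together with its short access-log string.
constexpr std::array<std::pair<ResponseFlag, const char*>, 4> kFlagStrings{{
    {FaultInjected, "FI"},
    {RateLimited, "RL"},
    {UnauthorizedExternalService, "UAEX"},
    {RateLimitServiceError, "RLSE"},
}};

// Number of single-bit flags from 0x1 up to and including LastFlag.
constexpr std::size_t flagCount() {
  std::size_t n = 0;
  for (uint32_t bit = 1; bit <= LastFlag; bit <<= 1) {
    ++n;
  }
  return n;
}

// Adding a flag without a table entry now fails to compile, and the error
// points at the table rather than at scattered mapping code.
static_assert(kFlagStrings.size() == flagCount(), "Every ResponseFlag needs a short string.");

// Joins the short strings of all set flags with ',' ("-" if none are set).
std::string toShortString(uint32_t flags) {
  std::string result;
  for (const auto& [flag, name] : kFlagStrings) {
    if (flags & flag) {
      if (!result.empty()) {
        result += ',';
      }
      result += name;
    }
  }
  return result.empty() ? "-" : result;
}
```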

Oh, sorry, I did not realize that. Will change.

ramaraochavali · 2018-08-12T13:36:13Z

@junr03 addressed the comments. PTAL.

junr03

lgtm mod a few comments.

junr03 · 2018-08-13T04:05:31Z

test/common/access_log/access_log_impl_test.cc

@@ -834,6 +834,7 @@ name: envoy.file_access_log
      RequestInfo::ResponseFlag::FaultInjected,
      RequestInfo::ResponseFlag::RateLimited,
      RequestInfo::ResponseFlag::UnauthorizedExternalService,
+      RequestInfo::ResponseFlag::RateLimitServiceError,


Interesting. This test should fail with an unmatched expectation, because the new flag was not added to the filter config here.

I'm going to take a look in the morning.

Great catch. I see the problem: the hex value I gave for RateLimitServiceError is incorrect, so intersectResponseFlags returns the wrong value. I have corrected it.
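To spell out why 0x1200 goes wrong: response flags are bit flags, so each one must occupy exactly one bit. 0x1200 is 0x1000 | 0x200, so it sets the UnauthorizedExternalService bit plus a lower bit, and any bitwise test for one flag also matches the other. A small sketch using the two values from the diff; the isSingleBit() and hasFlag() helpers are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Values taken from the diff above.
constexpr uint32_t RateLimited = 0x800;
constexpr uint32_t UnauthorizedExternalService = 0x1000;

// True when exactly one bit is set.
constexpr bool isSingleBit(uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Bitwise test for a flag inside a combined flag set.
constexpr bool hasFlag(uint32_t flags, uint32_t flag) { return (flags & flag) != 0; }

// 0x1200 sets two bits and overlaps UnauthorizedExternalService.
static_assert(!isSingleBit(0x1200), "0x1200 sets two bits");
static_assert(hasFlag(0x1200, UnauthorizedExternalService), "0x1200 overlaps 0x1000");

// The next unused single bit after 0x1000 is 0x2000.
constexpr uint32_t RateLimitServiceError = UnauthorizedExternalService << 1;
static_assert(isSingleBit(RateLimitServiceError), "must be a single bit");
static_assert(RateLimitServiceError == 0x2000, "next power of two after 0x1000");
```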

junr03 · 2018-08-13T04:06:31Z

test/integration/ratelimit_integration_test.cc

@@ -162,13 +162,13 @@ class RatelimitIntegrationTest : public HttpIntegrationTest,

  const uint64_t request_size_ = 1024;
  const uint64_t response_size_ = 512;
-  bool failure_mode_ = false;
+  bool failure_mode_ = true;


nit: sorry, I don't think my previous comment was clear. Now that you have changed the logic of this variable, can we rename it to match the config name, failure_mode_allow? I think that adds clarity.

ramaraochavali · 2018-08-13T08:41:31Z

@junr03 addressed review comments. PTAL.

ramaraochavali · 2018-08-17T06:13:52Z

@junr03 can you PTAL when you get time?

junr03 · 2018-08-17T16:00:26Z

lgtm, @ggreenway can you take a look?

ggreenway · 2018-08-20T23:57:39Z

api/envoy/config/filter/http/rate_limit/v2/rate_limit.proto

+  // not respond back. When it is set to true, Envoy will also allow traffic in case of
+  // communication failure between rate limiting service and the proxy.
+  // Defaults to true.
+  google.protobuf.BoolValue failure_mode_allow = 5;


It's preferable to use bool instead of google.protobuf.BoolValue. Can you change this to type bool and name it failure_mode_deny (so that the default value is false)? Or is there a good reason for making this a tri-state?

There is no reason other than consistency with the external_authz proto naming. I will change it.
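A sketch of the trade-off being discussed, using std::optional<bool> to model the tri-state google.protobuf.BoolValue and a plain bool to model a proto3 bool, where unset and false are indistinguishable. The struct and function names are hypothetical. Naming the plain field failure_mode_deny makes its zero value mean "fail open", so no default has to be restated at each read site.

```cpp
#include <cassert>
#include <optional>

// Models a google.protobuf.BoolValue field: unset, true, or false.
struct ConfigWithWrapper {
  std::optional<bool> failure_mode_allow;
};

// Models a plain proto3 bool: an unset field reads as false.
struct ConfigWithPlainBool {
  bool failure_mode_deny = false;
};

bool allowOnErrorWrapper(const ConfigWithWrapper& config) {
  // The default (true) has to be restated wherever the field is read.
  return config.failure_mode_allow.value_or(true);
}

bool allowOnErrorPlain(const ConfigWithPlainBool& config) {
  // No special case: the zero value already means "allow on error".
  return !config.failure_mode_deny;
}
```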

ggreenway · 2018-08-20T23:57:58Z

api/envoy/config/filter/network/rate_limit/v2/rate_limit.proto

+  // not respond back. When it is set to true, Envoy will also allow traffic in case of
+  // communication failure between rate limiting service and the proxy.
+  // Defaults to true.
+  google.protobuf.BoolValue failure_mode_allow = 5;


ramaraochavali · 2018-08-21T08:52:49Z

source/extensions/filters/http/ratelimit/ratelimit.cc

@@ -147,6 +147,15 @@ void Filter::complete(RateLimit::LimitStatus status, Http::HeaderMapPtr&& header
    callbacks_->sendLocalReply(Http::Code::TooManyRequests, "",
                               [this](Http::HeaderMap& headers) { addHeaders(headers); });
    callbacks_->requestInfo().setResponseFlag(RequestInfo::ResponseFlag::RateLimited);
+  } else if (status == RateLimit::LimitStatus::Error) {
+    if (config_->failureModeAllow()) {


I did not change the method name in the config because it reads better this way. LMK if you think otherwise.

This is fine as is

ggreenway

This is looking pretty good. A few more nits, but nothing major.

ggreenway · 2018-08-21T15:49:19Z

api/envoy/config/filter/http/rate_limit/v2/rate_limit.proto

@@ -4,6 +4,7 @@ package envoy.config.filter.http.rate_limit.v2;
 option go_package = "v2";

 import "google/protobuf/duration.proto";
+import "google/protobuf/wrappers.proto";


nit: this isn't needed anymore

ggreenway · 2018-08-21T15:49:33Z

api/envoy/config/filter/network/rate_limit/v2/rate_limit.proto

@@ -5,6 +5,7 @@ option go_package = "v2";

 import "envoy/api/v2/ratelimit/ratelimit.proto";
 import "google/protobuf/duration.proto";
+import "google/protobuf/wrappers.proto";


ggreenway · 2018-08-21T15:51:42Z

docs/root/configuration/http_filters/rate_limit_filter.rst

@@ -16,6 +16,9 @@ apply to a request. Each configuration results in a descriptor being sent to the
 If the rate limit service is called, and the response for any of the descriptors is over limit, a
 429 response is returned.

+If there is an error in calling rate limit service or rate limit service returns an error and :ref:`failure_mode_allow <envoy_api_msg_config.filter.http.rate_limit.v2.RateLimit>` is 


failure_mode_deny

ggreenway · 2018-08-21T15:52:20Z

docs/root/configuration/http_filters/rate_limit_filter.rst

@@ -108,6 +111,8 @@ The buffer filter outputs statistics in the *cluster.<route target cluster>.rate
  ok, Counter, Total under limit responses from the rate limit service
  error, Counter, Total errors contacting the rate limit service
  over_limit, Counter, total over limit responses from the rate limit service
+  failure_mode_allowed, Counter, "Total requests that were error(s) but were allowed through because
+  of :ref:`failure_mode_allow <envoy_api_msg_config.filter.http.rate_limit.v2.RateLimit>` set to true."


failure_mode_deny

ggreenway · 2018-08-21T15:53:07Z

docs/root/configuration/network_filters/rate_limit_filter.rst

@@ -25,6 +25,8 @@ following statistics:
  ok, Counter, Total under limit responses from the rate limit service
  cx_closed, Counter, Total connections closed due to an over limit response from the rate limit service
  active, Gauge, Total active requests to the rate limit service
+  failure_mode_allowed, Counter, "Total requests that were error(s) but were allowed through because


failure_mode_deny: I'm fine with the stat being named either way, but the doc link needs to be updated.

ggreenway · 2018-08-21T16:04:18Z

source/extensions/filters/http/ratelimit/ratelimit.cc

@@ -147,6 +147,15 @@ void Filter::complete(RateLimit::LimitStatus status, Http::HeaderMapPtr&& header
    callbacks_->sendLocalReply(Http::Code::TooManyRequests, "",
                               [this](Http::HeaderMap& headers) { addHeaders(headers); });
    callbacks_->requestInfo().setResponseFlag(RequestInfo::ResponseFlag::RateLimited);
+  } else if (status == RateLimit::LimitStatus::Error) {
+    if (config_->failureModeAllow()) {
+      cluster_->statsScope().counter("ratelimit.failure_mode_allowed").inc();


Do we need this new counter at all? We already have a counter for ratelimit service failures, and presumably the user knows which way they have this configured (the failure_mode_deny value). In one case this counter will always be zero; in the other it will equal ratelimit.error. So is there value in this new counter?

I think it helps in general: when looking at stats, users do not have to check back on what their configuration is.

Ok, I'm fine with that.
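The reviewer's argument can be checked with a small model of the counters. This sketch assumes each error bumps error and, when traffic is let through, also bumps failure_mode_allowed; the Stats struct and the record() and replay() helpers are hypothetical. With failure_mode_deny false the two counters match, and with it true the new counter stays at zero, which is the redundancy being discussed and also what makes the counter readable without looking up the config.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class LimitStatus { OK, OverLimit, Error };

// Hypothetical counters named after the stats discussed above.
struct Stats {
  uint64_t ok = 0;
  uint64_t over_limit = 0;
  uint64_t error = 0;
  uint64_t failure_mode_allowed = 0;
};

// Every error bumps `error`; errors that are let through also bump
// `failure_mode_allowed`.
void record(Stats& stats, LimitStatus status, bool failure_mode_deny) {
  switch (status) {
  case LimitStatus::OK:
    ++stats.ok;
    break;
  case LimitStatus::OverLimit:
    ++stats.over_limit;
    break;
  case LimitStatus::Error:
    ++stats.error;
    if (!failure_mode_deny) {
      ++stats.failure_mode_allowed;
    }
    break;
  }
}

// Feeds a sequence of responses through record() and returns the totals.
Stats replay(const std::vector<LimitStatus>& responses, bool failure_mode_deny) {
  Stats stats;
  for (LimitStatus status : responses) {
    record(stats, status, failure_mode_deny);
  }
  return stats;
}
```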

junr03 commented "lgtm" above

Revert "ratelimit: add support for failure_mode_allow configuration (envoyproxy#4073)". This reverts commit ac0bd74.

junr03 · 2018-08-28T21:51:28Z

@ramaraochavali @ggreenway, @danielhochman spotted a problem in this patch here https://github.com/envoyproxy/envoy/pull/4073/files#diff-12a885082b482d2c6be37666e04013b0R153. We noticed that in the new flow, we continueDecoding without checking initiating_call_. This is causing crashes at Lyft, so we'd like to revert.
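A minimal sketch of the re-entrancy guard being described, assuming the rate limit client can complete synchronously (for example on an immediate error) before limit() returns. Only continueDecoding and initiating_call_ come from the discussion; FakeClient, FakeCallbacks, State, and the rest are hypothetical stand-ins, not Envoy's real classes. The idea is that complete() must not resume the filter chain while decodeHeaders() is still on the stack; decodeHeaders() returns Continue directly instead.

```cpp
#include <cassert>
#include <functional>
#include <utility>

enum class FilterHeadersStatus { Continue, StopIteration };

// Hypothetical stand-in for the decoder callbacks: it only counts resumes.
struct FakeCallbacks {
  int continue_decoding_calls = 0;
  void continueDecoding() { ++continue_decoding_calls; }
};

// Hypothetical client: completes inline with an error if fail_inline is set,
// otherwise stores the callback so it can be completed later.
struct FakeClient {
  bool fail_inline = false;
  std::function<void(bool)> pending;
  void limit(std::function<void(bool)> on_complete) {
    if (fail_inline) {
      on_complete(true);
    } else {
      pending = std::move(on_complete);
    }
  }
};

class Filter {
public:
  Filter(FakeClient& client, FakeCallbacks& callbacks, bool failure_mode_deny)
      : client_(client), callbacks_(callbacks), failure_mode_deny_(failure_mode_deny) {}

  FilterHeadersStatus decodeHeaders() {
    state_ = State::Calling;
    initiating_call_ = true;
    client_.limit([this](bool error) { complete(error); });
    initiating_call_ = false;
    // complete() may already have run inside limit(); if it let the request
    // through, return Continue because nothing was ever paused.
    return state_ == State::Complete ? FilterHeadersStatus::Continue
                                     : FilterHeadersStatus::StopIteration;
  }

  void complete(bool error) {
    if (error && failure_mode_deny_) {
      state_ = State::Responded; // a local error reply would be sent here
      return;
    }
    state_ = State::Complete;
    // Resume only if decodeHeaders() already returned StopIteration.
    if (!initiating_call_) {
      callbacks_.continueDecoding();
    }
  }

private:
  enum class State { NotStarted, Calling, Complete, Responded };

  FakeClient& client_;
  FakeCallbacks& callbacks_;
  const bool failure_mode_deny_;
  State state_ = State::NotStarted;
  bool initiating_call_ = false;
};
```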

ramaraochavali · 2018-08-29T04:20:03Z

@junr03 sorry about that. I have just submitted #4287 to fix it. If that does not address your problem, feel free to revert. I will get it back in when I get time.

This reverts commit ac0bd74 but leaves the API changes in place, marked as 'not implemented', so that the proto field is not scrambled. #4073 had a bug. The cause has been identified, and a fix PR is forthcoming. In the meantime, we want to keep master clean.

ramaraochavali added 5 commits August 7, 2018 13:35

support failure_mode_allow in ratelimit

f0a6a88

Signed-off-by: Rama <rama.rao@salesforce.com>

Merge branch 'master' into feature/failure_mode_rls

84b081a

Signed-off-by: Rama <rama.rao@salesforce.com>

corrected version history

88e49e6

Signed-off-by: Rama <rama.rao@salesforce.com>

revert some changes

83e75ec

Signed-off-by: Rama <rama.rao@salesforce.com>

fix test compilation issues

ef281ad

Signed-off-by: Rama <rama.rao@salesforce.com>

mattklein123 assigned junr03 Aug 7, 2018

junr03 previously requested changes Aug 10, 2018

ramaraochavali added 3 commits August 12, 2018 14:47

addressed review feedback

91439ea

Signed-off-by: Rama <rama.rao@salesforce.com>

Merge branch 'master' into feature/failure_mode_rls

2934cc4

Signed-off-by: Rama <rama.rao@salesforce.com>

fix test cases

37b6913

Signed-off-by: Rama <rama.rao@salesforce.com>

junr03 reviewed Aug 13, 2018

fixed the test case

a6ca282

Signed-off-by: Rama <rama.rao@salesforce.com>

junr03 assigned ggreenway Aug 17, 2018

ramaraochavali added 2 commits August 18, 2018 16:55

Merge branch 'master' into feature/failure_mode_rls

6296e00

Signed-off-by: Rama <rama.rao@salesforce.com>

fix tests

3289db0

Signed-off-by: Rama <rama.rao@salesforce.com>

ggreenway requested changes Aug 21, 2018

ramaraochavali added 2 commits August 21, 2018 12:36

change failure_mode_allow to failure_mode_deny

cf357e2

Signed-off-by: Rama <rama.rao@salesforce.com>

change integration test to use failure_mode_deny for better clarity

d561b19

Signed-off-by: Rama <rama.rao@salesforce.com>

ramaraochavali commented Aug 21, 2018

ggreenway requested changes Aug 21, 2018

updated docs

4404df2

Signed-off-by: Rama <rama.rao@salesforce.com>

ggreenway approved these changes Aug 21, 2018

ggreenway merged commit ac0bd74 into envoyproxy:master Aug 21, 2018

ramaraochavali deleted the feature/failure_mode_rls branch August 22, 2018 02:58

danielhochman added a commit to danielhochman/envoy that referenced this pull request Aug 28, 2018

Revert "ratelimit: add support for failure_mode_allow configuration (envoyproxy#4073)"

0abbd31

This reverts commit ac0bd74.

danielhochman added a commit to danielhochman/envoy that referenced this pull request Aug 28, 2018

Revert "ratelimit: add support for failure_mode_allow configuration (envoyproxy#4073)"

3ee0fb5

This reverts commit ac0bd74.

ramaraochavali mentioned this pull request Aug 29, 2018

ratelimit: check for initiating call before continue decoding #4287

Closed

junr03 mentioned this pull request Aug 29, 2018

Revert ac0bd74f6f9716e3a44d1412f795317c30ca770a #4295

Merged
