Making //test/integration:hds_integration_test test faster and more robust #4164
Conversation
Signed-off-by: Lilika Markatou <lilika@google.com>
Re: reducing timeouts, can you run the test locally with TSAN and ASAN? These are much slower than a regular debug build, so make sure your reduced timeouts don't fail them.
@@ -118,7 +118,7 @@ class HdsIntegrationTest : public HttpIntegrationTest,
   // one HTTP health_check
   envoy::service::discovery::v2::HealthCheckSpecifier makeHttpHealthCheckSpecifier() {
     envoy::service::discovery::v2::HealthCheckSpecifier server_health_check_specifier_;
-    server_health_check_specifier_.mutable_interval()->set_seconds(1);
+    server_health_check_specifier_.mutable_interval()->set_nanos(100000000);
Why not still use seconds here for readability? Same elsewhere? Or perhaps a constant such as MaxTimeout to make it clear that we don't care about the timeout?
This interval specifies how often the HdsDelegate sends a message with health checking information back to the management server. Before these changes, I was using time to make sure that the cluster updated before a response was sent. Now, I check the stats to make sure that the cluster processed the change before I look at what the HdsDelegate sends to the server.
In order to make the interval less than a second, I need to use nanos.
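As a concrete illustration of the nanos requirement, the reviewer's naming suggestion could look like the sketch below (not code from the PR; the constant name is hypothetical):

// Hypothetical named constant; a protobuf Duration splits time into whole
// seconds plus nanoseconds, so a 100ms interval has to go through
// set_nanos() rather than set_seconds().
static constexpr int HdsIntervalNanos = 100000000; // 100ms
server_health_check_specifier_.mutable_interval()->set_nanos(HdsIntervalNanos);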
Ah, OK. Alright, some type of constant or comment might still be nice here, since it's hard to tell at a glance how much time this is.
Signed-off-by: Lilika Markatou <lilika@google.com>
I ran the test locally with TSAN and ASAN, and it passed 1000/1000 times for each one.
Signed-off-by: Lilika Markatou <lilika@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to fix the HDS flakes here. I agree with generally shortening the timeouts for timeout tests to force immediate timeout behavior, and lengthening any timeout that doesn't slow down the test. I'm not sure about some of the wait-for-cluster-stats-to-change steps, though.
@@ -201,6 +201,7 @@ class HdsIntegrationTest : public HttpIntegrationTest,
   FakeHttpConnectionPtr host2_fake_connection_;
   FakeRawConnectionPtr host_fake_raw_connection_;

+  int MaxTimeout = 100;
Nit: static constexpr int
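Applied to the diff above, the nit would make the constant compile-time and avoid per-instance storage (a sketch; the name MaxTimeout comes from the PR, the comment is illustrative):

// static constexpr: a compile-time constant shared by all test instances,
// rather than a mutable per-object member.
static constexpr int MaxTimeout = 100;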
@@ -396,6 +412,9 @@ TEST_P(HdsIntegrationTest, SingleEndpointHealthyTcp) {
   RELEASE_ASSERT(result, result.message());
   host_upstream_->set_allow_unexpected_disconnects(true);

+  // Make sure the cluster processed the endpoint's response
+  test_server_->waitForCounterGe("cluster.anna.health_check.success", 1);
It's a bit unclear to me what the win with some of these waits is. We will need to wait for the gRPC response from Envoy to the HDS server below, and those messages are sent independently of what we're doing here wait-wise.
Signed-off-by: markatou <lilika@google.com>
Great, looks robust now. Just two last nits if you have time tomorrow.
ASSERT_TRUE(hds_stream_->waitForGrpcMessage(*dispatcher_, response_));
while (!(checkEndpointHealthResponse(response_.endpoint_health_response().endpoints_health(0),
                                     envoy::api::v2::core::HealthStatus::UNHEALTHY,
                                     host_upstream_->localAddress()) &
Nit: prefer logical && over bitwise & when expressing a logical condition; it's more declarative, and also safer when one of the conjuncts might not be only 0 or 1.
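A minimal illustration of the hazard the reviewer is pointing at (not code from the PR):

int a = 2; // nonzero, so logically "true", but its low bit is 0
int b = 1; // nonzero, low bit is 1
bool bitwise = a & b;  // 2 & 1 == 0, so false despite both values being "true"
bool logical = a && b; // true, matching the intended logical reading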
while (!checkEndpointHealthResponse(response_.endpoint_health_response().endpoints_health(0),
                                    envoy::api::v2::core::HealthStatus::HEALTHY,
                                    host_upstream_->localAddress())) {
  ASSERT_TRUE(hds_stream_->waitForGrpcMessage(*dispatcher_, response_));
This while pattern appears a lot; can you factor it into a waitForEndpointHealthResponse method? You can skip doing this for the more complex case of waiting for two conditions further below, but I think most of the cases are simple.
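A sketch of what the requested helper might look like, built from the loop above (the method name comes from the review; the exact signature and parameters are assumptions):

// Drains HDS responses until the first endpoint reports the expected
// health status for the test upstream's address.
void waitForEndpointHealthResponse(envoy::api::v2::core::HealthStatus expected) {
  ASSERT_TRUE(hds_stream_->waitForGrpcMessage(*dispatcher_, response_));
  while (!checkEndpointHealthResponse(response_.endpoint_health_response().endpoints_health(0),
                                      expected, host_upstream_->localAddress())) {
    ASSERT_TRUE(hds_stream_->waitForGrpcMessage(*dispatcher_, response_));
  }
}

Call sites would then collapse to a single line, e.g. waitForEndpointHealthResponse(envoy::api::v2::core::HealthStatus::HEALTHY);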
Signed-off-by: markatou <lilika@google.com>
Thanks!
Description:
Follow up on #4149.
I increased any unused timers to 100 seconds. Additionally, instead of waiting on a timer to go off, all but one of the tests now wait on cluster stats to change.
I decreased the timer wait times, reducing the test runtime from 25 seconds to 5.
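The general shape of that change is sketched below (the sleep line stands in for the old timer-based wait; the counter name is taken from the diff above):

// Before: sleep for a fixed interval and hope the cluster has caught up
// (slow, and flaky under TSAN/ASAN).
// After: block until the cluster's stats prove the health check ran.
test_server_->waitForCounterGe("cluster.anna.health_check.success", 1);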
Exceptions:
Risk Level:
Low
Testing:
I ran the test locally, and it passed 3000/3000 times.
This work is for #1310.