upstream: fix degraded health check and thread posting #5662

snowp · 2019-01-19T06:23:13Z

Fixes two bugs related to degraded health checks:

1) When an active health check succeeded but marked the host as degraded, we always set change_state to ChangePending, which meant the callback was never invoked when a host went from healthy to degraded.
2) Fix a typo that prevented degraded hosts from being updated on worker threads.

Signed-off-by: Snow Pettersen <snowp@squareup.com>

Risk Level: Medium
Testing: Updated unit tests
Docs Changes: n/a
Release Notes: n/a
#5063
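To make the first bug concrete, here is a minimal, self-contained C++ model of the change-state logic. It is not Envoy's code: ChangeState, Host, onSuccess, and runChecks are hypothetical names, and the model assumes (as the description implies) that the callback fires only when the result is Changed. Under that assumption, returning ChangePending unconditionally for a degraded result hides the healthy -> degraded transition.

```cpp
#include <initializer_list>

// Hypothetical, simplified model; these are not Envoy's types.
// ChangePending mirrors the state named in the PR description.
enum class ChangeState { Unchanged, Changed, ChangePending };

struct Host {
  bool degraded = false;
};

// Handles a successful active health check whose response may mark the host
// as degraded. When `buggy` is true it reproduces the described bug: a degraded
// result always yields ChangePending, even on a healthy -> degraded transition.
ChangeState onSuccess(Host& host, bool response_degraded, bool buggy) {
  if (response_degraded) {
    if (buggy) {
      host.degraded = true;
      return ChangeState::ChangePending;
    }
    if (!host.degraded) {
      host.degraded = true;
      return ChangeState::Changed;  // healthy -> degraded is a real transition.
    }
    return ChangeState::Unchanged;  // Already degraded: nothing new to report.
  }
  if (host.degraded) {
    host.degraded = false;
    return ChangeState::Changed;  // degraded -> healthy.
  }
  return ChangeState::Unchanged;
}

// Runs three successful checks (healthy, degraded, degraded) and counts callbacks,
// assuming a callback fires only for ChangeState::Changed.
int runChecks(bool buggy) {
  Host host;
  int callbacks = 0;
  for (bool response_degraded : {false, true, true}) {
    if (onSuccess(host, response_degraded, buggy) == ChangeState::Changed) {
      ++callbacks;
    }
  }
  return callbacks;
}
```

With these assumptions, runChecks(true) returns 0 (the transition is never reported), while runChecks(false) returns 1 (reported once, on the first degraded result).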


snowp · 2019-01-19T06:23:38Z

source/common/upstream/cluster_manager_impl.cc

@@ -656,7 +656,7 @@ void ClusterManagerImpl::postThreadLocalClusterUpdate(const Cluster& cluster, ui
  // TODO(htuch): Can we skip these copies by exporting out const shared_ptr from HostSet?
  HostVectorConstSharedPtr hosts_copy(new HostVector(host_set->hosts()));
  HostVectorConstSharedPtr healthy_hosts_copy(new HostVector(host_set->healthyHosts()));
-  HostVectorConstSharedPtr degraded_hosts_copy(new HostVector(host_set->healthyHosts()));
+  HostVectorConstSharedPtr degraded_hosts_copy(new HostVector(host_set->degradedHosts()));


I couldn't find a good place to add a test for this - any ideas?


mattklein123

Looks good pending test.

/wait

mattklein123 · 2019-01-21T16:53:22Z

I think you should be able to add a test in cluster_manager_impl_test that does a sequence of updates and then verifies that the thread-local host sets have the hosts they should?
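The suggested test can be sketched against a simplified model of the copy step from the diff. HostSet, ThreadLocalHostSet, snapshotForWorkers, and snapshotMatches are hypothetical stand-ins, not Envoy's types, and nothing here actually posts to worker threads. The point is only that each worker-side vector must be copied from its matching main-thread vector, so a check that applies updates and compares every vector would catch a degraded list copied from the healthy one.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins; they only model the copy-and-post step in the diff.
using HostVector = std::vector<std::string>;
using HostVectorConstSharedPtr = std::shared_ptr<const HostVector>;

struct HostSet {
  HostVector hosts;
  HostVector healthy_hosts;
  HostVector degraded_hosts;
};

// Immutable snapshot that would be handed to each worker thread.
struct ThreadLocalHostSet {
  HostVectorConstSharedPtr hosts;
  HostVectorConstSharedPtr healthy_hosts;
  HostVectorConstSharedPtr degraded_hosts;
};

// Copies each vector on the main thread. The degraded copy must come from
// degraded_hosts; copying healthy_hosts here would reproduce the typo.
ThreadLocalHostSet snapshotForWorkers(const HostSet& host_set) {
  return ThreadLocalHostSet{
      std::make_shared<HostVector>(host_set.hosts),
      std::make_shared<HostVector>(host_set.healthy_hosts),
      std::make_shared<HostVector>(host_set.degraded_hosts),
  };
}

// Verifies that the worker-side snapshot matches the main-thread host set.
bool snapshotMatches(const HostSet& host_set, const ThreadLocalHostSet& tls) {
  return *tls.hosts == host_set.hosts && *tls.healthy_hosts == host_set.healthy_hosts &&
         *tls.degraded_hosts == host_set.degraded_hosts;
}
```

For example, a host set with hosts {a, b, c}, healthy {a, b}, and degraded {c} should yield a snapshot whose degraded vector holds only c; with the original typo it would hold a and b, and snapshotMatches would return false.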


mattklein123

Thank you!


snowp commented Jan 19, 2019


spelling (a13bc07)

mattklein123 self-assigned this Jan 21, 2019

mattklein123 requested changes Jan 21, 2019


repokitteh-read-only bot added the waiting label Jan 21, 2019

add test for verifying thread local host propagation (d12d63f)

repokitteh-read-only bot removed the waiting label Jan 21, 2019

fix test, update description (8bf53cf)

mattklein123 approved these changes Jan 22, 2019


mattklein123 merged commit c8b1398 into envoyproxy:master Jan 22, 2019
