Conversation

snowp
Contributor

@snowp snowp commented Jan 19, 2019

Fixes two bugs related to degraded health checks:

  1. When an active health check succeeded but marked the host as
    degraded, we always set the change_state to ChangePending, so the
    callback was never invoked when a host went from healthy ->
    degraded.
  2. A typo prevented degraded hosts from being updated on worker
    threads.

Signed-off-by: Snow Pettersen <snowp@squareup.com>

Risk Level: Medium
Testing: Updated unit tests
Docs Changes: n/a
Release Notes: n/a
#5063
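
To make the first bug concrete, here is a minimal, self-contained sketch of the transition logic. It is a simplified model under stated assumptions, not Envoy's implementation: `Host`, `handleSuccessBuggy`, `handleSuccessFixed`, and `callbackWouldFire` are hypothetical names, and only the three transition values named in the description (unchanged, changed, change pending) are modeled.

```cpp
// Hypothetical, simplified model of the health transition; not Envoy's real types.
enum class HealthTransition { Unchanged, Changed, ChangePending };

struct Host {
  bool degraded = false; // stand-in for a "degraded by active health check" flag
};

// Buggy variant: a successful check that flips the degraded flag always
// reports ChangePending, so listeners are never told about the change.
HealthTransition handleSuccessBuggy(Host& host, bool degraded) {
  HealthTransition state = HealthTransition::Unchanged;
  if (degraded != host.degraded) {
    host.degraded = degraded;
    state = HealthTransition::ChangePending;
  }
  return state;
}

// Fixed variant (one possible shape of the fix): flipping the degraded flag
// is reported as Changed, so the callback can run.
HealthTransition handleSuccessFixed(Host& host, bool degraded) {
  HealthTransition state = HealthTransition::Unchanged;
  if (degraded != host.degraded) {
    host.degraded = degraded;
    state = HealthTransition::Changed;
  }
  return state;
}

// Assumption: the host-status callback is only invoked on Changed.
bool callbackWouldFire(HealthTransition state) {
  return state == HealthTransition::Changed;
}
```

In this model, a healthy host that receives a "success, but degraded" result triggers the callback only with the fixed variant; a repeated degraded result reports no change in either variant.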

@@ -656,7 +656,7 @@ void ClusterManagerImpl::postThreadLocalClusterUpdate(const Cluster& cluster, ui
 // TODO(htuch): Can we skip these copies by exporting out const shared_ptr from HostSet?
 HostVectorConstSharedPtr hosts_copy(new HostVector(host_set->hosts()));
 HostVectorConstSharedPtr healthy_hosts_copy(new HostVector(host_set->healthyHosts()));
-HostVectorConstSharedPtr degraded_hosts_copy(new HostVector(host_set->healthyHosts()));
+HostVectorConstSharedPtr degraded_hosts_copy(new HostVector(host_set->degradedHosts()));
Contributor Author

I couldn't find a good place to add a test for this - any ideas?

Member

I think you should be able to add a test in cluster_manager_impl_test that does a sequence of updates and then verifies that the thread local host sets have the hosts they should?
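
A rough sketch of what such a test could look like follows. It does not use the real `cluster_manager_impl_test` fixtures or Envoy's `HostSet`; the `HostSet` and `ThreadLocalHostSet` structs, the `snapshot` helper, and the host names are hypothetical stand-ins, with hosts modeled as plain strings. The idea is only to show a sequence of updates followed by checks that the copied degraded list tracks the source's degraded list rather than its healthy list.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins; the real Envoy types are richer than this.
using HostVector = std::vector<std::string>;
using HostVectorConstSharedPtr = std::shared_ptr<const HostVector>;

struct HostSet {
  HostVector hosts;
  HostVector healthy;
  HostVector degraded;
};

struct ThreadLocalHostSet {
  HostVectorConstSharedPtr hosts;
  HostVectorConstSharedPtr healthy_hosts;
  HostVectorConstSharedPtr degraded_hosts;
};

// Mirrors the copy step in the diff: each list is copied from its own source.
// The typo fixed by this PR corresponds to copying `healthy` into degraded_hosts.
ThreadLocalHostSet snapshot(const HostSet& source) {
  return ThreadLocalHostSet{std::make_shared<const HostVector>(source.hosts),
                            std::make_shared<const HostVector>(source.healthy),
                            std::make_shared<const HostVector>(source.degraded)};
}

// Applies a sequence of updates and checks the worker-side copy after each one.
bool snapshotsTrackDegradedHosts() {
  HostSet main_set{{"a", "b"}, {"a", "b"}, {}};
  ThreadLocalHostSet worker = snapshot(main_set);
  if (!worker.degraded_hosts->empty()) {
    return false;
  }

  // Update 1: host "b" becomes degraded.
  main_set.healthy = HostVector{"a"};
  main_set.degraded = HostVector{"b"};
  worker = snapshot(main_set);
  if (*worker.healthy_hosts != HostVector{"a"} || *worker.degraded_hosts != HostVector{"b"}) {
    return false;
  }

  // Update 2: host "b" recovers.
  main_set.healthy = HostVector{"a", "b"};
  main_set.degraded.clear();
  worker = snapshot(main_set);
  return worker.degraded_hosts->empty() && worker.healthy_hosts->size() == 2;
}
```

With the typo reintroduced in `snapshot` (copying `source.healthy` into `degraded_hosts`), the very first check in this sketch would fail, which is the kind of regression the suggested test is meant to catch.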

Signed-off-by: Snow Pettersen <snowp@squareup.com>
@mattklein123 mattklein123 self-assigned this Jan 21, 2019
Member

@mattklein123 mattklein123 left a comment

Looks good pending test.

/wait

Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Member

@mattklein123 mattklein123 left a comment

Thank you!

@mattklein123 mattklein123 merged commit c8b1398 into envoyproxy:master Jan 22, 2019
danzh2010 pushed a commit to danzh2010/envoy that referenced this pull request Jan 24, 2019
Signed-off-by: Dan Zhang <danzh@google.com>
fredlas pushed a commit to fredlas/envoy that referenced this pull request Mar 5, 2019
Signed-off-by: Fred Douglas <fredlas@google.com>