upstream: init host hc value based on hc value from other priorities #3959
In order to allow hosts to move from one priority to another, this compares a "new" host against all known hosts in order to determine whether we can avoid resetting the active health check value of the host. The idea is that if we already know that a host is healthy, we should treat it as such when it's added to a different priority.

Signed-off-by: Snow Pettersen <snowp@squareup.com>
Test failure seems to be due to a timeout in the websockets integration tests, which I doubt is related. Would it be possible to rerun the Ipv6 tests?
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Thanks for working on this.
source/common/upstream/eds.cc
Outdated
@@ -61,6 +61,15 @@ void EdsClusterImpl::onConfigUpdate(const ResourceVector& resources, const std::
    throw EnvoyException(fmt::format("Unexpected EDS cluster (expecting {}): {}", cluster_name_,
                                     cluster_load_assignment.cluster_name()));
  }

+ HostVector all_current_hosts;
Although I agree with the intention here, I'm concerned about the perf impact of this change, given that we are already having perf issues with this code which is starting to be addressed in #3941 by @rgs1.
I think we might need to holistically step back and try to figure out the right path forward for solving this issue while also keeping/improving perf. Concretely, I wonder if we should be keeping an "all hosts" unordered_map keyed by address in parallel to the per-priority vectors? This would allow us to quickly do checks for whether a host already exists in a different priority.
I will also review #3941 today and will probably have additional thoughts after I do that so will circle back then.
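The "all hosts" map suggested above can be sketched as follows. This is a hypothetical, simplified illustration (the `Host` struct and `AllHostsIndex` class are stand-ins, not Envoy's actual types): an `unordered_map` keyed by address string, kept in parallel with the per-priority vectors, turns the O(n) scan into an O(1) lookup.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Simplified stand-ins for Envoy's Host/HostSharedPtr types (hypothetical).
struct Host {
  std::string address;
  bool healthy = false;
};
using HostSharedPtr = std::shared_ptr<Host>;

// An "all hosts" map keyed by address, maintained alongside the per-priority
// host vectors, gives O(1) membership checks across priorities.
class AllHostsIndex {
public:
  void add(const HostSharedPtr& host) { all_hosts_[host->address] = host; }
  void remove(const std::string& address) { all_hosts_.erase(address); }

  // Returns the existing host for this address from any priority, or nullptr.
  HostSharedPtr find(const std::string& address) const {
    auto it = all_hosts_.find(address);
    return it == all_hosts_.end() ? nullptr : it->second;
  }

private:
  std::unordered_map<std::string, HostSharedPtr> all_hosts_;
};
```

The map must be kept in sync on every config update, but each sync pass is itself linear, so the overall cost stays O(n) rather than O(n^2).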
Yeah I thought about the map approach but I wasn't sure about the perf requirements of this code path so I wanted to float the naive solution first.
Thinking out loud a bit: if we kept such a map we could potentially simplify/speed up this code quite a bit by making addresses unique across priorities: instead of iterating over the current hosts we'd just check the map for an existing host with that address. This would make moving a host between priorities no longer require additional code and would eliminate the O(n) scan over existing addresses. A nice thing here is that we could reuse the Host
from another priority when a host moves, saving us some heap allocations.
Alternatively if we want to maintain the current behavior of allowing duplicate addresses across priorities: we could maintain two maps std::unordered_map<AddressPtr, Host>
and std::unordered_map<std::pair<AddressPtr, Integer>, Host>
where one would let us look up hosts from any priority (for the "all hosts" check) and the other would let us look up hosts within the same priority. Essentially this transforms the code in my PR to two map lookups instead of two array scans per priority.
Unclear to me whether the host uniqueness only within priority invariant is intentional or just a result of the implementation, so not sure how important it is to maintain.
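The two-map variant described above could look roughly like this. A hypothetical sketch only (names and the string-encoded per-priority key are illustrative; Envoy would more likely use a proper pair key with a custom hash, and this version lets a later `add` for the same address overwrite the any-priority entry):

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for Envoy's host types.
struct Host {
  std::string address;
  uint32_t priority = 0;
};
using HostSharedPtr = std::shared_ptr<Host>;

// One map answers "does this address exist in any priority?" and the other
// answers "does it exist in this specific priority?". The per-priority key
// is encoded as "address|priority" to avoid defining a pair hash here.
class TwoLevelHostIndex {
public:
  void add(const HostSharedPtr& host) {
    by_address_[host->address] = host;
    by_address_and_priority_[key(host->address, host->priority)] = host;
  }

  // Lookup across all priorities (the "all hosts" check).
  HostSharedPtr findAnyPriority(const std::string& address) const {
    auto it = by_address_.find(address);
    return it == by_address_.end() ? nullptr : it->second;
  }

  // Lookup restricted to a single priority.
  HostSharedPtr findInPriority(const std::string& address, uint32_t priority) const {
    auto it = by_address_and_priority_.find(key(address, priority));
    return it == by_address_and_priority_.end() ? nullptr : it->second;
  }

private:
  static std::string key(const std::string& address, uint32_t priority) {
    return address + "|" + std::to_string(priority);
  }

  std::unordered_map<std::string, HostSharedPtr> by_address_;
  std::unordered_map<std::string, HostSharedPtr> by_address_and_priority_;
};
```

This reduces the per-priority work from two array scans to two hash lookups, at the cost of keeping both maps in sync.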
Unclear to me whether the host uniqueness only within priority invariant is intentional or just a result of the implementation, so not sure how important it is to maintain.
I actually don't know if this was intentional or not. I feel like it was but I'm not sure. IMO it would be great to simplify it if we can. @mpuncel @htuch @alyssawilk @zuercher do you remember the history here? I don't.
I don't have historical context on the code, but that plan sounds reasonable. I can't think of a reason you'd want multiple endpoints with the same address in the same cluster since there is proper support for load balancing weight.
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Updated this to maintain a map of address -> host instead of constructing a list on each config update. PTAL
  return updated_hosts.count(host->address()->asString());
}), current_hosts.end());

// Do a single pass over the current_hosts and remove any that were added to the map in the
copy/pasta?
source/common/upstream/eds.h
Outdated
@@ -41,7 +41,7 @@ class EdsClusterImpl : public BaseDynamicClusterImpl,
                              LocalityWeightsMap& locality_weights_map,
                              LocalityWeightsMap& new_locality_weights_map,
                              PriorityStateManager& priority_state_manager,
-                             const HostVector& all_hosts);
+                             std::unordered_map<std::string, HostSharedPtr>& updated_hosts);
the new param name (updated_hosts) doesn't match the definition's
@htuch @mattklein123 took the liberty of doing a quick pass, since everything related to health checks/eds/host updates performance is close to my heart.
source/common/upstream/eds.cc
Outdated
bool EdsClusterImpl::updateHostsPerLocality(
    const uint32_t priority, const HostVector& new_hosts, LocalityWeightsMap& locality_weights_map,
    LocalityWeightsMap& new_locality_weights_map, PriorityStateManager& priority_state_manager,
    std::unordered_map<std::string, HostSharedPtr>& new_all_hosts) {
new_all_hosts is a bit confusing here, was this meant to be updated_hosts?
yeah names went through a few iterations, will change
source/common/upstream/eds.cc
Outdated
-  if (updateDynamicHostList(new_hosts, *current_hosts_copy, hosts_added, hosts_removed) ||
+  if (updateDynamicHostList(new_hosts, *current_hosts_copy, hosts_added, hosts_removed,
+                            new_all_hosts, all_hosts_) ||
hmm why pass all_hosts_ as a param given it's a member of BaseDynamicClusterImpl (so updateDynamicHostList() has access to it already)?
good point! i'll update
@@ -40,7 +40,8 @@ class EdsClusterImpl : public BaseDynamicClusterImpl,
   bool updateHostsPerLocality(const uint32_t priority, const HostVector& new_hosts,
                               LocalityWeightsMap& locality_weights_map,
                               LocalityWeightsMap& new_locality_weights_map,
-                              PriorityStateManager& priority_state_manager);
+                              PriorityStateManager& priority_state_manager,
+                              std::unordered_map<std::string, HostSharedPtr>& updated_hosts);
noted before, doesn't match the definition's param name.
std::unordered_set<std::string> host_addresses;

// Keep track of hosts we've seen during this pass so we can remove it from the
// list of current (i.e. existing hosts from this priority)
nit: most (all?) envoy comments are fully formed sentences (e.g.: finish them with a period, etc).
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Thanks for taking this on.
}),
current_hosts.end());

// Do a single pass over the current_hosts and remove any that were added to the map in the
I think this partial comment is spurious.
const bool dont_remove_healthy_hosts =
    health_checker_ != nullptr && !info()->drainConnectionsOnHostRemoval();
// If there are removed hosts, check to see if we should only delete if unhealthy.
Why remove this?
oops, i was supposed to remove the other comment you commented on
      i = current_hosts.erase(i);
    } else {
      i++;
    }
  }
}

// Add the remaining hosts into the updated list of all hosts
for (auto& host : final_hosts) {
  updated_hosts[host->address()->asString()] = host;
It looks to me like every place where you put a host in final_hosts, you also add it to updated_hosts. Is this loop necessary?
yeah you're right, i'll remove
for (auto& host : final_hosts) {
  updated_hosts[host->address()->asString()] = host;
}
There's a comment on line 952 (below) that I think needs to be updated. It currently says
// During the search we moved all of the hosts from hosts_ into final_hosts so just
// move them back.
@@ -1054,15 +1065,19 @@ void StrictDnsClusterImpl::ResolveTarget::startResolve() {
       parent_.info_, dns_address_, Network::Utility::getAddressWithPort(*address, port_),
       lb_endpoint_.metadata(), lb_endpoint_.load_balancing_weight().value(),
       locality_lb_endpoint_.locality(), lb_endpoint_.endpoint().health_check_config()));
+  updated_hosts[address->asString()] = new_hosts.back();
I think pre-emptively putting this host in updated_hosts will prevent the max host weight check from occurring in updateDynamicHostList. Why not just let that function do this work?
yeah i agree - i suspect this was left over from a previous iteration and is no longer needed.
Signed-off-by: Snow Pettersen <snowp@squareup.com>
This should be ready for another review
Thanks, I think this is a great improvement. Some small comments.
@@ -630,6 +632,7 @@ class StrictDnsClusterImpl : public BaseDynamicClusterImpl {
   uint32_t port_;
   Event::TimerPtr resolve_timer_;
   HostVector hosts_;
+  std::unordered_map<std::string, HostSharedPtr> all_hosts_;
Is this used?
if (!hosts_added.empty() || !current_hosts.empty()) {
  hosts_removed = std::move(current_hosts);
  current_hosts = std::move(final_hosts);
  return true;
} else {
  // During the search we moved all of the hosts from hosts_ into final_hosts so just
  // move them back.
  // During the search populated final_hosts with either from hosts_ or from all_hosts, so
This sentence is a little hard to read and I think the comment is important. Can you tighten?
source/common/upstream/eds.cc
Outdated
@@ -133,14 +138,17 @@ bool EdsClusterImpl::updateHostsPerLocality(const uint32_t priority, const HostV
   // out of the locality scheduler, we discover their new weights. We don't currently have a shared
   // object for locality weights that we can update here, we should add something like this to
   // improve performance and scalability of locality weight updates.
-  if (updateDynamicHostList(new_hosts, *current_hosts_copy, hosts_added, hosts_removed) ||
nit: del newline
@@ -503,6 +503,7 @@ class ClusterImplBase : public Cluster, protected Logger::Loggable<Logger::Id::u

 protected:
   PrioritySetImpl priority_set_;
+  std::unordered_map<std::string, Upstream::HostSharedPtr> all_hosts_;
Is this used in the base class? Should this be part of BaseDynamicClusterImpl
?
// Keep track of hosts we've seen during this pass so we can remove it from the
// list of current (i.e. existing hosts from this priority) hosts.
std::unordered_set<std::string> seen_hosts(current_hosts.size());
This is related to the ongoing PR that @dio is working on where we are trying to make this code easier to understand. I actually think this PR is a huge improvement, but can you potentially either add more comments or think of different names for the variables? "seen_hosts" and "current_hosts" is still pretty confusing. In this code I would opt for very long and verbose names and/or lots of comments since it's so hard to understand.
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Did another pass
hosts_added.push_back(host);

// If we are depending on a health checker, we initialize to unhealthy.
// If we are depending on a health checker, determine what to initialize the new host
why is this comment changing?
+1 seems more clear before
* @param hosts_added_to_priority will be populated with hosts added to the priority.
* @param hosts_removed_from_priority will be populated with hosts removed from the priority.
* @param updated_hosts is used to aggregate the new state of all hosts accross priority, and will
*   be be updated with the hosts that remain in this priority after the update.
nit: be be
* @param current_priority_hosts the full lists of hosts for the priority to be updated. The list
*   will be modified to contain the updated list of hosts.
* @param hosts_added_to_priority will be populated with hosts added to the priority.
* @param hosts_removed_from_priority will be populated with hosts removed from the priority.
doesn't match the param name (hosts_removed)
Thanks, I think this is a big improvement. LGTM other than some nits. @zuercher can you take another pass through?
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Looks good. One comment from me
HostVector& hosts_removed_from_current_priority,
std::unordered_map<std::string, HostSharedPtr>& updated_hosts);

std::unordered_map<std::string, Upstream::HostSharedPtr> all_hosts_;
Seems like all_hosts_ is only used by updateDynamicHostList. Perhaps it should be private with an explicit method that takes updated_hosts invoked after all the priorities have been updated? My reasoning is that it makes management of all_hosts_ explicitly the responsibility of BaseDynamicClusterImpl.
yeah i like that idea
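The encapsulation suggested above could be sketched like this. Purely illustrative (the class and method names here are hypothetical, not Envoy's actual API): the map stays private, and callers hand over the accumulated state in one explicit step after all priorities are processed.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for Envoy's host types.
struct Host {
  std::string address;
};
using HostSharedPtr = std::shared_ptr<Host>;
using HostMap = std::unordered_map<std::string, HostSharedPtr>;

// Keeping the index private makes its lifecycle the base class's
// responsibility: per-priority update code accumulates `updated_hosts`
// locally, then commits the whole map at once.
class BaseDynamicClusterSketch {
public:
  // Called once after all priorities have been updated.
  void updateHostMap(HostMap&& updated_hosts) {
    all_hosts_ = std::move(updated_hosts);
  }

  bool hasHost(const std::string& address) const {
    return all_hosts_.count(address) > 0;
  }

private:
  HostMap all_hosts_; // Only mutated via updateHostMap().
};
```

This also avoids partially-updated state being visible between priority updates, since the swap is a single move assignment.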
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Nice!
Thanks!
* origin/master:
  fix flaky RBAC integration test. (envoyproxy#4147)
  header_map: copy constructor for HeaderMapImpl. (envoyproxy#4129)
  test: moving websocket tests to using HTTP codec. (envoyproxy#4143)
  upstream: init host hc value based on hc value from other priorities (envoyproxy#3959)

Signed-off-by: Snow Pettersen <snowp@squareup.com>
Description:
In order to allow hosts to move from one priority to another, this
compares a "new" host against all known hosts in order to determine
whether we've already health checked this host successfully.
The idea is that if we already know that a host is healthy we should
treat it as such when it's added to a different priority. Without this, we'll
fail requests coming in between the EDS update and the moved hosts
being successfully health checked.
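The core idea in the description can be sketched in a few lines. This is a hedged, self-contained illustration with hypothetical names (`makeHostForPriority` and this `Host` struct are not Envoy's actual API): when a host appears in a new priority, look it up by address across all known hosts, and carry over its health state instead of resetting it to unhealthy.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical simplified host type for illustration.
struct Host {
  std::string address;
  bool healthy = false;
};
using HostSharedPtr = std::shared_ptr<Host>;
using HostMap = std::unordered_map<std::string, HostSharedPtr>;

// Create a host for a (possibly new) priority. If the address is already
// known from any priority, inherit its current health check state; otherwise
// start it as unhealthy until the active health checker passes it.
HostSharedPtr makeHostForPriority(const std::string& address, const HostMap& all_hosts) {
  auto host = std::make_shared<Host>();
  host->address = address;
  auto it = all_hosts.find(address);
  host->healthy = (it != all_hosts.end()) ? it->second->healthy : false;
  return host;
}
```

With this, a host moving between priorities in an EDS update keeps serving traffic immediately rather than failing requests until the next successful health check.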
Signed-off-by: Snow Pettersen snowp@squareup.com
Risk Level: Low, only affects the initial health check value when hosts are added to a priority
Testing: Unit test
Docs Changes: N/A
Release Notes: N/A
Fixes #3930