shared control plane multicluster fixes #21912
Conversation
* Rename the remote istiod service and endpoint to `istiod-remote` to avoid conflicts with the real local istiod service.
* Use the `istiod-remote.<namespace>.svc` hostname for the sidecar and ingress proxies' discoveryAddress. This address needs to match the SAN in istiod's cert. The `istiod-remote` headless service will resolve the hostname to the remote IP address.
* Add the `istiod-remote` hostname to istiod's SANs. Also use istiod's namespace to construct the legacy service names instead of hardcoding them to `istio-system`.
* Simplify the remote profile by removing redundant and unused values.
Keeping this as a draft while I verify the cleaned-up changes against the revised docs (PR pending). There is additional cleanup that I intentionally skipped in this PR to minimize the changes when they're cherry-picked into 1.5.
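For readers following along, a minimal sketch of the mechanism described in the first two bullets. This is not the chart's exact output; the namespace, port name, and the 192.0.2.10 address are illustrative assumptions.

# Selector-less headless Service plus manual Endpoints in the remote cluster:
# DNS for istiod-remote.istio-system.svc resolves directly to the endpoint IP,
# which points at the externally reachable istiod in the primary cluster.
apiVersion: v1
kind: Service
metadata:
  name: istiod-remote
  namespace: istio-system
spec:
  clusterIP: None        # headless, so the hostname resolves to the Endpoints below
  ports:
  - name: tcp-istiod
    port: 15012
---
apiVersion: v1
kind: Endpoints
metadata:
  name: istiod-remote
  namespace: istio-system
subsets:
- addresses:
  - ip: 192.0.2.10       # illustrative: reachable address of istiod in the primary cluster
  ports:
  - name: tcp-istiod
    port: 15012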
apiVersion: v1
kind: Endpoints
metadata:
  name: istio-pilot
Is the istio-pilot endpoint really needed?
It would only be needed if we need to support multicluster with the pre-istiod deployment model.
BTW - I think the istiod & istio-pilot endpoints will also be created as a result of pilot.enabled=true, so do we still need to create this again?
`istio-pilot` shouldn't be needed in 1.6+. I kept it because it would make backporting to 1.5 a bit easier. If someone was using the legacy charts and separate services in 1.5, they wouldn't be enabling pilot in the remote cluster.
Isn't enabling pilot the only way to enable istiod? That is what the remote profile has today.
istiod runs in each cluster and we need the istiod-remote service to avoid conflicting with the "real" istiod service in the cluster.
When istiod=false (separate services), pilot doesn't need to run in the remote cluster. The istio-pilot service in the remote cluster points to the pilot in the main cluster. This is how things worked in <= 1.4.
We probably don't need to support the istiod=false case for multicluster
I agree on not needing this in master. What about 1.5?
@ayj I'd recommend sorting out what this should look like for 1.5 in master. Then follow up with another PR after we release 1.5.1.
Master doesn't have the legacy helm charts. We'll need to manually cherry-pick this PR anyway.
@ayj thank you for the PR! Looks reasonable - I definitely ran into some of the challenges you fixed in this PR as well. Having it changed to istiod-remote for xDS serving reduces a lot of confusion.
…icluster-fixes
Split-horizon is working now. Requests from each cluster are balanced roughly evenly across all clusters. I've tested with HTTP echo tests so far. I still need to run more extensive traffic and load tests. Cluster topology: I'm still seeing ACK errors in the istiod and proxy logs.
It looks like this might be the cause, which is similar to #18671 and #20087.
Full logs here.
@howardjohn @rshriram @costinm, any ideas on debugging the ACK error with zero lbEndpoints? I added extra logging to see where we might be setting weights to zero (or, more likely, not setting them at all). I didn't find anything. This is easily reproducible.
Shriram ran into this as well, not sure we found the root cause?
That
The LB weights were always greater than zero immediately prior to sending. The issue didn't occur when the endpoints proto was cloned before calling stream.Send(). I instrumented pilot to run in-cluster as a unit test with the race detector on. It found a data race in the EDS path where the endpoint LB weights are updated/read. It looks like endpoints from the cached endpoint shards are reused across connections/pushes. I'm not sure how this is supposed to work for multiple networks where weights for the same service/endpoints could differ.
istio/pilot/pkg/proxy/envoy/v2/eds.go Line 876 in ee52425
Full race backtrace - https://gist.github.com/ayj/0bf0d1b8a349f90aaaeafd81b06863a5
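A minimal sketch of the mitigation implied above, with illustrative names (this is not pilot's actual push code): give each connection its own copy of the shared endpoints proto before mutating weights and sending, so concurrent pushes never write to the cached shard.

package main

import (
	"github.com/golang/protobuf/proto"
)

// pushEndpoints clones the shared/cached proto before per-connection mutation,
// so per-network LB weight adjustments never race with other pushes that read
// the same cached shard.
func pushEndpoints(shared proto.Message, mutateForNetwork func(proto.Message), send func(proto.Message) error) error {
	msg := proto.Clone(shared) // private copy; the cache stays untouched
	mutateForNetwork(msg)      // e.g. apply per-network locality LB weights
	return send(msg)
}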
Not sure how much work this involved, but you can do
@ayj LG! Can we get this merged first? I would like to try the multi-network scenario too.
@@ -109,6 +109,15 @@ spec:
{{- end }}
- --controlPlaneAuthPolicy
- NONE
- --discoveryAddress
This doesn't do anything in master; it is reading from mesh config directly. It's needed in 1.5, though.
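For context, a hedged sketch of what "reading from mesh config directly" looks like; the exact nesting varies by install method, and the hostname and port here are illustrative:

# mesh config snippet (defaultConfig is a ProxyConfig)
defaultConfig:
  discoveryAddress: istiod-remote.istio-system.svc:15012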
can you remove after this merges?
Yup - #22050
Removed the hold. The bad endpoint LB weight doesn't seem to be introduced by this change, though it does increase the likelihood with multiple networks.
Didn't know that existed - thanks.
apiVersion: v1
kind: Endpoints
metadata:
  name: istio-pilot
We probably don't need to support the istiod=false case for multicluster
@@ -256,14 +253,22 @@ data:
{{- if not (eq .Values.revision "") }}
{{- $defPilotHostname := printf "istiod-%s.%s.svc" .Values.revision .Release.Namespace }}
{{- else }}
{{- $defPilotHostname := printf "istiod.%s.svc" .Release.Namespace }}
{{- $defPilotHostname := printf "istiod.%s.svc" .Release.Namespace }}
{{- end }}
{{- $defPilotHostname := printf "istiod%s.%s.svc" .Values.revision .Release.Namespace }}
Defining variables like this doesn't actually work inside if statements. This is fixed on 1.5 but somehow broken on master; I have a fix in https://github.com/istio/istio/pull/21788/files
Yup. Looks like your PR and my change below are similar except for the use of the DNS name instead of the IP address.
interesting, thanks!
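A small sketch of the scoping issue discussed above, assuming the template engine supports `=` reassignment (Go templates 1.11+): a `:=` inside `{{ if }}` creates a new variable scoped to that block, so declare it once outside and reassign inside.

{{- $defPilotHostname := printf "istiod.%s.svc" .Release.Namespace }}
{{- if not (eq .Values.revision "") }}
{{- $defPilotHostname = printf "istiod-%s.%s.svc" .Values.revision .Release.Namespace }}
{{- end }}
{{/* $defPilotHostname is visible here with the intended value */}}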
operator/data/profiles/default.yaml
Outdated
@@ -504,6 +504,9 @@ spec:
- port: 15011
  targetPort: 15011
  name: tcp-pilot-grpc-tls
- port: 15012
  targetPort: 15012
  name: http-istiod
should this be `grpc-istiod`? Or `tcp`?
I assume tcp; it is used everywhere for port 15012.
Changed to tcp-istiod
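So, per this thread, the resulting port entry should look roughly like (indentation as in the surrounding chart):

- port: 15012
  targetPort: 15012
  name: tcp-istiod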
Any idea why this is closed, @howardjohn @ayj?
@ayj I pulled this PR locally and tested with master. I'm getting this error during install. This relates to a point I tried to make earlier: we are creating the istio-pilot service as part of enabling istiod in the remote cluster, which will indirectly create the istio-pilot endpoint via the selector, so when we specifically ask to create the istio-pilot endpoint it will fail.
…icluster-fixes
Everything lgtm except the minor error comment
@@ -109,6 +109,15 @@ spec:
{{- end }}
- --controlPlaneAuthPolicy
- NONE
- --discoveryAddress
can you remove after this merges?
// 3. The pod is not present when this is called
// due to eventual consistency issues. However, we have a lot of information about the pod from the proxy
// metadata already. Because of this, we can still get most of the information we need.
// If we cannot accurately construct ServiceInstances from just the metadata, this will return an error and we can
// attempt to read the real pod.
out, err = c.getProxyServiceInstancesFromMetadata(proxy)
if err != nil {
	log.Warnf("getProxyServiceInstancesFromMetadata for %v failed: %v", proxy.ID, err)
	err = fmt.Errorf("getProxyServiceInstancesFromMetadata for %v (ip=%v) failed: %v", proxy.ID, proxyIP, err)
nit: I think this should be a warning. It's not an internal error; it's just sometimes not possible from the metadata provided by the client.
Actually, I didn't notice that you aren't logging this. We should not return an error here, I think. Was there a reason you think we should?
I went back and forth on changing this. I was seeing a lot of these messages in the logs. It didn't seem like an error/warning when using multiple control planes. The check on line 541 will fail because Cluster A's registry won't have any pods for Cluster B. And getProxyServiceInstancesFromMetadata will fail because the remote proxy's cluster_id doesn't match the cluster's ID. It should still be printed but as a debug statement by the aggregate.
Some other alternatives:
- modify getProxyServiceInstancesFromMetadata to not return an error when cluster IDs don't match.
- modify the aggregate controller to only call GetProxyServiceInstances() on the controller whose cluster_id matches the target proxy.
I am fine with making changes so this logs nothing in the multicluster case - that makes complete sense. But if we return an error here, won't the aggregate controller bubble that error up and ultimately we will return an error? So it will always be an error for the MC case? I am surprised this works at all; am I missing some step?
AFAICT the aggregate controller will log the error but not bubble it up if another controller returned one or more instances (see
istio/pilot/pkg/serviceregistry/aggregate/controller.go
Lines 266 to 270 in f8464b5
if len(out) > 0 {
	if errs != nil {
		log.Warnf("GetProxyWorkloadLabels() found match but encountered an error: %v", errs)
	}
	return out, nil
How about the following? cc @nmittler
diff --git a/pilot/pkg/serviceregistry/aggregate/controller.go b/pilot/pkg/serviceregistry/aggregate/controller.go
index 1741a762e..6964364a4 100644
--- a/pilot/pkg/serviceregistry/aggregate/controller.go
+++ b/pilot/pkg/serviceregistry/aggregate/controller.go
@@ -228,6 +228,9 @@ func (c *Controller) GetProxyServiceInstances(node *model.Proxy) ([]*model.Servi
// It doesn't make sense for a single proxy to be found in more than one registry.
// TODO: if otherwise, warning or else what to do about it.
for _, r := range c.GetRegistries() {
+ if r.Cluster() != "" && node.Metadata != nil && node.Metadata.ClusterID != "" && node.Metadata.ClusterID != r.Cluster() {
+ continue
+ }
instances, err := r.GetProxyServiceInstances(node)
if err != nil {
errs = multierror.Append(errs, err)
Ah thanks... I looked right at that code and missed it. I think that makes sense; it probably cleans things up a bit in general.
Updated to filter in the aggregate controller. We need to special-case the default cluster, which is always named Kubernetes. I started to clean things up so the default registry uses the provided ClusterName, but decided to defer that for another PR. This is large enough as-is.
node.Metadata.ClusterID != r.Cluster() {
	log.Debugf("GetProxyServiceInstances(): not checking cluster %v: proxy %v is associated with cluster %v",
		r.Cluster(), node.ID, node.Metadata.ClusterID)
	continue
So if r.Cluster == Kubernetes and node.Cluster == MyOtherCluster, this will NOT be skipped? Isn't that what we were trying to avoid?
Yes, that is one of the cases we were trying to avoid. ClusterID is overloaded to mean both the registry type (k8s, consul) and the name (which cluster). I didn't want to fix that issue holistically in this PR (see #22093). On further thought, though, we could filter 100% of the cases if we had some special handling above that is less intrusive (see skipSearchingRegistryForProxy above).
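A hedged sketch of the kind of helper referred to here (names and exact conditions are approximate, not necessarily the PR's final code): skip a registry only when both cluster IDs are known, the registry is not the default Kubernetes registry, and the IDs differ.

// skipSearchingRegistryForProxy reports whether a registry can be skipped when
// looking up a proxy's service instances. The default cluster ID ("Kubernetes")
// doubles as the registry type, so it is treated as a wildcard here.
func skipSearchingRegistryForProxy(proxyClusterID, registryClusterID, defaultClusterID string) bool {
	if proxyClusterID == "" || registryClusterID == "" {
		return false // not enough information to filter safely
	}
	if registryClusterID == defaultClusterID {
		return false // default registry may serve proxies regardless of cluster ID
	}
	return proxyClusterID != registryClusterID
}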
instances, err := r.GetProxyServiceInstances(node)
if err != nil {
	errs = multierror.Append(errs, err)
} else if len(instances) > 0 {
	out = append(out, instances...)
	out = instances
Why this change? This means we can only get endpoints from one registry? Which will break all the other registries like ServiceEntry?
See the for loop comment - "It doesn't make sense for a single proxy to be found in more than one registry". out only ever has instances from at most one registry. The code breaks out of the loop two lines below, on line 251.
I don't think that comment is accurate? I am surprised all tests pass, I think you should be able to have ServiceEntry + Kube.
I've removed the change from this PR. We can address it in a follow-up.
@ayj any plan to backport this to 1.5?
Yes. A PR is in progress. I'm testing the upgrade/downgrade cases. 1.4<=>1.5 with istioctl, for both the istiod and non-istiod variants, is passing. I'm checking the legacy helm chart case, and then I'll send out the PR for review.