
Conversation

@ayj ayj commented Mar 6, 2020

  • Rename the remote istiod service and endpoint to istiod-remote to
    avoid conflicts with the real local istiod service.

  • Use the istiod-remote.<namespace>.svc hostname for the sidecar and
    ingress proxies' discoveryAddress. This address needs to match the
    SAN in istiod's cert. The istiod-remote headless service will
    resolve the hostname to the remote IP address.

  • Add the istiod-remote hostname to istiod's SANs. Also use istiod's
    namespace to construct the legacy service names instead of
    hardcoding them to istio-system.

  • Simplify the remote profile by removing redundant and unused values.

Usage:

  • master
kubectl config use-context <master>
kubectl create namespace istio-system
kubectl create secret generic cacerts -n istio-system \
    --from-file=samples/certs/ca-cert.pem \
    --from-file=samples/certs/ca-key.pem \
    --from-file=samples/certs/root-cert.pem \
    --from-file=samples/certs/cert-chain.pem

# Option (1) - expose istiod through the istio-ingressgateway
istioctl manifest apply --set values.global.meshExpansion.enabled=true
# wait for ingress external IP to be ready
ISTIOD_REMOTE=$(kubectl -n istio-system get svc istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# Option (2) - create a dedicated control plane gateway 
istioctl manifest apply 
istioctl manifest apply -f <dedicated-control-plane-gw>
ISTIOD_REMOTE=<dedicated gateway external IP>

# Option (3) - expose istiod through an internal load balancer
istioctl manifest apply -f <profile-with-istiod-ilb>
ISTIOD_REMOTE=$(kubectl -n istio-system get svc istiod -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

  • remote
kubectl config use-context <remote>
kubectl create namespace istio-system
kubectl create secret generic cacerts -n istio-system \
    --from-file=samples/certs/ca-cert.pem \
    --from-file=samples/certs/ca-key.pem \
    --from-file=samples/certs/root-cert.pem \
    --from-file=samples/certs/cert-chain.pem
istioctl manifest apply --set values.global.remotePilotAddress=${ISTIOD_REMOTE}
istioctl x create-remote-secret | kubectl --context=<master> apply -f -

@istio-testing istio-testing added the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Mar 6, 2020
@googlebot googlebot added the cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. label Mar 6, 2020
@istio-testing istio-testing added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 6, 2020

ayj commented Mar 6, 2020

Keeping this as a draft while I verify the cleaned up changes against the revised docs (PR pending).

There is additional cleanup that I intentionally skipped in this PR to minimize the changes when they're cherry-picked into 1.5.

apiVersion: v1
kind: Endpoints
metadata:
name: istio-pilot
Member

Is the istio-pilot endpoint really needed?

Contributor Author

It would only be needed if we need to support multicluster with the pre-istiod deployment model.

Member

BTW - I think the istiod & istio-pilot endpoints will also be created as a result of pilot.enabled=true, so do we still need to create this again?

Contributor Author

istio-pilot shouldn't be needed in 1.6+. I kept it because it would make backporting to 1.5 a bit easier. If someone was using the legacy charts and separate services in 1.5 they wouldn't be enabling pilot in the remote cluster.

Member

isn't enabling pilot the only way to enable istiod? That is what the remote profile has today.

Contributor Author

istiod runs in each cluster, and we need the istiod-remote service to avoid conflicting with the "real" istiod service in the cluster.

When istiod=false (separate services), pilot doesn't need to run in the remote cluster. The istio-pilot service in the remote cluster points to the pilot in the main cluster. This is how things worked in <= 1.4.

Member

We probably don't need to support the istiod=false case for multicluster

Contributor Author

I agree we don't need this in master. What about 1.5?

Member

@ayj I'd recommend sorting out what this should look like for 1.5 in master. Then follow up with another PR after we release 1.5.1.

Contributor Author

Master doesn't have the legacy helm charts. We'll need to manually cherry-pick this PR anyway.


linsun commented Mar 6, 2020

@ayj thank you for the PR! Looks reasonable - I definitely ran into some of the challenges you fixed in this PR as well. Having it changed to istiod-remote for xDS serving reduces a lot of confusion.

@linsun linsun added this to the 1.5 milestone Mar 6, 2020
@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Mar 7, 2020
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Mar 7, 2020

ayj commented Mar 7, 2020

Split-horizon is working now. Requests from each cluster are balanced roughly evenly across all clusters. I've tested with HTTP echo tests so far. I still need to run more extensive traffic and load tests.

Cluster topology:
Master (network0)
Remote0, 2 (network1)
Remote1 (network2)

I'm still seeing ACK errors in the istiod and proxy logs.

2020-03-07T19:00:17.066808Z warn ads ADS:EDS: ACK ERROR 10.44.3.16:46908 router~10.44.3.16~istio-ingressgateway-5c4f66856f-fx46n.istio-system~istio-system.svc.cluster.local-6 Internal:Proto constraint validation failed (ClusterLoadAssignmentValidationError.Endpoints[i]: ["embedded message failed validation"] | caused by LocalityLbEndpointsValidationError.LbEndpoints[i]: ["embedded message failed validation"] | caused by LbEndpointValidationError.LoadBalancingWeight: ["value must be greater than or equal to " '\x01']): cluster_name: "outbound|15012||istiod.istio-system.svc.cluster.local"

It looks like this might be the cause which is similar to #18671 and #20087.

load_balancing_weight { } 1: 2 } load_balancing_weight { value: 4 }

Full logs here


ayj commented Mar 7, 2020

@howardjohn @rshriram @costinm, any ideas on debugging the ACK error with zero-weight LbEndpoints? I added extra logging to see where we might be setting weights to zero (or, more likely, not setting them at all). I didn't find anything. This is easily reproducible.


howardjohn commented Mar 7, 2020 via email

@kyessenov
Contributor

That 1:2 looks like proto mis-match due to field re-numbering.


ayj commented Mar 9, 2020

The LB weights were always greater than zero immediately prior to sending. The issue didn't occur when the endpoints proto was cloned before calling stream.Send().

I instrumented pilot to run in-cluster as a unit test with the race detector on. It found a data race in the EDS path where the endpoint LB weights are updated/read. It looks like endpoints from the cached endpoint shard are reused across connections/pushes. I'm not sure how this is supposed to work for multiple networks, where weights for the same service/endpoints could differ.

lbEp.LoadBalancingWeight = &wrappers.UInt32Value{

weight += ep.LoadBalancingWeight.GetValue()

Full race backtrace - https://gist.github.com/ayj/0bf0d1b8a349f90aaaeafd81b06863a5
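
For illustration, here is a minimal standalone Go sketch (not Istio code; the names and structure are simplified assumptions) of the kind of race described above: endpoint objects cached in a shared shard are mutated in place by concurrent pushes to apply per-network LB weights. Running it with go run -race flags the conflicting read/write.

package main

import (
	"fmt"
	"sync"
)

// lbEndpoint stands in for the Envoy LbEndpoint proto; only the weight matters here.
type lbEndpoint struct {
	address string
	weight  uint32
}

// shard plays the role of the shared per-service endpoint cache, built once and
// then reused by every push.
var shard = []*lbEndpoint{
	{address: "10.0.0.1"},
	{address: "10.0.0.2"},
}

// applyWeights mutates the cached endpoints in place to set the weight for the
// pushing proxy's network and sums the result -- the problematic read/write pattern.
func applyWeights(eps []*lbEndpoint, w uint32) uint32 {
	var total uint32
	for _, ep := range eps {
		ep.weight = w      // write to a shared object
		total += ep.weight // read of a shared object
	}
	return total
}

func main() {
	var wg sync.WaitGroup
	for _, w := range []uint32{1, 4} { // two concurrent pushes for proxies on different networks
		wg.Add(1)
		go func(w uint32) {
			defer wg.Done()
			fmt.Println("total weight:", applyWeights(shard, w))
		}(w)
	}
	wg.Wait()
	// The fix sketched in this thread: clone the endpoints (or the whole
	// ClusterLoadAssignment) per push before mutating and sending them.
}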

@howardjohn
Member

> I instrumented pilot to run in-cluster as a unit test with the race detector on.

Not sure how much work this involved but you can do go build -race as well by the way I think

@linsun linsun left a comment

@ayj LG! Can we get this merged first? I would like to try the multi-network scenario too.

@@ -109,6 +109,15 @@ spec:
{{- end }}
- --controlPlaneAuthPolicy
- NONE
- --discoveryAddress
Member

This doesn't do anything in master; it reads from mesh config directly. It's needed in 1.5 though.

Member

can you remove after this merges?

Contributor Author

Yup - #22050

@ayj ayj removed the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Mar 9, 2020

ayj commented Mar 9, 2020

Removed the hold. The bad endpoint LB weight doesn't seem to be introduced by this change, though this change does increase the likelihood of hitting it with multiple networks.


ayj commented Mar 9, 2020

> Not sure how much work this involved but you can do go build -race as well by the way I think

Didn't know that existed - thanks.


@@ -256,14 +253,22 @@ data:
{{- if not (eq .Values.revision "") }}
{{- $defPilotHostname := printf "istiod-%s.%s.svc" .Values.revision .Release.Namespace }}
{{- else }}
{{- $defPilotHostname := printf "istiod.%s.svc" .Release.Namespace }}
{{- $defPilotHostname := printf "istiod.%s.svc" .Release.Namespace }}
{{- end }}
{{- $defPilotHostname := printf "istiod%s.%s.svc" .Values.revision .Release.Namespace }}
Member

Defining variables like this doesn't actually work inside if statements. This is fixed in 1.5 but somehow broken on master; I have a fix in https://github.com/istio/istio/pull/21788/files
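
For reference, a minimal sketch of the scoping behavior being described, using Go's text/template directly (Helm renders charts with the same engine); .Revision and .Namespace are illustrative stand-ins for the chart values, not the real chart code.

package main

import (
	"os"
	"text/template"
)

func main() {
	// A variable declared with ":=" inside an {{if}}/{{else}} branch is scoped to
	// that branch, so referencing it after {{end}} fails to parse.
	broken := `{{- if .Revision }}{{- $h := printf "istiod-%s.%s.svc" .Revision .Namespace }}` +
		`{{- else }}{{- $h := printf "istiod.%s.svc" .Namespace }}{{- end }}{{ $h }}`
	if _, err := template.New("broken").Parse(broken); err != nil {
		os.Stderr.WriteString("parse error: " + err.Error() + "\n")
	}

	// Working pattern: declare the variable once before the conditional, then
	// reassign it with "=" inside the branch.
	working := `{{- $h := printf "istiod.%s.svc" .Namespace }}` +
		`{{- if .Revision }}{{- $h = printf "istiod-%s.%s.svc" .Revision .Namespace }}{{- end }}{{ $h }}`
	t := template.Must(template.New("working").Parse(working))
	_ = t.Execute(os.Stdout, map[string]string{"Revision": "canary", "Namespace": "istio-system"})
	// Prints: istiod-canary.istio-system.svc
}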

Contributor Author

Yup. Looks like your PR and my change below are similar, except for the use of the DNS name instead of an IP address.

Member

interesting, thanks!

@@ -504,6 +504,9 @@ spec:
- port: 15011
targetPort: 15011
name: tcp-pilot-grpc-tls
- port: 15012
targetPort: 15012
name: http-istiod
Member

should this be grpc-istiod? Or tcp

Member

I assume tcp, it is used everywhere for port 15012

Contributor Author

Changed to tcp-istiod


linsun commented Mar 10, 2020

Any idea why this is closed, @howardjohn @ayj?


linsun commented Mar 10, 2020

@ayj I pulled this PR locally and tested with master. I'm getting this error during install. This relates to a point I tried to make earlier: we create the istio-pilot service as part of enabling istiod in the remote cluster, which indirectly creates the istio-pilot endpoint via the selector, so when we then specifically ask to create the istio-pilot endpoint it fails.

Error from server (Invalid): error when applying patch:
{"metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"v1\",\"kind\":\"Service\",\"metadata\":{\"annotations\":{},\"labels\":{\"app\":\"pilot\",\"istio\":\"pilot\",\"release\":\"istio\"},\"name\":\"istio-pilot\",\"namespace\":\"istio-system\"},\"spec\":{\"ports\":[{\"name\":\"grpc-xds\",\"port\":15010},{\"name\":\"https-xds\",\"port\":15011},{\"name\":\"https-dns\",\"port\":15012},{\"name\":\"http-legacy-discovery\",\"port\":8080},{\"name\":\"http-monitoring\",\"port\":15014},{\"name\":\"https-webhook\",\"port\":443,\"targetPort\":15017}],\"selector\":{\"istio\":\"pilot\"}}}\n"},"labels":{"app":"pilot","istio":"pilot","release":"istio"}},"spec":{"$setElementOrder/ports":[{"port":15010},{"port":15011},{"port":15012},{"port":8080},{"port":15014},{"port":443}],"clusterIP":null,"ports":[{"name":"https-dns","port":15012},{"name":"https-webhook","port":443,"targetPort":15017}],"selector":{"istio":"pilot"}}}
to:
Resource: "/v1, Resource=services", GroupVersionKind: "/v1, Kind=Service"
Name: "istio-pilot", Namespace: "istio-system"
Object: &{map["apiVersion":"v1" "kind":"Service" "metadata":map["annotations":map["kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"v1\",\"kind\":\"Service\",\"metadata\":{\"annotations\":{},\"name\":\"istio-pilot\",\"namespace\":\"istio-system\"},\"spec\":{\"clusterIP\":\"None\",\"ports\":[{\"name\":\"grpc-xds\",\"port\":15010},{\"name\":\"https-xds\",\"port\":15011},{\"name\":\"http-legacy-discovery\",\"port\":8080},{\"name\":\"tcp-istiod\",\"port\":15012},{\"name\":\"http-monitoring\",\"port\":15014}]}}\n"] "creationTimestamp":"2020-03-10T15:01:10Z" "name":"istio-pilot" "namespace":"istio-system" "resourceVersion":"48395868" "selfLink":"/api/v1/namespaces/istio-system/services/istio-pilot" "uid":"d8233156-0939-41d2-96b7-1130bcb3a216"] "spec":map["clusterIP":"None" "ports":[map["name":"grpc-xds" "port":'\u3aa2' "protocol":"TCP" "targetPort":'\u3aa2'] map["name":"https-xds" "port":'\u3aa3' "protocol":"TCP" "targetPort":'\u3aa3'] map["name":"http-legacy-discovery" "port":'\u1f90' "protocol":"TCP" "targetPort":'\u1f90'] map["name":"tcp-istiod" "port":'\u3aa4' "protocol":"TCP" "targetPort":'\u3aa4'] map["name":"http-monitoring" "port":'\u3aa6' "protocol":"TCP" "targetPort":'\u3aa6']] "sessionAffinity":"None" "type":"ClusterIP"] "status":map["loadBalancer":map[]]]}
for: "test-remote.yaml": Service "istio-pilot" is invalid: spec.clusterIP: Invalid value: "": field is immutable
Error from server: error when creating "test-remote.yaml": context deadline exceeded

@linsun linsun reopened this Mar 10, 2020
@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Mar 10, 2020
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Mar 10, 2020
@howardjohn howardjohn left a comment

Everything lgtm except the minor error comment


// 3. The pod is not present when this is called
// due to eventual consistency issues. However, we have a lot of information about the pod from the proxy
// metadata already. Because of this, we can still get most of the information we need.
// If we cannot accurately construct ServiceInstances from just the metadata, this will return an error and we can
// attempt to read the real pod.
out, err = c.getProxyServiceInstancesFromMetadata(proxy)
if err != nil {
log.Warnf("getProxyServiceInstancesFromMetadata for %v failed: %v", proxy.ID, err)
err = fmt.Errorf("getProxyServiceInstancesFromMetadata for %v (ip=%v) failed: %v", proxy.ID, proxyIP, err)
Member

nit: I think this should be a warning. It's not an internal error, it's just sometimes not possible from the metadata provided by the client.

Member

Actually I didn't notice you aren't logging this. We should not return an error here, I think. Was there a reason you think we should?

Contributor Author

I went back and forth on changing this. I was seeing a lot of these messages in the logs. It didn't seem like an error/warning when using multiple control planes. The check on line 541 will fail because Cluster A's registry won't have any pods for Cluster B. And getProxyServiceInstancesFromMetadata will fail because the remote proxy's cluster_id doesn't match the cluster's ID. It should still be printed, but as a debug statement by the aggregate controller.

Some other alternatives:

  • modify getProxyServiceInstancesFromMetadata to not return an error when the cluster IDs don't match.
  • modify the aggregate controller to only call GetProxyServiceInstances() on the controller whose cluster_id matches the target proxy's.

Member

I am fine with making changes so this logs nothing in the multicluster case - that makes complete sense. But if we return an error here, won't the aggregate controller bubble that error up, and ultimately we will return an error? So it will always be an error for the MC case? I am surprised this works at all, am I missing some step?

Contributor Author

AFAICT the aggregate controller will log the error but not bubble it up if another controller returned one or more instances (see

if len(out) > 0 {
	if errs != nil {
		log.Warnf("GetProxyWorkloadLabels() found match but encountered an error: %v", errs)
	}
	return out, nil
). However, if there are no instances the error eventually bubbles up to initConnection, which fails the connection, and that is not something we want.

How about the following? cc @nmittler

diff --git a/pilot/pkg/serviceregistry/aggregate/controller.go b/pilot/pkg/serviceregistry/aggregate/controller.go
index 1741a762e..6964364a4 100644
--- a/pilot/pkg/serviceregistry/aggregate/controller.go
+++ b/pilot/pkg/serviceregistry/aggregate/controller.go
@@ -228,6 +228,9 @@ func (c *Controller) GetProxyServiceInstances(node *model.Proxy) ([]*model.Servi
        // It doesn't make sense for a single proxy to be found in more than one registry.
        // TODO: if otherwise, warning or else what to do about it.
        for _, r := range c.GetRegistries() {
+               if r.Cluster() != "" && node.Metadata != nil && node.Metadata.ClusterID != "" && node.Metadata.ClusterID != r.Cluster() {
+                       continue
+               }
                instances, err := r.GetProxyServiceInstances(node)
                if err != nil {
                        errs = multierror.Append(errs, err)

Member

Ah thanks... I looked right at that code and missed it. I think that makes sense, and probably cleans things up a bit in general.

Contributor Author

Updated to filter in the aggregate controller. We need to special-case the default cluster, which is always named Kubernetes. I started to clean things up so the default registry uses the provided ClusterName, but decided to defer that to another PR. This is large enough as-is.

node.Metadata.ClusterID != r.Cluster() {
log.Debugf("GetProxyServiceInstances(): not checking cluster %v: proxy %v is associated with cluster %v",
r.Cluster(), node.ID, node.Metadata.ClusterID)
continue
Member

So if r.Cluster == Kubernetes and node.Cluster == MyOtherCluster, this will NOT be skipped? Isn't that what we were trying to avoid?

Contributor Author

Yes, that is one of the cases we were trying to avoid. ClusterID is overloaded to mean both the registry type (k8s, consul) and the name (which cluster). I didn't want to fix that issue holistically in this PR (see #22093). On further thought, though, we could filter 100% of the cases if we had some special handling above that is less intrusive (see skipSearchingRegistryForProxy above).

instances, err := r.GetProxyServiceInstances(node)
if err != nil {
errs = multierror.Append(errs, err)
} else if len(instances) > 0 {
out = append(out, instances...)
out = instances
Member

Why this change? This means we can only get endpoints from one registry? Which will break all the other registries like ServiceEntry?

Contributor Author

See the for loop comment - "It doesn't make sense for a single proxy to be found in more than one registry". out only ever has instances from at most one registry. The code breaks out of the loop two lines below, on line 251.

Member

I don't think that comment is accurate? I am surprised all tests pass, I think you should be able to have ServiceEntry + Kube.

@rshriram ?

Contributor Author

I've removed the change from this PR. We can address it in a follow-up.

@istio-testing istio-testing merged commit 9d5e046 into istio:master Mar 11, 2020
@ayj ayj deleted the ws2-multicluster-fixes branch March 11, 2020 23:00
@ayj ayj restored the ws2-multicluster-fixes branch March 12, 2020 05:17

linsun commented Mar 13, 2020

@ayj any plan to backport this to 1.5?

@ayj ayj deleted the ws2-multicluster-fixes branch March 13, 2020 18:49

ayj commented Mar 13, 2020

Yes. The PR is in progress. I'm testing the upgrade/downgrade cases. 1.4<=>1.5 with istioctl, for both the istiod and non-istiod variants, is passing. Checking the legacy Helm chart case, and then I'll send out the PR for review.

@ayj ayj restored the ws2-multicluster-fixes branch March 13, 2020 21:46