
ServiceEntry EDS fails to update when updating a deployment #39505

@nmnellis

Bug Description

I have a ServiceEntry that references a service in the local cluster, as well as a WorkloadEntry that references an endpoint in another cluster.

# ServiceEntry using the proxy DNS endpoint
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  creationTimestamp: "2022-06-16T17:24:06Z"
  generation: 1
  name: vd-frontend-web-ui-team-solo-io-mesh-web-team
  namespace: web-ui
  resourceVersion: "8843"
  uid: 4dadd02d-271e-4f61-9fed-f0afbfd3f80f
spec:
  addresses:
  - 241.215.41.254
  exportTo:
  - istio-gateways
  - web-ui
  hosts:
  - frontend.web-ui-team.solo-io.mesh
  location: MESH_INTERNAL
  ports:
  - name: grpc-80
    number: 80
    protocol: GRPC
    targetPort: 8080
  resolution: DNS
  workloadSelector:
    labels:
      app: frontend
---
# WorkloadEntry pointing to the remote instance
apiVersion: networking.istio.io/v1beta1
kind: WorkloadEntry
metadata:
  creationTimestamp: "2022-06-16T17:24:06Z"
  generation: 1
  name: vd-frontend-web-ui-team-solo-io-5915d18f8e034a159824655779d1f69
  namespace: web-ui
  resourceVersion: "8842"
  uid: 93bbc2f5-84b8-40fc-b989-a5116e7530f3
spec:
  address: a7038f5d58ee4414ba64937b906e9af3-2035711217.us-east-2.elb.amazonaws.com
  labels:
    app: frontend
  locality: us-east-2
  ports:
    grpc-80: 15443
---
# Service definition for the pod in the local cluster
$ kubectl get svc -n web-ui frontend -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2022-06-16T17:19:40Z"
  labels:
    app: frontend
  name: frontend
  namespace: web-ui
  resourceVersion: "7354"
  uid: 7f4b269c-4384-4a2d-a6ad-eac02c78e548
spec:
  clusterIP: 10.100.112.57
  clusterIPs:
  - 10.100.112.57
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: frontend
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
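
For reference, the endpoint set that istiod aggregates for this host can be inspected on the gateway proxy; the command below assumes the gateway pod name from the capture further down:

istioctl pc endpoint istio-ingressgateway-5554bbb688-5jgxx.istio-gateways --cluster "outbound|80||frontend.web-ui-team.solo-io.mesh"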

The frontend pod running in the web-ui namespace is then patched with the following command. The patch simply causes the pod to redeploy and become unhealthy, triggering a failover:

kubectl --context $CLUSTER1 -n web-ui patch deploy frontend --patch '{"spec":{"template":{"spec":{"containers":[{"name":"server","command":["sleep","20h"],"readinessProbe":null,"livenessProbe":null}]}}}}'
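
After the patch, the replacement pod comes up with a new IP address, which can be confirmed with a plain kubectl query (label taken from the Service selector above):

kubectl --context $CLUSTER1 -n web-ui get pods -l app=frontend -o wide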

However, EDS does not get updated, so the endpoints for the ServiceEntry retain the old pod IP address. This causes Envoy to report connection failures, as seen in the /clusters admin output below.

outbound|80||frontend.web-ui-team.solo-io.mesh::observability_name::outbound|80||frontend.web-ui-team.solo-io.mesh
outbound|80||frontend.web-ui-team.solo-io.mesh::outlier::success_rate_average::-1
outbound|80||frontend.web-ui-team.solo-io.mesh::outlier::success_rate_ejection_threshold::-1
outbound|80||frontend.web-ui-team.solo-io.mesh::outlier::local_origin_success_rate_average::-1
outbound|80||frontend.web-ui-team.solo-io.mesh::outlier::local_origin_success_rate_ejection_threshold::-1
outbound|80||frontend.web-ui-team.solo-io.mesh::default_priority::max_connections::4294967295
outbound|80||frontend.web-ui-team.solo-io.mesh::default_priority::max_pending_requests::4294967295
outbound|80||frontend.web-ui-team.solo-io.mesh::default_priority::max_requests::4294967295
outbound|80||frontend.web-ui-team.solo-io.mesh::default_priority::max_retries::4294967295
outbound|80||frontend.web-ui-team.solo-io.mesh::high_priority::max_connections::1024
outbound|80||frontend.web-ui-team.solo-io.mesh::high_priority::max_pending_requests::1024
outbound|80||frontend.web-ui-team.solo-io.mesh::high_priority::max_requests::1024
outbound|80||frontend.web-ui-team.solo-io.mesh::high_priority::max_retries::3
outbound|80||frontend.web-ui-team.solo-io.mesh::added_via_api::true
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::cx_active::0
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::cx_connect_fail::10
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::cx_total::10
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::rq_active::0
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::rq_error::10
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::rq_success::0
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::rq_timeout::0
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::rq_total::0
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::hostname::192.168.185.62
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::health_flags::/failed_outlier_check
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::weight::1
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::region::us-west-2
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::zone::us-west-2b
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::sub_zone::
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::canary::false
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::priority::0
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::success_rate::-1.0
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::local_origin_success_rate::-1.0

However, if you delete the pod instead of patching it, istiod triggers the correct EDS update.
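
For comparison, the delete that does trigger the update is something like the following (assuming the same context and pod label as above):

kubectl --context $CLUSTER1 -n web-ui delete pod -l app=frontend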

Version

▶ istioctl version
client version: 1.13.4
control plane version: 1.13.4
data plane version: 1.13.4 (11 proxies)

Additional Information

I am attaching a bug report captured while the EDS problem was occurring.
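
An archive like this can be generated with the standard collection command (assumed invocation; the attached file matches its default bug-report.tar.gz output name):

istioctl bug-report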

istioctl pc cluster istio-ingressgateway-5554bbb688-5jgxx.istio-gateways --fqdn "frontend.web-ui-team.solo-io.mesh" -o yaml > /tmp/clusters.yaml

- circuitBreakers:
    thresholds:
    - maxConnections: 4294967295
      maxPendingRequests: 4294967295
      maxRequests: 4294967295
      maxRetries: 4294967295
      trackRemaining: true
  commonLbConfig:
    healthyPanicThreshold: {}
    localityWeightedLbConfig: {}
  connectTimeout: 10s
  dnsLookupFamily: V4_ONLY
  dnsRefreshRate: 5s
  filters:
  - name: istio.metadata_exchange
    typedConfig:
      '@type': type.googleapis.com/envoy.tcp.metadataexchange.config.MetadataExchange
      protocol: istio-peer-exchange
  loadAssignment:
    clusterName: outbound|80||frontend.web-ui-team.solo-io.mesh
    endpoints:
    - lbEndpoints:
      - endpoint:
          address:
            socketAddress:
              address: a7038f5d58ee4414ba64937b906e9af3-2035711217.us-east-2.elb.amazonaws.com
              portValue: 15443
        loadBalancingWeight: 1
        metadata:
          filterMetadata:
            istio:
              workload: vd-frontend-web-ui-team-solo-io-5915d18f8e034a159824655779d1f69;web-ui;;;cluster1
      loadBalancingWeight: 1
      locality:
        region: us-east-2
      priority: 1
    - lbEndpoints:
      - endpoint:
          address:
            socketAddress:
              address: 192.168.121.138
              portValue: 8080
        loadBalancingWeight: 1
        metadata:
          filterMetadata:
            envoy.transport_socket_match:
              tlsMode: istio
            istio:
              workload: frontend;web-ui;frontend;latest;cluster1
      loadBalancingWeight: 1
      locality:
        region: us-west-2
        zone: us-west-2d
  metadata:
    filterMetadata:
      istio:
        config: /apis/networking.istio.io/v1alpha3/namespaces/web-ui/destination-rule/frontend-web-ui-team-solo-io-me-b10cdaefe3b65699ea8617c65bb172c
        default_original_port: 80
        services:
        - host: frontend.web-ui-team.solo-io.mesh
          name: frontend.web-ui-team.solo-io.mesh
          namespace: web-ui
  name: outbound|80||frontend.web-ui-team.solo-io.mesh
  outlierDetection:
    baseEjectionTime: 15s
    consecutive5xx: 2
    enforcingConsecutive5xx: 100
    enforcingSuccessRate: 0
    interval: 5s
    maxEjectionPercent: 100
  respectDnsTtl: true
  transportSocket:
    name: envoy.transport_sockets.tls
    typedConfig:
      '@type': type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
      commonTlsContext:
        alpnProtocols:
        - istio-peer-exchange
        - istio
        - h2
        combinedValidationContext:
          defaultValidationContext:
            matchSubjectAltNames:
            - exact: spiffe://cluster1.solo.io/ns/web-ui/sa/frontend
            - exact: spiffe://cluster2.solo.io/ns/web-ui/sa/frontend
          validationContextSdsSecretConfig:
            name: ROOTCA
            sdsConfig:
              apiConfigSource:
                apiType: GRPC
                grpcServices:
                - envoyGrpc:
                    clusterName: sds-grpc
                setNodeOnFirstMessageOnly: true
                transportApiVersion: V3
              initialFetchTimeout: 0s
              resourceApiVersion: V3
        tlsCertificateSdsSecretConfigs:
        - name: default
          sdsConfig:
            apiConfigSource:
              apiType: GRPC
              grpcServices:
              - envoyGrpc:
                  clusterName: sds-grpc
              setNodeOnFirstMessageOnly: true
              transportApiVersion: V3
            initialFetchTimeout: 0s
            resourceApiVersion: V3
      sni: outbound_.80_._.frontend.web-ui-team.solo-io.mesh
  type: STRICT_DNS
  typedExtensionProtocolOptions:
    envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
      '@type': type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
      explicitHttpConfig:
        http2ProtocolOptions:
          maxConcurrentStreams: 1073741824


bug-report.tar.gz
