Bug Description
I have a ServiceEntry that references a service in the local cluster, as well as a WorkloadEntry that references an endpoint in another cluster.
# Service entry using proxy dns endpoint
- apiVersion: networking.istio.io/v1beta1
  kind: ServiceEntry
  metadata:
    creationTimestamp: "2022-06-16T17:24:06Z"
    generation: 1
    name: vd-frontend-web-ui-team-solo-io-mesh-web-team
    namespace: web-ui
    resourceVersion: "8843"
    uid: 4dadd02d-271e-4f61-9fed-f0afbfd3f80f
  spec:
    addresses:
    - 241.215.41.254
    exportTo:
    - istio-gateways
    - web-ui
    hosts:
    - frontend.web-ui-team.solo-io.mesh
    location: MESH_INTERNAL
    ports:
    - name: grpc-80
      number: 80
      protocol: GRPC
      targetPort: 8080
    resolution: DNS
    workloadSelector:
      labels:
        app: frontend
---
# workload entry to remote instance
apiVersion: v1
items:
- apiVersion: networking.istio.io/v1beta1
  kind: WorkloadEntry
  metadata:
    creationTimestamp: "2022-06-16T17:24:06Z"
    generation: 1
    name: vd-frontend-web-ui-team-solo-io-5915d18f8e034a159824655779d1f69
    namespace: web-ui
    resourceVersion: "8842"
    uid: 93bbc2f5-84b8-40fc-b989-a5116e7530f3
  spec:
    address: a7038f5d58ee4414ba64937b906e9af3-2035711217.us-east-2.elb.amazonaws.com
    labels:
      app: frontend
    locality: us-east-2
    ports:
      grpc-80: 15443
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
---
# service definition for the pod in the local cluster
+ kubectl get svc -n web-ui frontend -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2022-06-16T17:19:40Z"
  labels:
    app: frontend
  name: frontend
  namespace: web-ui
  resourceVersion: "7354"
  uid: 7f4b269c-4384-4a2d-a6ad-eac02c78e548
spec:
  clusterIP: 10.100.112.57
  clusterIPs:
  - 10.100.112.57
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: frontend
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
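As a rough illustration of why this ServiceEntry picks up both a local and a remote endpoint: a `workloadSelector` behaves, as far as I understand it, like a label-subset match against pods and WorkloadEntries in the same namespace. The sketch below only models that matching rule. The helper name `selector_matches` and the local pod labels are assumptions made for illustration, not part of Istio or of the manifests above.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """Return True when every selector label is present with the same value."""
    return all(labels.get(key) == value for key, value in selector.items())


# Selector and WorkloadEntry labels copied from the manifests above.
service_entry_selector = {"app": "frontend"}
workload_entry_labels = {"app": "frontend"}

# Assumed labels for the local frontend pod (not shown in the report).
local_pod_labels = {"app": "frontend", "version": "latest"}

for name, labels in [
    ("WorkloadEntry", workload_entry_labels),
    ("local pod", local_pod_labels),
]:
    print(name, "selected:", selector_matches(service_entry_selector, labels))
```

Under these assumptions both workloads are selected, which is consistent with the ServiceEntry exposing one remote endpoint and one local pod endpoint.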
The frontend pod running in the web-ui namespace is patched with the following command.
The patch only causes the pod to be redeployed in an unhealthy state (to trigger a failover):
kubectl --context $CLUSTER1 -n web-ui patch deploy frontend --patch '{"spec":{"template":{"spec":{"containers":[{"name":"server","command":["sleep","20h"],"readinessProbe":null,"livenessProbe":null}]}}}}'
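For anyone scripting the reproduction, the same patch can be built as a data structure instead of a hand-quoted JSON string. This is a minimal sketch: the function name `build_patch_command` and the context value `cluster1` (standing in for `$CLUSTER1`) are assumptions. It only prints the command and does not run it.

```python
import json
import shlex


def build_patch_command(context, namespace="web-ui", deployment="frontend"):
    """Build the kubectl argument list for the reproduction patch."""
    patch = {
        "spec": {"template": {"spec": {"containers": [{
            "name": "server",
            # Replace the entrypoint so the server never starts listening.
            "command": ["sleep", "20h"],
            # null removes the probes so the pod is not restarted.
            "readinessProbe": None,
            "livenessProbe": None,
        }]}}}
    }
    patch_json = json.dumps(patch, separators=(",", ":"))
    return ["kubectl", "--context", context, "-n", namespace,
            "patch", "deploy", deployment, "--patch", patch_json]


args = build_patch_command("cluster1")  # "cluster1" is a placeholder context
print(shlex.join(args))
```

Passing the list to `subprocess.run(args, check=True)` would execute it without any shell quoting concerns.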
However, EDS does not get updated, so the endpoints of the ServiceEntry retain the old pod IP address. This causes Envoy to hit a series of connection failures, as shown below.
outbound|80||frontend.web-ui-team.solo-io.mesh::observability_name::outbound|80||frontend.web-ui-team.solo-io.mesh
outbound|80||frontend.web-ui-team.solo-io.mesh::outlier::success_rate_average::-1
outbound|80||frontend.web-ui-team.solo-io.mesh::outlier::success_rate_ejection_threshold::-1
outbound|80||frontend.web-ui-team.solo-io.mesh::outlier::local_origin_success_rate_average::-1
outbound|80||frontend.web-ui-team.solo-io.mesh::outlier::local_origin_success_rate_ejection_threshold::-1
outbound|80||frontend.web-ui-team.solo-io.mesh::default_priority::max_connections::4294967295
outbound|80||frontend.web-ui-team.solo-io.mesh::default_priority::max_pending_requests::4294967295
outbound|80||frontend.web-ui-team.solo-io.mesh::default_priority::max_requests::4294967295
outbound|80||frontend.web-ui-team.solo-io.mesh::default_priority::max_retries::4294967295
outbound|80||frontend.web-ui-team.solo-io.mesh::high_priority::max_connections::1024
outbound|80||frontend.web-ui-team.solo-io.mesh::high_priority::max_pending_requests::1024
outbound|80||frontend.web-ui-team.solo-io.mesh::high_priority::max_requests::1024
outbound|80||frontend.web-ui-team.solo-io.mesh::high_priority::max_retries::3
outbound|80||frontend.web-ui-team.solo-io.mesh::added_via_api::true
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::cx_active::0
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::cx_connect_fail::10
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::cx_total::10
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::rq_active::0
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::rq_error::10
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::rq_success::0
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::rq_timeout::0
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::rq_total::0
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::hostname::192.168.185.62
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::health_flags::/failed_outlier_check
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::weight::1
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::region::us-west-2
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::zone::us-west-2b
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::sub_zone::
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::canary::false
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::priority::0
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::success_rate::-1.0
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::local_origin_success_rate::-1.0
However, if you delete the pod instead of patching it, istiod triggers the correct EDS update.
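To pick out the failing endpoint from stats like the ones above without reading every line, the per-host entries can be grouped and filtered. This is a sketch that assumes the text comes from Envoy's admin `clusters` output in the `cluster::host:port::stat::value` layout shown above, and that hosts are IPv4 addresses or hostnames (IPv6 literals would confuse the `::` split). The function names are hypothetical.

```python
from collections import defaultdict

SAMPLE = """\
outbound|80||frontend.web-ui-team.solo-io.mesh::added_via_api::true
outbound|80||frontend.web-ui-team.solo-io.mesh::outlier::success_rate_average::-1
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::cx_connect_fail::10
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::rq_success::0
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::health_flags::/failed_outlier_check
outbound|80||frontend.web-ui-team.solo-io.mesh::192.168.185.62:8080::sub_zone::
"""


def parse_host_stats(text):
    """Group per-host stats by (cluster, host), skipping cluster-level lines."""
    hosts = defaultdict(dict)
    for line in text.splitlines():
        parts = line.strip().split("::")
        if len(parts) != 4:
            continue  # e.g. "cluster::added_via_api::true" has three fields
        cluster, host, stat, value = parts
        _, sep, port = host.rpartition(":")
        if not (sep and port.isdigit()):
            continue  # groups such as "outlier" are not host:port entries
        hosts[(cluster, host)][stat] = value
    return dict(hosts)


def failing_hosts(hosts):
    """Return hosts with connect failures or a health flag other than healthy."""
    bad = []
    for (cluster, host), stats in hosts.items():
        connect_failures = int(stats.get("cx_connect_fail", "0"))
        flags = stats.get("health_flags", "healthy")
        if connect_failures > 0 or flags != "healthy":
            bad.append((cluster, host, connect_failures, flags))
    return bad


result = failing_hosts(parse_host_stats(SAMPLE))
for cluster, host, failures, flags in result:
    print(f"{host} in {cluster}: {failures} connect failures, flags={flags}")
```

On the sample lines this reports the single host `192.168.185.62:8080`, the stale pod address described in this report.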
Version
▶ istioctl version
client version: 1.13.4
control plane version: 1.13.4
data plane version: 1.13.4 (11 proxies)
Additional Information
I am attaching a bug report captured while the EDS problem was occurring.
istioctl pc cluster istio-ingressgateway-5554bbb688-5jgxx.istio-gateways --fqdn "frontend.web-ui-team.solo-io.mesh" -o yaml > /tmp/clusters.yaml
- circuitBreakers:
    thresholds:
    - maxConnections: 4294967295
      maxPendingRequests: 4294967295
      maxRequests: 4294967295
      maxRetries: 4294967295
      trackRemaining: true
  commonLbConfig:
    healthyPanicThreshold: {}
    localityWeightedLbConfig: {}
  connectTimeout: 10s
  dnsLookupFamily: V4_ONLY
  dnsRefreshRate: 5s
  filters:
  - name: istio.metadata_exchange
    typedConfig:
      '@type': type.googleapis.com/envoy.tcp.metadataexchange.config.MetadataExchange
      protocol: istio-peer-exchange
  loadAssignment:
    clusterName: outbound|80||frontend.web-ui-team.solo-io.mesh
    endpoints:
    - lbEndpoints:
      - endpoint:
          address:
            socketAddress:
              address: a7038f5d58ee4414ba64937b906e9af3-2035711217.us-east-2.elb.amazonaws.com
              portValue: 15443
        loadBalancingWeight: 1
        metadata:
          filterMetadata:
            istio:
              workload: vd-frontend-web-ui-team-solo-io-5915d18f8e034a159824655779d1f69;web-ui;;;cluster1
      loadBalancingWeight: 1
      locality:
        region: us-east-2
      priority: 1
    - lbEndpoints:
      - endpoint:
          address:
            socketAddress:
              address: 192.168.121.138
              portValue: 8080
        loadBalancingWeight: 1
        metadata:
          filterMetadata:
            envoy.transport_socket_match:
              tlsMode: istio
            istio:
              workload: frontend;web-ui;frontend;latest;cluster1
      loadBalancingWeight: 1
      locality:
        region: us-west-2
        zone: us-west-2d
  metadata:
    filterMetadata:
      istio:
        config: /apis/networking.istio.io/v1alpha3/namespaces/web-ui/destination-rule/frontend-web-ui-team-solo-io-me-b10cdaefe3b65699ea8617c65bb172c
        default_original_port: 80
        services:
        - host: frontend.web-ui-team.solo-io.mesh
          name: frontend.web-ui-team.solo-io.mesh
          namespace: web-ui
  name: outbound|80||frontend.web-ui-team.solo-io.mesh
  outlierDetection:
    baseEjectionTime: 15s
    consecutive5xx: 2
    enforcingConsecutive5xx: 100
    enforcingSuccessRate: 0
    interval: 5s
    maxEjectionPercent: 100
  respectDnsTtl: true
  transportSocket:
    name: envoy.transport_sockets.tls
    typedConfig:
      '@type': type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
      commonTlsContext:
        alpnProtocols:
        - istio-peer-exchange
        - istio
        - h2
        combinedValidationContext:
          defaultValidationContext:
            matchSubjectAltNames:
            - exact: spiffe://cluster1.solo.io/ns/web-ui/sa/frontend
            - exact: spiffe://cluster2.solo.io/ns/web-ui/sa/frontend
          validationContextSdsSecretConfig:
            name: ROOTCA
            sdsConfig:
              apiConfigSource:
                apiType: GRPC
                grpcServices:
                - envoyGrpc:
                    clusterName: sds-grpc
                setNodeOnFirstMessageOnly: true
                transportApiVersion: V3
              initialFetchTimeout: 0s
              resourceApiVersion: V3
        tlsCertificateSdsSecretConfigs:
        - name: default
          sdsConfig:
            apiConfigSource:
              apiType: GRPC
              grpcServices:
              - envoyGrpc:
                  clusterName: sds-grpc
              setNodeOnFirstMessageOnly: true
              transportApiVersion: V3
            initialFetchTimeout: 0s
            resourceApiVersion: V3
      sni: outbound_.80_._.frontend.web-ui-team.solo-io.mesh
  type: STRICT_DNS
  typedExtensionProtocolOptions:
    envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
      '@type': type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
      explicitHttpConfig:
        http2ProtocolOptions:
          maxConcurrentStreams: 1073741824
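One way to check a dump like this for stale endpoints is to list the socket addresses under `loadAssignment` and compare them with the addresses that should currently exist. The sketch below works on an already-parsed cluster dictionary, trimmed from the output above. Loading `/tmp/clusters.yaml` would need a YAML parser such as PyYAML's `yaml.safe_load`, which is left out to keep the example stdlib-only. The function names and the live pod IP `192.168.0.10` are hypothetical.

```python
def endpoint_addresses(cluster):
    """Yield (address, port) for every lbEndpoint in a cluster's loadAssignment."""
    for locality in cluster.get("loadAssignment", {}).get("endpoints", []):
        for lb_endpoint in locality.get("lbEndpoints", []):
            socket = lb_endpoint["endpoint"]["address"]["socketAddress"]
            yield socket["address"], socket["portValue"]


def stale_endpoints(cluster, expected_addresses):
    """Return endpoints whose address is not in the expected set."""
    expected = set(expected_addresses)
    return [(addr, port) for addr, port in endpoint_addresses(cluster)
            if addr not in expected]


# Trimmed copy of the cluster shown above.
cluster = {
    "name": "outbound|80||frontend.web-ui-team.solo-io.mesh",
    "loadAssignment": {"endpoints": [
        {"priority": 1, "lbEndpoints": [{"endpoint": {"address": {"socketAddress": {
            "address": "a7038f5d58ee4414ba64937b906e9af3-2035711217.us-east-2.elb.amazonaws.com",
            "portValue": 15443}}}}]},
        {"lbEndpoints": [{"endpoint": {"address": {"socketAddress": {
            "address": "192.168.121.138", "portValue": 8080}}}}]},
    ]},
}

expected = [
    # WorkloadEntry address from the manifest in this report.
    "a7038f5d58ee4414ba64937b906e9af3-2035711217.us-east-2.elb.amazonaws.com",
    # Hypothetical IP of the redeployed frontend pod (not given in the report).
    "192.168.0.10",
]

stale = stale_endpoints(cluster, expected)
print(stale)
```

With these assumed inputs, only the local pod address `192.168.121.138` is flagged, since it is not in the expected set.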