Description
The method below queries metadata from GCP. However, when GCP is not accessible (e.g. due to firewall rules), it delays agent startup by about 90 seconds.
istio/pkg/bootstrap/platform/gcp.go
Line 137 in 6450ecc
func (e *gcpEnv) Metadata() map[string]string {
We discovered this when holdApplicationUntilProxyStarts is set to true. It caused the pod to enter an infinite crash loop, because the pilot-wait health checks kept failing (the health check times out after 60 seconds).
How to reproduce:
- In GKE (with Kubernetes Network Policies enabled), install Istio
- Create a test namespace and a network policy that blocks traffic to the GCP metadata server:
kubectl create ns test
kubectl label namespace test istio-injection=enabled
kubectl -n test apply -f- <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: test
spec:
  podSelector:
    matchLabels:
      app: demo
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 169.254.169.254/32
    - ipBlock:
        cidr: 10.0.0.0/8
EOF
- Create the following workload:
kubectl -n test apply -f- <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: netshoot
  labels:
    app: demo
spec:
  selector:
    matchLabels:
      app: demo
  replicas: 1
  template:
    metadata:
      annotations:
        sidecar.istio.io/agentLogLevel: "all:debug"
      labels:
        app: demo
    spec:
      containers:
      - name: netshoot
        image: nicolaka/netshoot
        imagePullPolicy: IfNotPresent
        args: ['tail', '-f', '/dev/null']
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 100m
            memory: 100Mi
EOF
The sidecar will take about 2 minutes to start up. If you annotate the pod to hold the application until the proxy starts, it will fail. To do so, add proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }' to the annotations above.
NOTE: The errors are hidden by default. I built a second proxy image that prefixes some log lines with a series of = signs; you may want to use it to see all the errors:
sidecar.istio.io/proxyImage: 'rinormaloku/proxyv2:wd8'
I can make a PR to fix this issue, but I wanted to pick your brain on what you'd recommend.
Question: Should we error out on failure? Should we make the requests concurrently? Should we skip all subsequent requests once we discover that GCP is not accessible? Or should we simply increase the pilot-agent wait timeout?