istiod has high cpu #40580

@jduepmeier

Bug Description

After some time, some of the istiod pods (not all) consume all the CPU of a node. HPA is enabled, so new pods spawn until the replica limit is reached, but the old pods never recover.

Istio is deployed via the default Helm charts.

Version

$ istioctl version
client version: 1.14.3
control plane version: 1.14.3
data plane version: 1.14.3 (26 proxies)

$ kubectl version --short
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Client Version: v1.24.2
Kustomize Version: v4.5.4
Server Version: v1.21.6+vmware.1
WARNING: version difference between client (1.24) and server (1.21) exceeds the supported minor version skew of +/-1

$ helm version --short
v3.9.3+g414ff28

$ helm list
istio-base          	istio-system  	1       	2022-07-22 11:53:26.284845268 +0200 CEST	deployed	base-1.14.1                   	1.14.1 (no changes between 1.14.1 and 1.14.3, crds are applied from crd-all.gen.yaml and crd-operator.yaml files for the version)
istio-cni           	kube-system   	2       	2022-08-05 10:21:37.924369286 +0200 CEST	deployed	cni-1.14.3                    	1.14.3
istio-egressgateway 	istio-gateways	3       	2022-08-05 10:36:19.886377351 +0200 CEST	deployed	gateway-1.14.3                	1.14.3
istio-ingressgateway	istio-gateways	3       	2022-08-05 10:36:19.889675063 +0200 CEST	deployed	gateway-1.14.3                	1.14.3
istiod              	istio-system  	3       	2022-08-05 10:36:09.368492077 +0200 CEST	deployed	istiod-1.14.3

Additional Information

I created the pprof reports (https://github.com/istio/istio/wiki/Analyzing-Istio-Performance) from one of the affected pods.

goroutines.txt
pprof.pilot-discovery.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz
pprof.pilot-discovery.samples.cpu.001.pb.gz
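
For anyone who wants to capture comparable data from their own cluster, here is a minimal sketch of how such profiles might be collected. It assumes istiod serves Go's standard `net/http/pprof` endpoints on its debug port 8080, as the linked wiki page describes. The namespace default, local port, output file names, and the helper names `pprof_url` and `collect` are placeholders chosen for illustration, not the exact commands used for the attachments above.

```shell
#!/usr/bin/env bash
# Sketch: collect CPU, heap, and goroutine data from one istiod pod.
# Assumptions (not from the report): istiod's debug port is 8080 and the
# namespace is istio-system; override NAMESPACE / LOCAL_PORT if yours differ.
set -euo pipefail

NAMESPACE="${NAMESPACE:-istio-system}"
LOCAL_PORT="${LOCAL_PORT:-8080}"

# Build the URL of a pprof endpoint on the locally forwarded port.
pprof_url() {
  printf 'http://localhost:%s/debug/pprof/%s\n' "$LOCAL_PORT" "$1"
}

# Forward the pod's debug port, then download the profiles with curl.
collect() {
  local pod="$1"
  kubectl -n "$NAMESPACE" port-forward "pod/$pod" "$LOCAL_PORT:8080" &
  local pf_pid=$!
  # Expand the PID now so the trap still works after this function returns.
  trap "kill $pf_pid 2>/dev/null || true" EXIT
  sleep 2  # give the port-forward a moment to come up

  curl -sS -o cpu.pb.gz      "$(pprof_url 'profile?seconds=30')"  # 30 s CPU profile
  curl -sS -o heap.pb.gz     "$(pprof_url heap)"                  # heap snapshot
  curl -sS -o goroutines.txt "$(pprof_url 'goroutine?debug=2')"   # full goroutine dump
}

# Only contact the cluster when a pod name is passed as the first argument.
if [ -n "${1:-}" ]; then
  collect "$1"
fi
```

Saved as a script and run with the name of an affected pod as its only argument, this should leave three files in the current directory. The CPU profile can then be inspected with `go tool pprof -top cpu.pb.gz`.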

Affected product area

  • Docs
  • Installation
  • Networking
  • Performance and Scalability
  • Extensions and Telemetry
  • Security
  • Test and Release
  • User Experience
  • Developer Infrastructure
  • Upgrade
  • Multi Cluster
  • Virtual Machine
  • Control Plane Revisions

Is this the right place to submit this?

  • This is not a security vulnerability
  • This is not a question about how to use Istio
