High memory utilization of istio-proxy proportional to number of services #7912

@robertpanzer

Description

Describe the bug
The Istio proxy has very high memory consumption when many services (>100) are configured, and the consumption grows with the number of services. At some point Pilot is no longer able to push all routes and clusters to the istio-proxy.

Expected behavior
Memory consumption should be independent of the number of services configured in the cluster and depend only on the services that are relevant to the local pod.

Steps to reproduce the bug
Create many ServiceEntries and VirtualServices and check the output of ps -A -o rss,cmd in the proxy container (see the example command below).
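For reference, the check can be run from outside the pod with something like this (the pod name is just the one from my test; -c selects the sidecar container):

# Print RSS (in KB) and command line of every process in the sidecar container
kubectl exec testcurl-8578fbd87-x9gsl -c istio-proxy -- ps -A -o rss,cmd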

Version
Istio 1.0.0
Kubernetes:
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:17:39Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:05:37Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Docker for Mac 18.06.0-ce

Is Istio Auth enabled or not?
No.
Installed with the helm chart:

helm install install/kubernetes/helm/istio --name istio --namespace istio-system

Environment
See above, running on Docker for Mac.

I created one deployment that just sleeps and runs with the Istio sidecar, which also allows me to exec into the container and send curl requests.
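A minimal sketch of such a deployment (the name matches the testcurl service cluster seen in the ps output below, the image is just a placeholder, and it assumes automatic sidecar injection is enabled for the namespace; Istio's samples/sleep is a similar ready-made example):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: testcurl
spec:
  replicas: 1
  selector:
    matchLabels:
      app: testcurl
  template:
    metadata:
      labels:
        app: testcurl
    spec:
      containers:
      - name: testcurl
        # Any image that ships curl will do; the container just sleeps forever
        image: curlimages/curl
        command: ["sleep", "infinity"]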

Then I created the services with this Groovy script:

// Generate 1000 ServiceEntry/VirtualService pairs and write them to externals.yaml
def s = ""
for (int i = 0; i < 1000; i++) {
    // Spread the endpoints over 192.168.0.0/16, 200 addresses per third octet
    def ip = "192.168.${(int) (i / 200)}.${i % 200}"
    s += """---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: egresstestprovider${i}
spec:
  hosts:
  - egresstestproviderrp${i}.external
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: STATIC
  location: MESH_EXTERNAL
  endpoints:
  - address: ${ip}
    ports:
      http: 8086
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: egresstestservice${i}
spec:
  hosts:
  - egresstestproviderrp${i}.external
  http:
  - route:
    - destination:
        host: egresstestproviderrp${i}.external
"""
}
new File("externals.yaml").text = s
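The generated manifests can then be applied with kubectl, e.g.:

# Apply the generated ServiceEntries and VirtualServices
kubectl apply -f externals.yaml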

I fetched the proxy config and the Envoy config_dump and Envoy clusters before adding any additional services, after 100, after 500 and after 1000 services.
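For reference, the dumps can be pulled from Envoy's admin endpoint inside the sidecar; something like the following should work (15000 is the sidecar's admin port in Istio 1.0, and the proxy image ships curl):

# Full Envoy configuration as the proxy currently sees it
kubectl exec testcurl-8578fbd87-x9gsl -c istio-proxy -- curl -s localhost:15000/config_dump
# Envoy's view of all upstream clusters and their endpoints
kubectl exec testcurl-8578fbd87-x9gsl -c istio-proxy -- curl -s localhost:15000/clusters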
The RSS (in KB) looked like this after each of the four steps.

With no additional services:

  RSS CMD
17168 /usr/local/bin/pilot-agent proxy sidecar --configPath /etc/istio/proxy --binaryPath /usr/local/bin/envoy --serviceCluster testcurl --drainDuration 45s --parentShutdownDuration 1m0s --discoveryAddress istio-pilot.istio-system:
38024 /usr/local/bin/envoy -c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster testcurl --service-node sidecar~10.1.7.4~testcurl-8578fbd87-x9gsl.default~default.svc.
 
After 100 services:

  RSS CMD
17172 /usr/local/bin/pilot-agent proxy sidecar --configPath /etc/istio/proxy --
46736 /usr/local/bin/envoy -c /etc/istio/proxy/envoy-rev0.json --restart-epoch
 
After 500 services:

  RSS CMD
17172 /usr/local/bin/pilot-agent proxy sidecar --configPath /etc/istio/proxy --binaryPath /usr/local/bin/envoy --serviceCluster testcurl --drainDuration 45s --parentShutdownDuration 1m0s --discoveryAddress istio-pilot.istio-system:
96632 /usr/local/bin/envoy -c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster testcurl --service-node sidecar~10.1.7.4~testcurl-8578fbd87-x9gsl.default~default.svc.
 
After 1000 services:

  RSS CMD
16280 /usr/local/bin/pilot-agent proxy sidecar --configPath /etc/istio/proxy --binaryPath /usr/local/bin/envoy --serviceCluster testcurl --drainDuration 45s --parentShutdownDuration 1m0s --discoveryAddress istio-pilot.istio-system:
133556 /usr/local/bin/envoy -c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster testcurl --service-node sidecar~10.1.7.4~testcurl-8578fbd87-x9gsl.default~default.svc

As you can see, the sidecar's memory utilization grows to 133 MB in a scenario that should be rather common in anything bigger than the Bookinfo sample (considering that you have to declare a ServiceEntry for every single external dependency).
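(A rough back-of-envelope from the numbers above: (133556 − 38024) KB spread over 1000 service pairs is roughly 95 KB of Envoy RSS per ServiceEntry/VirtualService pair, and that is before the configuration was even fully synced; see the note below.)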

You can find everything in the attached zip file:
logs.zip

I also have to note that I took the last snapshot when only 78% of the clusters were synced. Even after waiting for more than 3 hours, proxy-status didn't report any further progress, and I verified directly in the Envoy config that nothing more had been pushed down.
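For reference, the sync state per sidecar can be checked like this (the exact output format varies by Istio version):

# Show the xDS (CDS/LDS/EDS/RDS) sync status of every proxy as reported by Pilot
istioctl proxy-status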
