Describe the bug
The Istio proxy has very high memory consumption when many services (>100) are configured, and the consumption grows with the number of services. At some point Pilot is no longer able to push all routes and clusters to the istio-proxy.
Expected behavior
Memory consumption should be independent of the number of services configured in the cluster and depend only on the services that are relevant to the local pod.
Steps to reproduce the bug
Create many ServiceEntries and VirtualServices and check the output of ps -A -o rss,cmd in the proxy.
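The check above can be run directly inside the sidecar container (a sketch; on a live cluster you would reach the container via kubectl exec with your own pod name):

```shell
# Inside the istio-proxy container (e.g. via `kubectl exec -it <pod> -c istio-proxy -- sh`),
# list resident set size (RSS, in KB) per process; pilot-agent and envoy are the two of interest.
ps -A -o rss,cmd
```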
Version
Istio 1.0.0
Kubernetes:
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:17:39Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:05:37Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Docker for Mac 18.06.0-ce
Is Istio Auth enabled or not?
No.
Installed with the helm chart:
helm install install/kubernetes/helm/istio --name istio --namespace istio-system
Environment
See above, running on Docker for Mac.
I created one deployment that just sleeps and runs with the Istio sidecar, which also lets me exec into the container and send curl requests.
Then I created the services with this Groovy script:
def s = ""
for (i = 0; i < 1000; i++) {
    ip = "192.168.${(int)(i / 200)}.${i % 200}"
    s += """---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: egresstestprovider${i}
spec:
  hosts:
  - egresstestproviderrp${i}.external
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: STATIC
  location: MESH_EXTERNAL
  endpoints:
  - address: ${ip}
    ports:
      http: 8086
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: egresstestservice${i}
spec:
  hosts:
  - egresstestproviderrp${i}.external
  http:
  - route:
    - destination:
        host: egresstestproviderrp${i}.external
"""
}
new File("externals.yaml").text = s
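For anyone without Groovy at hand, an equivalent POSIX shell sketch that generates the same 1000 ServiceEntry/VirtualService pairs (names and IP scheme mirror the script above):

```shell
#!/bin/sh
# Generate 1000 ServiceEntry/VirtualService pairs into externals.yaml,
# mirroring the Groovy script above.
: > externals.yaml
i=0
while [ "$i" -lt 1000 ]; do
  ip="192.168.$((i / 200)).$((i % 200))"
  cat >> externals.yaml <<EOF
---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: egresstestprovider${i}
spec:
  hosts:
  - egresstestproviderrp${i}.external
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: STATIC
  location: MESH_EXTERNAL
  endpoints:
  - address: ${ip}
    ports:
      http: 8086
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: egresstestservice${i}
spec:
  hosts:
  - egresstestproviderrp${i}.external
  http:
  - route:
    - destination:
        host: egresstestproviderrp${i}.external
EOF
  i=$((i + 1))
done
```

The result is applied in one go with kubectl apply -f externals.yaml.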
I fetched the proxy config and the Envoy config_dump and Envoy clusters before adding any additional services, after 100, after 500 and after 1000 services.
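For reference, a sketch of how those dumps can be fetched, assuming the Istio 1.0 default of the Envoy admin interface on localhost:15000 and that curl is available in the proxy image (substitute your own pod name):

```shell
# Envoy's full config and cluster state, via the admin endpoint in the sidecar:
kubectl exec testcurl-8578fbd87-x9gsl -c istio-proxy -- \
  curl -s localhost:15000/config_dump > config_dump.json
kubectl exec testcurl-8578fbd87-x9gsl -c istio-proxy -- \
  curl -s localhost:15000/clusters > clusters.txt
# Pilot's view of how far each sidecar is synced:
istioctl proxy-status
```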
The RSS (in KB) looked like this across the four snapshots.

Before adding services:
RSS CMD
17168 /usr/local/bin/pilot-agent proxy sidecar --configPath /etc/istio/proxy --binaryPath /usr/local/bin/envoy --serviceCluster testcurl --drainDuration 45s --parentShutdownDuration 1m0s --discoveryAddress istio-pilot.istio-system:
38024 /usr/local/bin/envoy -c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster testcurl --service-node sidecar~10.1.7.4~testcurl-8578fbd87-x9gsl.default~default.svc.

After 100 services:
RSS CMD
17172 /usr/local/bin/pilot-agent proxy sidecar --configPath /etc/istio/proxy --
46736 /usr/local/bin/envoy -c /etc/istio/proxy/envoy-rev0.json --restart-epoch

After 500 services:
RSS CMD
17172 /usr/local/bin/pilot-agent proxy sidecar --configPath /etc/istio/proxy --binaryPath /usr/local/bin/envoy --serviceCluster testcurl --drainDuration 45s --parentShutdownDuration 1m0s --discoveryAddress istio-pilot.istio-system:
96632 /usr/local/bin/envoy -c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster testcurl --service-node sidecar~10.1.7.4~testcurl-8578fbd87-x9gsl.default~default.svc.

After 1000 services:
RSS CMD
16280 /usr/local/bin/pilot-agent proxy sidecar --configPath /etc/istio/proxy --binaryPath /usr/local/bin/envoy --serviceCluster testcurl --drainDuration 45s --parentShutdownDuration 1m0s --discoveryAddress istio-pilot.istio-system:
133556 /usr/local/bin/envoy -c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster testcurl --service-node sidecar~10.1.7.4~testcurl-8578fbd87-x9gsl.default~default.svc
As you can see, the memory utilization of the sidecar goes up to 133 MB for a scenario that should be rather common in anything bigger than the Bookinfo sample, considering that you have to declare a service for every single external dependency.
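Taking the Envoy RSS numbers above at face value, the marginal cost works out to roughly 95 KB of resident memory per generated ServiceEntry/VirtualService pair, and this is an underestimate since the last snapshot was only partially synced (see below):

```shell
# (133556 - 38024) KB of RSS growth spread over 1000 generated service pairs:
awk 'BEGIN { printf "%.1f KB per service\n", (133556 - 38024) / 1000 }'
# prints "95.5 KB per service"
```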
You can find everything in the attached zip file:
logs.zip
I also have to note that I took the last snapshot when only 78% of the clusters were synced. Even after waiting for more than three hours, proxy-status did not show any further progress, and I checked the Envoy config directly to confirm that nothing more was pushed down.