Describe the bug
Enabling `orphanedResources` monitoring for a project with all cluster resources whitelisted leads to excessive growth in traffic and resource usage for etcd, as well as for ArgoCD itself.
For etcd we have observed:
- Traffic into and out of etcd increasing ~10x
- Database size increasing ~3x
- Memory usage increasing ~4x
And for ArgoCD:
- Reconciliation activity increasing ~6x
- Application controller CPU usage increasing ~5x
- Cluster events almost doubling
The same cannot be reproduced for a project with no cluster resources whitelisted (`clusterResourceWhitelist: []`).
To Reproduce
We have a single application that applies all of our cluster-scoped resources:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kube-system
spec:
  destination:
    namespace: kube-system
    server: https://kubernetes.default.svc
  project: cluster
  source:
    path: cluster-name/kube-system
    repoURL: git@github.com:example/example-manifests.git
    targetRevision: master
  ignoreDifferences:
    - group: apiextensions.k8s.io
      kind: CustomResourceDefinition
      jsonPointers:
        - /status
  syncPolicy:
    automated:
      selfHeal: true
      prune: true
```
This application belongs to a project that whitelists all cluster resources:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: cluster
spec:
  description: Project for applying cluster resources
  clusterResourceWhitelist:
    - group: "*"
      kind: "*"
  sourceRepos:
    - "*"
  destinations:
    - server: https://kubernetes.default.svc
      namespace: "*"
```
The application in question manages 1259 resource objects (as reported by `argocd_cluster_api_resource_objects`) and the cluster holds 80 resource types altogether (`argocd_cluster_api_resources`).
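For reference, the monitoring is turned on simply by adding an `orphanedResources` block to the project spec; a minimal sketch of that change (the `warn` value shown is illustrative, not copied from our config):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: cluster
spec:
  # ...rest of the spec as above...
  orphanedResources:
    warn: false  # illustrative; the presence of the orphanedResources block is what enables monitoring
```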
Enabling `orphanedResources` on the project immediately results in the described uptick in resource usage, with memory usage and DB size seeming to plateau after several hours. Refer to the screenshots below.
Expected behavior
I would not expect this to be so resource-intensive for etcd or for ArgoCD, given how negligible it is for a project without any cluster resources whitelisted.
Screenshots
In these screenshots you can see when `orphanedResources` was enabled at 10:30 on 7/7 and then disabled at ~11:00 on 7/9.
Cluster events as reported by ArgoCD:
Version
```
argocd: v1.5.4+36bade7
  BuildDate: 2020-05-05T19:02:56Z
  GitCommit: 36bade7a2d7b69d1c0b0c4d41191f792a847d61c
  GitTreeState: clean
  GoVersion: go1.14.1
  Compiler: gc
  Platform: darwin/amd64
argocd-server: v1.6.1+159674e
  BuildDate: 2020-06-19T00:41:05Z
  GitCommit: 159674ee844a378fb98fe297006bf7b83a6e32d2
  GitTreeState: clean
  GoVersion: go1.14.1
  Compiler: gc
  Platform: linux/amd64
  Ksonnet Version: v0.13.1
  Kustomize Version: {Version:kustomize/v3.6.1 GitCommit:c97fa946d576eb6ed559f17f2ac43b3b5a8d5dbd BuildDate:2020-05-27T20:47:35Z GoOs:linux GoArch:amd64}
  Helm Version: version.BuildInfo{Version:"v3.2.0", GitCommit:"e11b7ce3b12db2941e90399e874513fbd24bcb71", GitTreeState:"clean", GoVersion:"go1.13.10"}
  Kubectl Version: v1.14.0
```
Additional notes
- After disabling `orphanedResources` we found we needed to compact and defrag etcd to bring memory usage and DB size back down to normal levels (https://www.compose.com/articles/how-to-keep-your-etcd-lean-and-mean/); see the sketch after this list.
- This could be related to "Installing argocd causes unbounded etcd memory usage" #3556, although as you can see from the screenshots, in our case etcd memory seems to plateau after a sharp rise rather than continuing to grow unbounded.
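For completeness, a minimal sketch of the compaction/defragmentation along the lines of the linked article; the default endpoint, omitted auth flags, and the jq extraction are assumptions about a typical etcd v3 setup, not our exact invocation:

```sh
# Find the current revision (assumes ETCDCTL_API=3 and default endpoint/credentials)
rev=$(etcdctl endpoint status --write-out=json | jq -r '.[0].Status.header.revision')

# Compact away superseded revisions, then defragment to release the freed space
etcdctl compact "$rev"
etcdctl defrag
```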