Description
Checklist:
- I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
- I've included steps to reproduce the bug.
- I've pasted the output of `argocd version`.
Describe the bug
I've got a set of very large Kubernetes clusters, but a fairly low number of ArgoCD Applications.
However, some of those applications track daemonsets like prometheus-operator, where we can easily have 1000s of resources within a single app, and those resources scale extremely dynamically (adding or removing 100s of resources at a time).
These larger applications constantly become stale/out of sync with the cluster. I'll open the app and it will show old pods that no longer exist in the cluster. The only way I've found to clear this is to restart the argocd-application-controller pods; performing a hard refresh on the app does not clear the stale resources either.
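For reference, the restart workaround looks roughly like this (assuming the controller runs as the standard `argocd-application-controller` StatefulSet in the `argocd` namespace):

```shell
# Workaround we currently use to flush the stale state (assumes the default
# StatefulSet name and the argocd namespace):
kubectl -n argocd rollout restart statefulset argocd-application-controller

# Wait for all controller shards to come back up before checking the app again
kubectl -n argocd rollout status statefulset argocd-application-controller
```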
Are there any performance recommendations for when you have very large, very dynamic single applications? The HA docs
don't go into much detail about what exactly each option affects; the overall gist of the doc is that you need to tune these settings when you have a large number of applications, but nothing really speaks to when you have very large individual applications.
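One thing we're considering, though I'm not sure it addresses the root cause, is forcing more frequent full refreshes via the controller's `--app-resync` flag (assuming I'm reading its purpose correctly from the controller help output; we haven't tested this yet):

```shell
# Hypothetical tweak (not yet tested): lower the period between full application
# resyncs so stale resources get reconciled away sooner. This just inspects the
# current controller command so we can see whether --app-resync is already set:
kubectl -n argocd get statefulset argocd-application-controller \
  -o jsonpath='{.spec.template.spec.containers[0].command}'
# If we try it, we'd append to the container command:
#   - --app-resync
#   - "60"
```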
I haven't really found any errors from the Application Controller that stand out either, beyond seeing the message in the Logs section below somewhat often.
To Reproduce
Stand up an application with 1000s of tracked resources and introduce a lot of churn.
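A rough way to simulate the churn (a hypothetical repro sketch, not our exact workload; the namespace and Deployment name are made up and assume the Deployment is already tracked by the application):

```shell
# Repeatedly scale a tracked Deployment up and down to add/remove hundreds of
# pods at a time (hypothetical names; any high-replica workload should do):
for i in $(seq 1 20); do
  kubectl -n churn-test scale deployment churn-test --replicas=300
  sleep 120
  kubectl -n churn-test scale deployment churn-test --replicas=10
  sleep 120
done
```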
Expected behavior
The Application Controller does not hold on to stale resources
Version
```
argocd: v2.1.5+a8a6fc8
  BuildDate: 2021-10-20T15:16:40Z
  GitCommit: a8a6fc8dda0e26bb1e0b893e270c1128038f5b0f
  GitTreeState: clean
  GoVersion: go1.16.5
  Compiler: gc
  Platform: linux/amd64
argocd-server: v2.1.5+a8a6fc8
  BuildDate: 2021-10-20T15:16:40Z
  GitCommit: a8a6fc8dda0e26bb1e0b893e270c1128038f5b0f
  GitTreeState: clean
  GoVersion: go1.16.5
  Compiler: gc
  Platform: linux/amd64
  Ksonnet Version: v0.13.1
  Kustomize Version: v4.2.0 2021-06-30T22:49:26Z
  Helm Version: v3.6.0+g7f2df64
  Kubectl Version: v0.21.0
  Jsonnet Version: v0.17.0
```
We're also following most of the HA recommendations, including many of the suggestions for dealing with monorepos (the annotations, a webhook, and multiple repo-server pods). Further, we have plenty of headroom on resource usage for all components.
argocd-application-controller:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller
spec:
  # Replicas and the ARGOCD_CONTROLLER_REPLICAS env var need to match
  replicas: 8
  template:
    spec:
      containers:
      - name: argocd-application-controller
        command:
        - argocd-application-controller
        - --status-processors
        - "200"
        - --operation-processors
        - "100"
        - --repo-server-timeout-seconds
        - "180"
        - --redis
        - "argocd-redis-ha-haproxy:6379"
        env:
        - name: ARGOCD_CONTROLLER_REPLICAS
          value: '8'
        resources:
          requests:
            cpu: 4
            memory: 6Gi
```
Logs
```
Failed to get cached managed resources for tree reconciliation, fall back to full reconciliation
```
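For completeness, this is roughly how we pull that message out of the controller logs (assuming the default StatefulSet/pod naming, the `argocd` namespace, and our 8 shards; the exact pod depends on which shard owns the app):

```shell
# Grep each controller shard for the fallback message (assumes 8 replicas,
# argocd namespace, and default pod naming):
for i in $(seq 0 7); do
  kubectl -n argocd logs "argocd-application-controller-$i" \
    | grep "fall back to full reconciliation" | tail -n 5
done
```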