Large Dynamic Applications resulting in stale resource state #8175

@sidewinder12s

Description

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

I've got a set of very large Kubernetes clusters, but a fairly low number of Argo CD Applications.
However, some of those Applications track DaemonSets (e.g. prometheus-operator) where we can easily have 1000s of resources tracked within a single app, and those resources scale extremely dynamically (adding or removing 100s of resources at a time).

These larger applications constantly become stale/out of sync with the cluster: when I open the app, it shows old pods that no longer exist in the cluster. The only way I've found to clear this is to restart the argocd-application-controller pods. Performing a hard refresh on the app does not clear these old resources either.

Are there any performance recommendations for when you have very large single applications that are also very dynamic? The HA docs don't go into much detail about what each option actually affects; the overall gist is that you need to tune them when you have a large number of applications, with nothing really speaking to a small number of very large applications.
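For context, the main knob I'm aware of beyond the controller flags (shown further down) is timeout.reconciliation in argocd-cm, which controls how often apps are refreshed when no Git change event arrives. A minimal sketch, assuming the default argocd namespace and an illustrative 300s value:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/part-of: argocd
data:
  # Periodic app refresh interval (default 180s). Raising it reduces
  # reconciliation pressure on large apps at the cost of slower
  # detection of Git changes; 300s here is just an example.
  timeout.reconciliation: 300s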

I haven't really found any error messages from the Application Controller that stand out either, beyond the one in the Logs section below, which shows up fairly often.

To Reproduce

Stand up an application with 1000s of tracked resources and introduce a lot of churn.
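As a concrete illustration (not our exact manifests), an Application like the one below on a cluster with hundreds of autoscaled nodes produces this kind of churn, since kube-prometheus-stack ships DaemonSets that add or remove a pod per node; the name, chart version, and destination namespace are all illustrative:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus-operator
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://prometheus-community.github.io/helm-charts
    chart: kube-prometheus-stack
    targetRevision: 30.0.1  # illustrative chart version
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  syncPolicy:
    automated:
      prune: true
      selfHeal: true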

Expected behavior

The Application Controller does not hold on to stale resources

Version

argocd: v2.1.5+a8a6fc8
  BuildDate: 2021-10-20T15:16:40Z
  GitCommit: a8a6fc8dda0e26bb1e0b893e270c1128038f5b0f
  GitTreeState: clean
  GoVersion: go1.16.5
  Compiler: gc
  Platform: linux/amd64
argocd-server: v2.1.5+a8a6fc8
  BuildDate: 2021-10-20T15:16:40Z
  GitCommit: a8a6fc8dda0e26bb1e0b893e270c1128038f5b0f
  GitTreeState: clean
  GoVersion: go1.16.5
  Compiler: gc
  Platform: linux/amd64
  Ksonnet Version: v0.13.1
  Kustomize Version: v4.2.0 2021-06-30T22:49:26Z
  Helm Version: v3.6.0+g7f2df64
  Kubectl Version: v0.21.0
  Jsonnet Version: v0.17.0

We're also following most of the HA recommendations, including many of the monorepo-specific ones (using the annotations and webhook, and multiple repo-server pods). Further, we have plenty of headroom on resource usage for all components.
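For reference, the monorepo-related pieces in place are the Git webhook plus the manifest-generate-paths annotation on each Application, roughly like this (app name and path are illustrative; spec omitted):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus-operator
  namespace: argocd
  annotations:
    # Webhook-triggered refreshes only fire for commits that touch the
    # listed paths; "." is relative to the Application's spec.source.path.
    argocd.argoproj.io/manifest-generate-paths: .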

argocd-application-controller:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller
spec:
  # Replicas and ARGOCD_CONTROLLER_REPLICAS env var need to match
  replicas: 8
  template:
    spec:
      containers:
      - name: argocd-application-controller
        command:
        - argocd-application-controller
        # Number of application status reconciliation processors
        - --status-processors
        - "200"
        # Number of application sync operation processors
        - --operation-processors
        - "100"
        # Timeout (seconds) for controller requests to the repo-server
        - --repo-server-timeout-seconds
        - "180"
        # Point the controller at the HAProxy in front of Redis HA
        - --redis
        - "argocd-redis-ha-haproxy:6379"
        env:
          - name: ARGOCD_CONTROLLER_REPLICAS
            value: '8'
        resources:
          requests:
            cpu: 4
            memory: 6Gi

Logs

Failed to get cached managed resources for tree reconciliation, fall back to full reconciliation

Labels

breaking/high (a possibly breaking change with high impact), bug (something isn't working)
