
[argo-cd] Enabling ha with autoscaling results in redis-ha-haproxy crashing with OOMKilled #1958

@lknite

Describe the bug

Set values.yaml to enable HA with autoscaling as described here:
https://github.com/argoproj/argo-helm/tree/main/charts/argo-cd

The redis-ha-haproxy pods are crashing with OOMKilled.

NAME                                                   READY   STATUS             RESTARTS        AGE
pod/argocd-application-controller-0                     1/1     Running            0               14m
pod/argocd-applicationset-controller-5797cd75dd-dbwcm   1/1     Running            0               12m
pod/argocd-applicationset-controller-5797cd75dd-m5r2t   1/1     Running            0               14m
pod/argocd-notifications-controller-5fc57946c7-fjvpx    1/1     Running            0               12m
pod/argocd-redis-ha-haproxy-d67fc9b6-gsddn              0/1     CrashLoopBackOff   7 (4m40s ago)   12m
pod/argocd-redis-ha-haproxy-d67fc9b6-gttv9              0/1     CrashLoopBackOff   9 (112s ago)    17m
pod/argocd-redis-ha-haproxy-d67fc9b6-vdgrw              0/1     CrashLoopBackOff   9 (18s ago)     14m
pod/argocd-redis-ha-server-0                            3/3     Running            0               11m
pod/argocd-redis-ha-server-1                            3/3     Running            0               16m
pod/argocd-redis-ha-server-2                            3/3     Running            0               13m
pod/argocd-repo-server-59d6fd5d45-f2fnp                 1/1     Running            0               12m
pod/argocd-repo-server-59d6fd5d45-vnwc7                 1/1     Running            0               14m
pod/argocd-server-67cfc6c877-rnkrf                      1/1     Running            0               12m
pod/argocd-server-67cfc6c877-zdns4                      1/1     Running            0               14m
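
One way to confirm that the restarts are OOM kills rather than probe failures is to inspect the container's last terminated state. A minimal check (assuming the argocd namespace and one of the pod names from the listing above):

kubectl -n argocd get pod argocd-redis-ha-haproxy-d67fc9b6-gsddn \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# Prints "OOMKilled" when the kernel killed the container for exceeding its memory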

Related helm chart

argo-cd

Helm chart version

5.28.2

To Reproduce

argo-cd:

  redis-ha:
    enabled: true

  controller:
    replicas: 1

    metrics:
      enabled: true

  dex:
    enabled: false

  server:
    autoscaling:
      enabled: true
      minReplicas: 2

    extraArgs:
    - --insecure

    ingress:
      enabled: true
      ingressClassName: nginx
      hosts:
      - argocd.k.home.net
      tls:
      - secretName: argocd.k.home.net-tls
        hosts:
        - argocd.k.home.net
      annotations:
        cert-manager.io/issuer: "cluster-adcs-issuer"                   # use the specific name of the issuer
        cert-manager.io/issuer-kind: "ClusterAdcsIssuer"                # or AdcsIssuer
        cert-manager.io/issuer-group: "adcs.certmanager.csf.nokia.com"
        nginx.ingress.kubernetes.io/rewrite-target: /
        nginx.ingress.kubernetes.io/proxy-body-size: 1000m
        nginx.ingress.kubernetes.io/proxy-buffer-size: 16k

    volumeMounts:
    - mountPath: "/etc/ssl/certs"
      name: ca-bundle
    volumes:
    - name: ca-bundle
      secret:
        secretName: ca-bundle

    config:
      url: "https://argocd.k.home.net"
      oidc.config: |
        name: Azure
        issuer: https://login.microsoftonline.com/<snip>/v2.0
        clientID: <snip>
        clientSecret: <snip>
        requestedIDTokenClaims:
          groups:
            essential: true
        requestedScopes:
        - openid
        - profile
        - email
        - offline_access

    rbacConfig:
      policy.csv: |
        # Grant all members of the group 'my-org:team-alpha' the ability to sync apps in 'my-project'
        #p, my-org:team-alpha, applications, sync, my-project/*, allow
        # Grant all members of 'my-org:team-beta' admin rights
        g, k-app-argocd-admin, role:admin

  repoServer:
    autoscaling:
      enabled: true
      minReplicas: 2

    volumeMounts:
    - mountPath: "/etc/ssl/certs"
      name: ca-bundle
    volumes:
    - name: ca-bundle
      secret:
        secretName: ca-bundle

  applicationSet:
    replicaCount: 2
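
As a possible stopgap while debugging, explicit resource limits can be put on the haproxy pods through the redis-ha subchart. This is a sketch, not a confirmed fix: it assumes the upstream redis-ha chart's haproxy.resources value (passed through under the redis-ha key, as in the values above), and with limits in place haproxy will still be OOMKilled, just at a bounded size instead of exhausting the node:

argo-cd:

  redis-ha:
    enabled: true
    haproxy:
      resources:          # assumed pass-through to the redis-ha subchart
        requests:
          memory: 128Mi
          cpu: 100m
        limits:
          memory: 256Mi   # container is killed at 256Mi instead of eating the node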

Expected behavior

Argo CD stands up with the HA configuration.


Additional context

Kubernetes v1.25.8
OS: Red Hat 9
k8s installed via kubeadm

I boosted the memory by 4 GB at a time, up to 48 GB on each of the 3 worker nodes; this is a newly set up cluster, and this Argo CD deployment is pretty much the only thing running. If I SSH into each of the worker nodes, haproxy shows up using all of the CPU and memory.
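
For comparison across nodes, the same usage can be read without SSH-ing into each worker. A quick sketch, assuming metrics-server is installed (required for kubectl top):

# Per-pod usage as the scheduler sees it:
kubectl -n argocd top pods | grep haproxy
# Or per-process on a worker node:
ps -C haproxy -o pid,rss,vsz,comm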
