Description
Checklist:
- I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
- I've included steps to reproduce the bug.
- I've pasted the output of argocd version.
Describe the bug
Running ArgoCD in HA mode on Rocky Linux 9 with kernel 5.14, the haproxy pod keeps getting OOM-killed, while the same setup works fine on CentOS 7 with kernel 5.10:
[Sat Feb 4 21:35:25 2023] haproxy invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=998
[Sat Feb 4 21:35:25 2023] CPU: 0 PID: 1723369 Comm: haproxy Kdump: loaded Tainted: G X --------- --- 5.14.0-162.6.1.el9_1.0.1.x86_64 #1
[Sat Feb 4 21:35:25 2023] Hardware name: Dell Inc. PowerEdge R340/045M96, BIOS 2.2.3 09/27/2019
[Sat Feb 4 21:35:25 2023] Call Trace:
[Sat Feb 4 21:35:25 2023] dump_stack_lvl+0x34/0x48
[Sat Feb 4 21:35:25 2023] dump_header+0x4a/0x201
[Sat Feb 4 21:35:25 2023] oom_kill_process.cold+0xb/0x10
[Sat Feb 4 21:35:25 2023] out_of_memory.part.0+0xbf/0x270
[Sat Feb 4 21:35:25 2023] out_of_memory+0x3d/0x80
[Sat Feb 4 21:35:25 2023] mem_cgroup_out_of_memory+0x13a/0x150
[Sat Feb 4 21:35:25 2023] try_charge_memcg+0x73d/0x7a0
[Sat Feb 4 21:35:25 2023] ? __alloc_pages+0xe6/0x230
[Sat Feb 4 21:35:25 2023] charge_memcg+0x32/0xa0
[Sat Feb 4 21:35:25 2023] __mem_cgroup_charge+0x29/0x80
[Sat Feb 4 21:35:25 2023] do_anonymous_page+0xf1/0x580
[Sat Feb 4 21:35:25 2023] __handle_mm_fault+0x3cb/0x750
[Sat Feb 4 21:35:25 2023] handle_mm_fault+0xc5/0x2a0
[Sat Feb 4 21:35:25 2023] do_user_addr_fault+0x1bb/0x690
[Sat Feb 4 21:35:25 2023] exc_page_fault+0x62/0x150
[Sat Feb 4 21:35:25 2023] asm_exc_page_fault+0x22/0x30
[Sat Feb 4 21:35:25 2023] RIP: 0033:0x5579781695f0
[Sat Feb 4 21:35:25 2023] Code: 48 c1 e0 06 48 01 c8 c7 40 04 ff ff ff ff c7 00 ff ff ff ff 83 fa 10 75 e1 85 ed 7e 1d 89 ed 48 c1 e5 06 48 8d 44 1d 00 66 90 <c7> 43 18 fd ff ff ff 48 83 c3 40 48 39 c3 75 f0 48 8d 2d 61 2a 10
[Sat Feb 4 21:35:25 2023] RSP: 002b:00007ffd4e3d1600 EFLAGS: 00010287
[Sat Feb 4 21:35:25 2023] RAX: 00007f8247fffe40 RBX: 00007f72c7589000 RCX: 00005579784bea00
[Sat Feb 4 21:35:25 2023] RDX: 0000000000000010 RSI: 00000003ffffff80 RDI: 00007f6a48000010
[Sat Feb 4 21:35:25 2023] RBP: 0000000ffffffe00 R08: 00007f6a48000010 R09: 0000000000000000
[Sat Feb 4 21:35:25 2023] R10: 0000000000000022 R11: 0000000000000246 R12: 000000003ffffff8
[Sat Feb 4 21:35:25 2023] R13: 0000000000000000 R14: 00007ffd4e3d1700 R15: 0000557978269250
[Sat Feb 4 21:35:25 2023] memory: usage 2097152kB, limit 2097152kB, failcnt 634
[Sat Feb 4 21:35:25 2023] swap: usage 0kB, limit 0kB, failcnt 0
[Sat Feb 4 21:35:25 2023] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod96e04706_6c90_4b53_a2bf_1c336eaf428d.slice/cri-containerd-fba1f21f94b32ec9c899dfa26fea10601e36b48bda6337a9c6ef0e5f22e5cd53.scope:
[Sat Feb 4 21:35:25 2023] anon 2142924800
file 4096
kernel 4554752
kernel_stack 16384
pagetables 4272128
percpu 576
sock 0
vmalloc 24576
shmem 4096
file_mapped 4096
file_dirty 0
file_writeback 0
swapcached 0
anon_thp 2132803584
file_thp 0
shmem_thp 0
inactive_anon 2142916608
active_anon 8192
inactive_file 0
active_file 0
unevictable 0
slab_reclaimable 127240
slab_unreclaimable 78792
slab 206032
workingset_refault_anon 0
workingset_refault_file 1522
workingset_activate_anon 0
workingset_activate_file 36
workingset_restore_anon 0
workingset_restore_file 0
workingset_nodereclaim 0
pgfault 4506
pgmajfault 43
pgrefill 727
pgscan 2647
pgsteal 1523
pgactivate 691
pgdeactivate 727
pglazyfree 0
pglazyfreed 0
thp_fault_alloc 1020
thp_collapse_alloc 0
[Sat Feb 4 21:35:25 2023] Tasks state (memory values in pages):
[Sat Feb 4 21:35:25 2023] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[Sat Feb 4 21:35:25 2023] [1723369] 99 1723369 41965712 523755 4284416 0 998 haproxy
[Sat Feb 4 21:35:25 2023] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=cri-containerd-fba1f21f94b32ec9c899dfa26fea10601e36b48bda6337a9c6ef0e5f22e5cd53.scope,mems_allowed=0,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod96e04706_6c90_4b53_a2bf_1c336eaf428d.slice/cri-containerd-fba1f21f94b32ec9c899dfa26fea10601e36b48bda6337a9c6ef0e5f22e5cd53.scope,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod96e04706_6c90_4b53_a2bf_1c336eaf428d.slice/cri-containerd-fba1f21f94b32ec9c899dfa26fea10601e36b48bda6337a9c6ef0e5f22e5cd53.scope,task=haproxy,pid=1723369,uid=99
[Sat Feb 4 21:35:25 2023] Memory cgroup out of memory: Killed process 1723369 (haproxy) total-vm:167862848kB, anon-rss:2092452kB, file-rss:2564kB, shmem-rss:4kB, UID:99 pgtables:4184kB oom_score_adj:998
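For readability, a rough conversion of the two key figures in the kill report above (unit conversion only, no new information):

# total_vm from the task table: 41965712 pages x 4 KiB/page
echo $(( 41965712 * 4 / 1024 / 1024 ))   # => 160, i.e. ~160 GiB of virtual address space
# anon-rss from the kill line: 2092452 kB
echo $(( 2092452 / 1024 ))               # => 2043, i.e. resident right at the 2 GiB cgroup limit

So haproxy had mapped roughly 160 GiB of virtual memory and was killed as soon as it actually touched 2 GiB of it.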
This is a fairly small setup in a dev environment with around 10 Applications. All the ArgoCD-related pods run on three bare-metal worker nodes in a Kubernetes cluster: two nodes run CentOS 7 with kernel 5.10, and one runs Rocky Linux 9 with kernel 5.14. The two haproxy pods on the CentOS 7 nodes each use less than 100MB of memory, but the one on the Rocky Linux 9 node will consume all the memory on the node if left unlimited, and it easily hits its memory limit, as in the dump above with a 2GB limit configured.
I also tried bumping haproxy to the latest stable version, 2.7.2, which didn't help; the pod still keeps being OOMKilled.
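The ~160 GiB virtual size makes me suspect (a hypothesis on my part, not something confirmed above) that haproxy is sizing its internal per-fd tables from the file-descriptor limit handed to the container, and that the Rocky Linux 9 node (containerd 1.6, newer systemd) passes a much larger nofile hard limit than the CentOS 7 nodes do. A quick way to compare, assuming the placeholder pod names are replaced with the real ones from kubectl get pods -n argocd -o wide:

# fd limit as seen by the haproxy process in each pod
kubectl exec -n argocd <haproxy-pod-on-centos7-node> -- cat /proc/1/limits | grep "open files"
kubectl exec -n argocd <haproxy-pod-on-rocky9-node> -- cat /proc/1/limits | grep "open files"

# default fd limit the container runtime inherits on the node itself
systemctl show containerd -p LimitNOFILE

If those limits differ by orders of magnitude, capping them (for example an explicit maxconn in the haproxy global config, or a lower LimitNOFILE for containerd on the Rocky node) would be the obvious next thing to try; again, this is a guess, not a verified fix.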
Containers:
haproxy:
Container ID: containerd://fba1f21f94b32ec9c899dfa26fea10601e36b48bda6337a9c6ef0e5f22e5cd53
Image: haproxy:2.7.2
Image ID: docker.io/library/haproxy@sha256:4f79e6112b2a2fba850e842a6c457bc80a2064ad573bfafafd1ed2df64caab30
Ports: 6379/TCP, 9101/TCP
Host Ports: 0/TCP, 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Sat, 04 Feb 2023 21:35:24 +1100
Finished: Sat, 04 Feb 2023 21:35:25 +1100
Ready: False
Restart Count: 5
Limits:
cpu: 2
memory: 2Gi
Requests:
cpu: 100m
memory: 128Mi
Liveness: http-get http://:8888/healthz delay=500s timeout=100s period=3s #success=1 #failure=3
Readiness: http-get http://:8888/healthz delay=5s timeout=1s period=3s #success=1 #failure=3
Environment: <none>
Mounts:
/run/haproxy from shared-socket (rw)
/usr/local/etc/haproxy from data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6hzdj (ro)
To Reproduce
- Deploy ArgoCD in HA mode on Rocky Linux 9 (similar to Red Hat Enterprise Linux 9) with Kubernetes 1.24 (see the sketch below).
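For reference, the deployment is essentially the standard HA install; the exact manifests are not shown above, so treat this as an illustrative sketch rather than the precise commands used:

kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/ha/install.yaml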
Expected behavior
The haproxy pod on the Rocky Linux 9 node should stay well within its memory limit (the same pods on the CentOS 7 nodes use less than 100MB) instead of being OOMKilled.
Version
❯ argocd version
argocd: v2.5.7+e0ee345.dirty
BuildDate: 2023-01-18T04:38:11Z
GitCommit: e0ee3458d0921ad636c5977d96873d18590ecf1a
GitTreeState: dirty
GoVersion: go1.19.5
Compiler: gc
Platform: darwin/amd64
argocd-server: v2.5.10+d311fad
Rocky Linux 9 kernel version: 5.14.0-162.6.1.el9_1.0.1.x86_64
Kubernetes version: 1.24.10-0
Containerd version: 1.6.16-3.1.el9
Logs
See the kernel oom-killer output in the description above.