[BUG] The CNI pod network cache cleanup does not work in all cases #8150

@w13915984028

Description

Describe the Bug

The file /system/oem/99_cni_reset.yaml is intended to clean up the cached CNI pod network files under /var/lib/cni/networks/k8s-pod-network.

https://github.com/harvester/harvester-installer/blob/4a28b0eef7ac970bb85f3ee29cc9e4e858be65fa/package/harvester-os/files/system/oem/99_cni_reset.yaml#L5

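However, on a v1.5.0 cluster it does not seem to work every time: right after a reboot, the directory still contains files left over from previous boots (dating back to Apr 9).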

harv41:~ # date  (cluster newly rebooted)
Mon Apr 28 10:45:22 UTC 2025

harv41:~ # ls /var/lib/cni/networks/k8s-pod-network/ -alth
total 344K
drwxr-xr-x 2 root root 4.0K Apr 28 10:36 .
-rw------- 1 root root   70 Apr 28 10:36 10.52.0.160
-rw------- 1 root root   11 Apr 28 10:36 last_reserved_ip.0
-rw------- 1 root root   70 Apr 28 10:36 10.52.0.159
...
-rw------- 1 root root   70 Apr 28 10:35 10.52.0.100
-rw------- 1 root root   70 Apr 25 09:22 10.52.0.98
-rw------- 1 root root   70 Apr 25 09:22 10.52.0.97
-rw------- 1 root root   70 Apr 23 12:14 10.52.0.37
-rw------- 1 root root   70 Apr 23 12:14 10.52.0.36
-rw------- 1 root root   70 Apr 17 06:30 10.52.0.151
-rw------- 1 root root   70 Apr 17 06:30 10.52.0.150
-rw------- 1 root root   70 Apr 16 09:00 10.52.0.75
-rw------- 1 root root   70 Apr 16 09:00 10.52.0.74
-rw------- 1 root root   70 Apr 14 12:44 10.52.0.14
-rw------- 1 root root   70 Apr 14 12:41 10.52.0.10
-rw------- 1 root root   70 Apr 14 12:40 10.52.0.2
-rw------- 1 root root   70 Apr 14 12:40 10.52.0.250
...
-rw------- 1 root root   70 Apr 11 13:23 10.52.0.207
-rw------- 1 root root   70 Apr  9 19:14 10.52.0.138
-rw------- 1 root root   70 Apr  9 19:01 10.52.0.60
drwxr-xr-x 3 root root 4.0K Apr  9 18:57 ..
-rwxr-x--- 1 root root    0 Apr  9 18:57 lock
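To see which entries survived the reboot, each file's modification time can be compared with the kernel's boot time. The following is a minimal sketch under stated assumptions: the function names boot_epoch and list_stale_entries are hypothetical helpers, not part of Harvester or elemental-toolkit, and the script assumes GNU coreutils stat plus the btime line in /proc/stat.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with Harvester or elemental-toolkit).
# Assumes GNU coreutils `stat` and a Linux /proc/stat that has a "btime" line.

# Print the system boot time as seconds since the Unix epoch.
boot_epoch() {
  awk '/^btime/ {print $2}' /proc/stat
}

# list_stale_entries DIR BOOT_EPOCH
# Print the name of every regular file in DIR whose mtime is older than
# BOOT_EPOCH, i.e. files that were not removed/recreated during this boot.
list_stale_entries() {
  local dir=$1 boot=$2 f mtime
  for f in "$dir"/*; do
    # Skips the unexpanded glob when DIR is empty or missing.
    [ -f "$f" ] || continue
    mtime=$(stat -c %Y "$f")
    if [ "$mtime" -lt "$boot" ]; then
      basename "$f"
    fi
  done
}
```

On the node this would be run as list_stale_entries /var/lib/cni/networks/k8s-pod-network "$(boot_epoch)". If the cleanup ran during the current boot, it should print nothing; on the node above it would be expected to list files such as 10.52.0.60 and lock.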

To Reproduce

  1. Cold-start a v1.5.0 cluster.

  2. Check the files: ls /var/lib/cni/networks/k8s-pod-network/ -alth

  3. Per https://rancher.github.io/elemental-toolkit/docs/customizing/stages/#initramfs, when the following file is used, the cached files are cleaned up:

harv41:~ # cat /oem/99_cni_reset.yaml 
name: "reset container dhcp leases"
stages:
   initramfs:
     - name: "clean network"
       commands:
       - rm -rf /var/lib/cni/networks/k8s-pod-network

harv41:~ # ls /var/lib/cni/networks/k8s-pod-network
ls: cannot access '/var/lib/cni/networks/k8s-pod-network': No such file or directory
harv41:~ # ls /var/lib/cni/networks/
harv41:~ # 

After some time, once the node has deployed all of its workloads, the directory is populated again:

harv41:~ # ls /var/lib/cni/networks/k8s-pod-network/
10.52.0.10  10.52.0.15	10.52.0.2   10.52.0.24	10.52.0.29  10.52.0.33	10.52.0.38  10.52.0.42	10.52.0.47  10.52.0.52	10.52.0.57  last_reserved_ip.0
10.52.0.11  10.52.0.16	10.52.0.20  10.52.0.25	10.52.0.3   10.52.0.34	10.52.0.39  10.52.0.43	10.52.0.48  10.52.0.53	10.52.0.6   lock
10.52.0.12  10.52.0.17	10.52.0.21  10.52.0.26	10.52.0.30  10.52.0.35	10.52.0.4   10.52.0.44	10.52.0.5   10.52.0.54	10.52.0.7
10.52.0.13  10.52.0.18	10.52.0.22  10.52.0.27	10.52.0.31  10.52.0.36	10.52.0.40  10.52.0.45	10.52.0.50  10.52.0.55	10.52.0.8
10.52.0.14  10.52.0.19	10.52.0.23  10.52.0.28	10.52.0.32  10.52.0.37	10.52.0.41  10.52.0.46	10.52.0.51  10.52.0.56	10.52.0.9
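Once the directory is repopulated, live allocations can be told apart from leaked ones by checking the container ID recorded in each file. A minimal sketch, assuming these are CNI host-local IPAM reservation files (the directory matches that plugin's default data directory /var/lib/cni/networks) whose first line holds the container ID, and assuming a caller-supplied file listing active pod sandbox IDs one per line; how that list is collected from the container runtime is outside this sketch. The function name list_orphaned_ips is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper. Assumption: each reservation file is named after the
# IP and starts with the owning container ID on its first line
# (host-local writes "<id>\r\n<ifname>").

# list_orphaned_ips DIR ACTIVE_IDS_FILE
# Print every IP file in DIR whose container ID is not listed in
# ACTIVE_IDS_FILE (one ID per line).
list_orphaned_ips() {
  local dir=$1 active=$2 f id
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    case "$(basename "$f")" in
      lock|last_reserved_ip.*) continue ;;  # bookkeeping files, not allocations
    esac
    # Read the first line and strip the trailing carriage return.
    id=$(head -n 1 "$f" | tr -d '\r')
    # -x: match whole line, -F: fixed string, -q: no output.
    if ! grep -qxF -- "$id" "$active"; then
      basename "$f"
    fi
  done
}
```

Matching on the recorded ID rather than on file age means the check still works long after boot, when every entry is newer than btime.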


Expected Behavior

The cached files are cleaned up on every reboot, so no entries from earlier boots remain.

Support Bundle for Troubleshooting

Not required.

Environment

  • Harvester version: v1.5.0
  • Impacted VM:
  • Impacted volume (PV):
  • Underlying Infrastructure (e.g., Baremetal with Dell PowerEdge R630): KVM VM based
  • Rancher version: v2.11.0

Additional context

Original issue #7471 has been verified.

This may be caused by a change in elemental-toolkit.

Workaround and Mitigation

No response

Metadata

Labels

  • kind/bug: Issues that are defects reported by users or that we know have reached a real release
  • reproduce/often: Reproducible 10% to 99% of the time
  • require/reproduce: Require adding a `reproduce` label.
  • require/severity: Require adding a `severity` label.
  • severity/3: Function working but has a major or UI issue w/ workaround

Status

Closed
