Description
/kind feature
On June 9th, a new version of systemd, 249.11-0ubuntu3.16, was released. During the Ubuntu unattended-upgrades run, this package was upgraded on all Kubernetes nodes, which also triggered a restart of the `systemd-networkd` service. As a result, we started to see hundreds of Pods in `CrashLoopBackOff`.

After investigating, this proved to be the explanation. By default, `systemd-networkd` flushes ip rules that it does not manage. In this case, all per-pod ip rules created by `aws-vpc-cni` were removed when `systemd-networkd` restarted, leaving ALL running Pods without routing in place. Most of them went into `CrashLoopBackOff` and the ingress controllers were affected, so it was effectively a full outage. To recover, we had to run `kubectl rollout restart`, which forces the Pods to be replaced, including recreation of the `aws-vpc-cni` ip rules.
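A minimal sketch of how the failure and recovery above can be observed. The namespace name is hypothetical; on a healthy node each pod has a policy rule steering its traffic (e.g. `from <pod-ip> lookup main`), and after the `systemd-networkd` restart those per-pod rules were gone:

```shell
# Inspect the policy routing rules on a node. On a healthy node running
# aws-vpc-cni there is one rule per pod in addition to the kernel defaults
# (local/main/default); after the restart only the defaults remained.
ip rule show

# Recovery: force pod replacement so the CNI re-creates the rules.
# Run from a machine with cluster credentials; "my-namespace" is a
# hypothetical example namespace.
# kubectl rollout restart deployment --namespace my-namespace
```

The `kubectl` line is commented out because it requires cluster access; `ip rule show` works on any Linux host without special privileges.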
This default behaviour can be changed; for example, the Amazon Linux 2023 distribution applies the following config:
```ini
# Do not clobber any routes or rules added by CNI.
[Network]
ManageForeignRoutes=no
ManageForeignRoutingPolicyRules=no
```
The above ensures that `systemd-networkd` won't touch ip rules created by other, third-party network management tools, in this case `aws-vpc-cni`.
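A sketch of how such a drop-in could be installed on Ubuntu nodes, e.g. from node user data. The real target directory is `/etc/systemd/networkd.conf.d`; the snippet below writes to a scratch directory by default (a hypothetical `DROPIN_DIR` variable) so it can be dry-run without root:

```shell
# Install a systemd-networkd drop-in so foreign routes/rules are preserved.
# On a real node, set DROPIN_DIR=/etc/systemd/networkd.conf.d (needs root).
DROPIN_DIR="${DROPIN_DIR:-./networkd.conf.d}"
mkdir -p "$DROPIN_DIR"
cat > "$DROPIN_DIR/80-cni.conf" <<'EOF'
# Do not clobber any routes or rules added by CNI.
[Network]
ManageForeignRoutes=no
ManageForeignRoutingPolicyRules=no
EOF

# On a real node, apply the new settings afterwards:
# systemctl restart systemd-networkd
```

The restart is left commented because it requires root and a live `systemd-networkd`; note that restarting it with the drop-in in place no longer removes CNI-managed rules.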