Skip to content

[ubuntu22.04][amazon-vpc-cni] ip rules flushed when systemd-networkd restarted #17433

@ataut-pai

Description

@ataut-pai

/kind feature

On 09th of June it was released, a new version of systemd 249.11-0ubuntu3.16. During the Ubuntu unattended-upgrades, this package was upgraded on all Kubernetes nodes, which triggered the systemd-networkd service to restart as well. Hence, we started to have hundreds of Pods in CrashLoopbackOff.

After investigating, this proved to be the explanation. The default behaviour of systemd-networkd is to flush ip rules that are not managed by it. In this case, all per-pod aws-vpc-cni-created ip rules were removed when systemd-networkd restarted, leaving ALL running pods without routing in place. We started to see most of them in CrashLoopbackOff, the ingress controllers affected, so basically a full downtime. To recover, we had to kubectl rollout restart, which forces the Pods replacement, including the aws-vpc-cni ip rules configs to be recreated.

This default behaviour can be modified, example the AmazonLinux2023 distribution where the following config is applied:

# Do not clobber any routes or rules added by CNI.
[Network]
ManageForeignRoutes=no
ManageForeignRoutingPolicyRules=no

The above ensures that systemd-networkd won't affect ip rules created by other third party network management tools, in this case aws-vpc-cni.

Metadata

Metadata

Assignees

No one assigned

    Labels

    kind/featureCategorizes issue or PR as related to a new feature.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions