Description
/kind feature
On June 9th, a new version of systemd, 249.11-0ubuntu3.16, was released. During the Ubuntu unattended-upgrades run, this package was upgraded on all Kubernetes nodes, which also triggered a restart of the `systemd-networkd` service. As a result, we started to see hundreds of Pods in `CrashLoopBackOff`.

After investigating, this proved to be the explanation. By default, `systemd-networkd` flushes ip rules that it does not manage. In this case, all per-pod ip rules created by `aws-vpc-cni` were removed when `systemd-networkd` restarted, leaving ALL running Pods without routing in place. Most of them went into `CrashLoopBackOff` and the ingress controllers were affected, so it was effectively a full outage. To recover, we had to run `kubectl rollout restart`, which forces the Pods to be replaced, including recreation of the `aws-vpc-cni` ip rules.
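A minimal sketch of how the failure and recovery above can be observed. The namespace name is hypothetical; on a healthy node each pod has a policy rule steering its traffic (e.g. `from <pod-ip> lookup main`), and after the `systemd-networkd` restart those per-pod rules were gone:

```shell
# Inspect the policy routing rules on a node. On a healthy node running
# aws-vpc-cni there is one rule per pod in addition to the kernel defaults
# (local/main/default); after the restart only the defaults remained.
ip rule show

# Recovery: force pod replacement so the CNI re-creates the rules.
# Run from a machine with cluster credentials; "my-namespace" is a
# hypothetical example namespace.
# kubectl rollout restart deployment --namespace my-namespace
```

The `kubectl` line is commented out because it requires cluster access; `ip rule show` works on any Linux host without special privileges.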
This default behaviour can be changed; for example, the Amazon Linux 2023 distribution applies the following config:
```ini
# Do not clobber any routes or rules added by CNI.
[Network]
ManageForeignRoutes=no
ManageForeignRoutingPolicyRules=no
```
The above ensures that `systemd-networkd` won't touch ip rules created by other, third-party network management tools, in this case `aws-vpc-cni`.
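A sketch of how such a drop-in could be installed on Ubuntu nodes, e.g. from node user data. The real target directory is `/etc/systemd/networkd.conf.d`; the snippet below writes to a scratch directory by default (a hypothetical `DROPIN_DIR` variable) so it can be dry-run without root:

```shell
# Install a systemd-networkd drop-in so foreign routes/rules are preserved.
# On a real node, set DROPIN_DIR=/etc/systemd/networkd.conf.d (needs root).
DROPIN_DIR="${DROPIN_DIR:-./networkd.conf.d}"
mkdir -p "$DROPIN_DIR"
cat > "$DROPIN_DIR/80-cni.conf" <<'EOF'
# Do not clobber any routes or rules added by CNI.
[Network]
ManageForeignRoutes=no
ManageForeignRoutingPolicyRules=no
EOF

# On a real node, apply the new settings afterwards:
# systemctl restart systemd-networkd
```

The restart is left commented because it requires root and a live `systemd-networkd`; note that restarting it with the drop-in in place no longer removes CNI-managed rules.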