
Possible connectivity disruption on agent restart with WireGuard + native routing #31979

@giorio94

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

A temporary connectivity disruption can occur on agent restart when Cilium is configured in native routing mode with WireGuard encryption enabled. The list of Allowed IPs is recreated from scratch upon reception of the node event for each remote node, which can remove entries for valid endpoints that have not yet been discovered at that point through the CiliumEndpoint CRD or the corresponding kvstore representation. The current implementation in tunnel mode is not affected: in that case we encrypt encapsulated traffic, whose source and destination addresses always correspond to Node Internal IPs, and these are immediately added as Allowed IPs.

A possible solution would be to restore the list of Allowed IPs for each peer from the WireGuard state after agent restart, and then run a GC pass to remove the stale entries once ipcache synchronization has completed. IPCache synchronization should account for CiliumEndpoint synchronization (if the CiliumEndpoint CRD is enabled), kvstore synchronization (if kvstore mode is enabled), and clustermesh synchronization (when clustermesh is enabled).

Cilium Version

Tested on tip of main, but likely all versions are affected

Code of Conduct

  • I agree to follow this project's Code of Conduct

Labels

  • area/agent: Cilium agent related.
  • area/encryption: Impacts encryption support such as IPSec, WireGuard, or kTLS.
  • feature/wireguard: Relates to Cilium's Wireguard feature
  • kind/bug: This is a bug in the Cilium logic.
