Description
Is there an existing issue for this?
- I have searched the existing issues
What happened?
A temporary connectivity disruption can occur on agent restart when Cilium is configured in native routing mode with WireGuard encryption enabled. When the agent receives the node event for a given remote node, it recreates that peer's list of Allowed IPs from scratch. This can remove entries for valid endpoints that have not yet been discovered at that point through the CiliumEndpoint CRD or the corresponding kvstore representation. The current implementation in tunnel mode is not affected: there we encrypt encapsulated traffic, whose source and destination addresses always correspond to Node Internal IPs, and those are added as Allowed IPs immediately.
A possible solution would be to restore each peer's list of Allowed IPs from the WireGuard state after the agent restarts, and then run a GC pass to remove stale entries once ipcache synchronization has completed. IPCache synchronization should account for CiliumEndpoint synchronization (if the CiliumEndpoint CRD is enabled), kvstore synchronization (if kvstore mode is enabled), and clustermesh synchronization (if clustermesh is enabled).
Cilium Version
Tested on the tip of main, but all versions are likely affected.
Code of Conduct
- I agree to follow this project's Code of Conduct