Skip to content

Conversation

pippolo84
Copy link
Member

This reverts commit 2fc52eb.

The commit was added with the intent to have an uniform behavior when Cilium is started on a single stack cluster with ipv6 support enabled, despite the KPR setting being enabled or disabled. Specifically, the added check was stopping the agent startup in case of a missing IPv6 address on the node, even when KPR was disabled.

However, the IPv6 detection for NodePort, responsible for the startup error in case of missing IPv6 address with KPR enabled, has been removed in #29033. Therefore, there is no reason anymore to stop the agent at startup when the IPv6 support is enabled and the node is missing an IPv6 address. Moreover, the commit above led to some unintended consequences in environments where KPR is disabled and the IPv6 address is set on the node after the Cilium agent is started (see #34861).

The same unintended consequence became more frequent after #32381, where we moved later in the daemon startup process the first call to SetDefaultPrefix. That function, among other things, was in charge of setting up a temporary IPv6 Unique Local Address in case the node does not have one yet. Since now the agent does this later (specifically, while configuring IPAM), the check introduced in 2fc52eb sees the IPv6 node address as nil and stops the agent.

Therefore, revert the check to avoid the unintended regressions.

Related: #28953
Related: #34861
Related: #29033
Related: #32381
Fixes: #37817

Do not fail the agent startup in case IPv6 support is enabled and the node does not have an IPv6 address assigned yet

@pippolo84 pippolo84 requested a review from a team as a code owner June 20, 2025 13:53
@pippolo84 pippolo84 added the kind/bug This is a bug in the Cilium logic. label Jun 20, 2025
@pippolo84 pippolo84 requested a review from mhofstetter June 20, 2025 13:53
@pippolo84 pippolo84 added area/daemon Impacts operation of the Cilium daemon. release-note/bug This PR fixes an issue in a previous release of Cilium. feature/ipv6 Relates to IPv6 protocol support labels Jun 20, 2025
@pippolo84
Copy link
Member Author

cc @vipul-21

@pippolo84
Copy link
Member Author

/test

@pippolo84 pippolo84 added needs-backport/1.15 needs-backport/1.16 This PR / issue needs backporting to the v1.16 branch needs-backport/1.17 This PR / issue needs backporting to the v1.17 branch and removed needs-backport/1.15 needs-backport/1.16 This PR / issue needs backporting to the v1.16 branch labels Jun 20, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 20, 2025
@joestringer
Copy link
Member

Curious what kind of behavior we're aiming to support in Cilium if Cilium is configured to enable IPv6 support but the underlying nodes don't have IPv6 addresses? Is the idea that we can provide pod to pod and pod to node connectivity via IPv6 but route traffic between the nodes and outside the cluster using IPv4?

If Cilium is setting up a ULA in this case then it seems like the daemon should hopefully at least handle this situation and not crash due to lack of IPv6 configuration, which is good... but I also wonder if it's meaningful for Cilium to succeed in such a configuration in the first place?

@joestringer
Copy link
Member

I see #34861 brings up some ordering issue where initially the IPv6 configuration is nonsensical, but if Cilium tolerates the bad configuration then cloud bootstrapping can complete (presumably using IPv4), then assign an IPv6 address which maybe Cilium will detect and reconcile?

@joestringer
Copy link
Member

I guess technically with this patch you could deploy Cilium using IPv4, no IPv6 address assigned to the node, then configure Cilium for single-stack IPv6-only, then Cilium will assign an IPv6 ULA and then maybe Cilium can provide overlay connectivity for an IPv6-only cluster using an IPv4 underlay? I guess that's plausible, even if it feels like the user would face other problems due to this configuration 🤔

@pippolo84
Copy link
Member Author

pippolo84 commented Jun 23, 2025

but I also wonder if it's meaningful for Cilium to succeed in such a configuration in the first place?

I see #34861 brings up some ordering issue where initially the IPv6 configuration is nonsensical, but if Cilium tolerates the bad configuration then cloud bootstrapping can complete (presumably using IPv4), then assign an IPv6 address which maybe Cilium will detect and reconcile?

My understanding is that #34861 made up a legit case to support starting up without an IPv6 even though ipv6 is enabled on the node. This was possible before the patch in 2fc52eb that ultimately seems a bit too aggressive in failing the daemon startup in case no IPv6 is configured.

I think we should probably follow up with a clear agreement about the expectations for this case, but in the meantime I still think the best course of action is to revert 2fc52eb, given that the code regarding the original error is changed and in the current context it seems to me it is not fulfilling its original intent.

Alternatively, what about logging a warning without returning an error?

@joestringer
Copy link
Member

I think that a well-worded warning or error message could communicate the risk clearly enough. Probably something along the lines "IPv6 is enabled, but Cilium cannot find the IPv6 address for this node. This may cause connectivity disruption for Endpoints that attempt to communicate using IPv6." ?

This reverts commit 2fc52eb.

The commit was added with the intent to have an uniform behavior when
Cilium is started on a single stack cluster with ipv6 support enabled,
despite the KPR setting being enabled or disabled. Specifically, the
added check was stopping the agent startup in case of a missing IPv6
address on the node, even when KPR was disabled.

However, the IPv6 detection for NodePort, responsible for the startup
error in case of missing IPv6 address with KPR enabled, has been removed
in cilium#29033. Therefore, there is no
reason anymore to stop the agent at startup when the IPv6 support is
enabled and the node is missing an IPv6 address.  Moreover, the commit
above led to some unintended consequences in environments where KPR is
disabled and the IPv6 address is set on the node after the Cilium agent
is started (see cilium#34861).

The same unintended consequence became more frequent after
cilium#32381, where we moved later in the
daemon startup process the first call to SetDefaultPrefix. That
function, among other things, was in charge of setting up a temporary
IPv6 Unique Local Address in case the node does not have one yet. Since
now the agent does this later (specifically, while configuring IPAM),
the check introduced in 2fc52eb sees the IPv6 node address as nil and
stops the agent.

Therefore, revert the check to avoid the unintended regressions.

Related: cilium#28953
Related: cilium#34861
Related: cilium#29033
Related: cilium#32381
Fixes: cilium#37817

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Log a warning when the node IPv6 address cannot be found at startup and
the IPv6 support is enabled. The agent is not stopped since an IPv6 can
still be assigne later to the node (see
cilium#34861 as an example of such
scenario)

Suggested-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
@pippolo84 pippolo84 force-pushed the pr/pippolo84/remove-node-ipv6-startup-check branch from 8712049 to fe6755b Compare June 23, 2025 19:54
@pippolo84
Copy link
Member Author

/test

@joestringer joestringer enabled auto-merge June 23, 2025 20:53
@joestringer joestringer added this pull request to the merge queue Jun 23, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 23, 2025
@joestringer joestringer added this pull request to the merge queue Jun 23, 2025
Merged via the queue into cilium:main with commit e7486ec Jun 23, 2025
68 checks passed
@viktor-kurchenko viktor-kurchenko added backport-pending/1.17 The backport for Cilium 1.17.x for this PR is in progress. and removed needs-backport/1.17 This PR / issue needs backporting to the v1.17 branch labels Jun 25, 2025
@github-actions github-actions bot added backport-done/1.17 The backport for Cilium 1.17.x for this PR is done. and removed backport-pending/1.17 The backport for Cilium 1.17.x for this PR is in progress. labels Jun 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/daemon Impacts operation of the Cilium daemon. backport-done/1.17 The backport for Cilium 1.17.x for this PR is done. feature/ipv6 Relates to IPv6 protocol support kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

After upgrading to v1.17.1 from v1.16, Failed to start with No IPv6 support on node as ipv6 address is nil
5 participants