Skip to content

Conversation

vipul-21
Copy link
Contributor

@vipul-21 vipul-21 commented Nov 2, 2023

Please ensure your pull request adheres to the following guidelines:

  • For first time contributors, read Submitting a pull request
  • All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • If your commit description contains a Fixes: <commit-id> tag, then
    please add the commit author[s] as reviewer[s] to this issue.
  • All commits are signed off. See the section Developer’s Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.
  • Are you a user of Cilium? Please add yourself to the Users doc
  • Thanks for contributing!

Cilium pods on a single stack cluster were failing when ipv6 was enabled. The expectation is that ipv4 should be working even if there are error in ipv6.

Fixes: #28909

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Nov 2, 2023
@vipul-21 vipul-21 changed the title fix: removing the fatal error for ipv6 cilium config fix: removing the fatal error for ipv6 cilium config on a single stack cluster Nov 2, 2023
@vipul-21 vipul-21 force-pushed the dualstackmaster branch 3 times, most recently from f30f597 to 718c26f Compare November 3, 2023 17:04
@vipul-21 vipul-21 marked this pull request as ready for review November 3, 2023 17:22
@vipul-21 vipul-21 requested review from a team as code owners November 3, 2023 17:22
@vipul-21 vipul-21 requested review from marseel and ti-mo November 3, 2023 17:22
@ti-mo ti-mo requested review from a team and jschwinger233 and removed request for a team November 6, 2023 13:28
@ti-mo ti-mo added area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/misc This PR makes changes that have no direct user impact. area/agent Cilium agent related. labels Nov 6, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Nov 6, 2023
@ti-mo
Copy link
Contributor

ti-mo commented Nov 6, 2023

Code itself LGTM, but not sure if continuing silently is a good idea? There's a bunch of downstream logic that runs if option.Config.EnableIPv6 is true, so v6 support should probably be probed early on to set this flag to false if there's no v6 support on the host.

Also cc @cilium/sig-foundations since related to option.Config.

@ti-mo ti-mo requested a review from a team November 6, 2023 13:33
@vipul-21
Copy link
Contributor Author

vipul-21 commented Nov 6, 2023

@ti-mo So if we want to enable/disable ipv6 earlier would that be under NodeDiscovery or we should change the function IPv6Enabled() in daemonConfig here: https://github.com/cilium/cilium/blob/main/pkg/option/config.go#L2724 to check for host can support ipv6 or not.

Copy link
Member

@jschwinger233 jschwinger233 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about "The expectation is that ipv4 should be working even if there are errors in getting ipv6 ips". Do you mean we expect cilium to work without IPv6 even if IPv6 is enabled by user when DirectRoutingDevice for v6 can't be found?

@vipul-21
Copy link
Contributor Author

vipul-21 commented Nov 7, 2023

@jschwinger233 What I meant is that even if we don't have a node that supports ipv6 and the user has enabled ipv6 on that node, the cilium pods should not go into crashloopback.

@vipul-21
Copy link
Contributor Author

vipul-21 commented Nov 7, 2023

Seeing the code, I see the usage of both option.Config.EnableIPv6 and option.Config.IPv6ENabled() , if we change the function option.Config.IPv6ENabled() to return true or false based on option.Config.EnableIPv6 and host/node supporting ipv6:

  1. this would require us to change the usage of option.Config.EnableIPv6 to option.Config.IPv6ENabled() everywhere ?
  2. How would we let the user know that even though the ipv6 is enabled, the cilium is not acknowledging that due to some issue as we are changing the option.Config.EnableIPv6 based on the flag and host properties ? Warning logs is one option.

In the current implementation, the node ip is fetched for each device or interface, and when we don't find the node ipv6 we continue.

  • Should we add a similar check earlier(if so where ?) that changes the EnableIPv6 to false or true based on the check ?
  • If the node does not support ipv6 I don't think we should not change the config value of this EnableIpv6 as other nodes might support v6 @ti-mo

@vipul-21
Copy link
Contributor Author

Moved the check in the init file and disabled ipv6 with a warning message that node does not have ipv6 support.

@vipul-21 vipul-21 changed the title fix: removing the fatal error for ipv6 cilium config on a single stack cluster fix: Adding the fatal error for ipv6 cilium config on a single stack node Nov 28, 2023
@vipul-21
Copy link
Contributor Author

Based on the conversation in the issue: #28909 (comment) the check is added to make sure that the uniform behavior is provided irrespective of the KPR value.

Copy link
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!
I think the approach is coherent with #28909 (comment). I left a couple of suggestions inline.

@vipul-21 vipul-21 force-pushed the dualstackmaster branch 2 times, most recently from 3f4dae2 to 380ba0c Compare January 24, 2024 17:19
@vipul-21 vipul-21 requested a review from pippolo84 January 24, 2024 17:20
Copy link
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! 🚀

@pippolo84
Copy link
Member

/test

Cilium pods on a single stack cluster were failing when ipv6 was enabled.
The change would make sure that if the underlying node does not have ipv6
configuration enabled it would fail even if KPR is disabled.Earlier, if KPR
is enabled, the node would not see a fatal error.

Signed-off-by: Vipul Singh <singhvipul@microsoft.com>
@vipul-21
Copy link
Contributor Author

@pippolo84 can we merge this ?

@pippolo84
Copy link
Member

/test

@pippolo84
Copy link
Member

E2E flake tracked here, rerunning.

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Feb 1, 2024
@julianwiedmann julianwiedmann added the feature/ipv6 Relates to IPv6 protocol support label Feb 1, 2024
@julianwiedmann julianwiedmann added this pull request to the merge queue Feb 1, 2024
Merged via the queue into cilium:main with commit 2fc52eb Feb 1, 2024
@hexchen
Copy link

hexchen commented Sep 13, 2024

This PR introduced a regression in our environment. #34861 for a detailed description of the problem.

pippolo84 added a commit to pippolo84/cilium that referenced this pull request Jun 20, 2025
This reverts commit 2fc52eb.

The commit was added with the intent to have an uniform behavior when
Cilium is started on a single stack cluster with ipv6 support enabled,
despite the KPR setting being enabled or disabled. Specifically, the
added check was stopping the agent startup in case of a missing IPv6
address on the node, even when KPR was disabled.

However, the IPv6 detection for NodePort, responsible for the startup
error in case of missing IPv6 address with KPR enabled, has been removed
in cilium#29033. Therefore, there is no
reason anymore to stop the agent at startup when the IPv6 support is
enabled and the node is missing an IPv6 address.  Moreover, the commit
above led to some unintended consequences in environments where KPR is
disabled and the IPv6 address is set on the node after the Cilium agent
is started (see cilium#34861).

The same unintended consequence became more frequent after
cilium#32381, where we moved later in the
daemon startup process the first call to SetDefaultPrefix. That
function, among other things, was in charge of setting up a temporary
IPv6 Unique Local Address in case the node does not have one yet. Since
now the agent does this later (specifically, while configuring IPAM),
the check introduced in 2fc52eb sees the IPv6 node address as nil and
stops the agent.

Therefore, revert the check to avoid the unintended regressions.

Related: cilium#28953
Related: cilium#34861
Related: cilium#29033
Related: cilium#32381
Fixes: cilium#37817

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
pippolo84 added a commit to pippolo84/cilium that referenced this pull request Jun 23, 2025
This reverts commit 2fc52eb.

The commit was added with the intent to have an uniform behavior when
Cilium is started on a single stack cluster with ipv6 support enabled,
despite the KPR setting being enabled or disabled. Specifically, the
added check was stopping the agent startup in case of a missing IPv6
address on the node, even when KPR was disabled.

However, the IPv6 detection for NodePort, responsible for the startup
error in case of missing IPv6 address with KPR enabled, has been removed
in cilium#29033. Therefore, there is no
reason anymore to stop the agent at startup when the IPv6 support is
enabled and the node is missing an IPv6 address.  Moreover, the commit
above led to some unintended consequences in environments where KPR is
disabled and the IPv6 address is set on the node after the Cilium agent
is started (see cilium#34861).

The same unintended consequence became more frequent after
cilium#32381, where we moved later in the
daemon startup process the first call to SetDefaultPrefix. That
function, among other things, was in charge of setting up a temporary
IPv6 Unique Local Address in case the node does not have one yet. Since
now the agent does this later (specifically, while configuring IPAM),
the check introduced in 2fc52eb sees the IPv6 node address as nil and
stops the agent.

Therefore, revert the check to avoid the unintended regressions.

Related: cilium#28953
Related: cilium#34861
Related: cilium#29033
Related: cilium#32381
Fixes: cilium#37817

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request Jun 23, 2025
This reverts commit 2fc52eb.

The commit was added with the intent to have an uniform behavior when
Cilium is started on a single stack cluster with ipv6 support enabled,
despite the KPR setting being enabled or disabled. Specifically, the
added check was stopping the agent startup in case of a missing IPv6
address on the node, even when KPR was disabled.

However, the IPv6 detection for NodePort, responsible for the startup
error in case of missing IPv6 address with KPR enabled, has been removed
in #29033. Therefore, there is no
reason anymore to stop the agent at startup when the IPv6 support is
enabled and the node is missing an IPv6 address.  Moreover, the commit
above led to some unintended consequences in environments where KPR is
disabled and the IPv6 address is set on the node after the Cilium agent
is started (see #34861).

The same unintended consequence became more frequent after
#32381, where we moved later in the
daemon startup process the first call to SetDefaultPrefix. That
function, among other things, was in charge of setting up a temporary
IPv6 Unique Local Address in case the node does not have one yet. Since
now the agent does this later (specifically, while configuring IPAM),
the check introduced in 2fc52eb sees the IPv6 node address as nil and
stops the agent.

Therefore, revert the check to avoid the unintended regressions.

Related: #28953
Related: #34861
Related: #29033
Related: #32381
Fixes: #37817

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request Jun 23, 2025
This reverts commit 2fc52eb.

The commit was added with the intent to have an uniform behavior when
Cilium is started on a single stack cluster with ipv6 support enabled,
despite the KPR setting being enabled or disabled. Specifically, the
added check was stopping the agent startup in case of a missing IPv6
address on the node, even when KPR was disabled.

However, the IPv6 detection for NodePort, responsible for the startup
error in case of missing IPv6 address with KPR enabled, has been removed
in #29033. Therefore, there is no
reason anymore to stop the agent at startup when the IPv6 support is
enabled and the node is missing an IPv6 address.  Moreover, the commit
above led to some unintended consequences in environments where KPR is
disabled and the IPv6 address is set on the node after the Cilium agent
is started (see #34861).

The same unintended consequence became more frequent after
#32381, where we moved later in the
daemon startup process the first call to SetDefaultPrefix. That
function, among other things, was in charge of setting up a temporary
IPv6 Unique Local Address in case the node does not have one yet. Since
now the agent does this later (specifically, while configuring IPAM),
the check introduced in 2fc52eb sees the IPv6 node address as nil and
stops the agent.

Therefore, revert the check to avoid the unintended regressions.

Related: #28953
Related: #34861
Related: #29033
Related: #32381
Fixes: #37817

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
viktor-kurchenko pushed a commit that referenced this pull request Jun 25, 2025
[ upstream commit dfd8f93 ]

This reverts commit 2fc52eb.

The commit was added with the intent to have an uniform behavior when
Cilium is started on a single stack cluster with ipv6 support enabled,
despite the KPR setting being enabled or disabled. Specifically, the
added check was stopping the agent startup in case of a missing IPv6
address on the node, even when KPR was disabled.

However, the IPv6 detection for NodePort, responsible for the startup
error in case of missing IPv6 address with KPR enabled, has been removed
in #29033. Therefore, there is no
reason anymore to stop the agent at startup when the IPv6 support is
enabled and the node is missing an IPv6 address.  Moreover, the commit
above led to some unintended consequences in environments where KPR is
disabled and the IPv6 address is set on the node after the Cilium agent
is started (see #34861).

The same unintended consequence became more frequent after
#32381, where we moved later in the
daemon startup process the first call to SetDefaultPrefix. That
function, among other things, was in charge of setting up a temporary
IPv6 Unique Local Address in case the node does not have one yet. Since
now the agent does this later (specifically, while configuring IPAM),
the check introduced in 2fc52eb sees the IPv6 node address as nil and
stops the agent.

Therefore, revert the check to avoid the unintended regressions.

Related: #28953
Related: #34861
Related: #29033
Related: #32381
Fixes: #37817

[ Backporter's notes:
  - resolved conflict in: daemon/k8s/init.go
]

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
wanglei4687 pushed a commit to wanglei4687/cilium that referenced this pull request Jun 26, 2025
This reverts commit 2fc52eb.

The commit was added with the intent to have an uniform behavior when
Cilium is started on a single stack cluster with ipv6 support enabled,
despite the KPR setting being enabled or disabled. Specifically, the
added check was stopping the agent startup in case of a missing IPv6
address on the node, even when KPR was disabled.

However, the IPv6 detection for NodePort, responsible for the startup
error in case of missing IPv6 address with KPR enabled, has been removed
in cilium#29033. Therefore, there is no
reason anymore to stop the agent at startup when the IPv6 support is
enabled and the node is missing an IPv6 address.  Moreover, the commit
above led to some unintended consequences in environments where KPR is
disabled and the IPv6 address is set on the node after the Cilium agent
is started (see cilium#34861).

The same unintended consequence became more frequent after
cilium#32381, where we moved later in the
daemon startup process the first call to SetDefaultPrefix. That
function, among other things, was in charge of setting up a temporary
IPv6 Unique Local Address in case the node does not have one yet. Since
now the agent does this later (specifically, while configuring IPAM),
the check introduced in 2fc52eb sees the IPv6 node address as nil and
stops the agent.

Therefore, revert the check to avoid the unintended regressions.

Related: cilium#28953
Related: cilium#34861
Related: cilium#29033
Related: cilium#32381
Fixes: cilium#37817

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
viktor-kurchenko pushed a commit that referenced this pull request Jun 26, 2025
[ upstream commit dfd8f93 ]

This reverts commit 2fc52eb.

The commit was added with the intent to have an uniform behavior when
Cilium is started on a single stack cluster with ipv6 support enabled,
despite the KPR setting being enabled or disabled. Specifically, the
added check was stopping the agent startup in case of a missing IPv6
address on the node, even when KPR was disabled.

However, the IPv6 detection for NodePort, responsible for the startup
error in case of missing IPv6 address with KPR enabled, has been removed
in #29033. Therefore, there is no
reason anymore to stop the agent at startup when the IPv6 support is
enabled and the node is missing an IPv6 address.  Moreover, the commit
above led to some unintended consequences in environments where KPR is
disabled and the IPv6 address is set on the node after the Cilium agent
is started (see #34861).

The same unintended consequence became more frequent after
#32381, where we moved later in the
daemon startup process the first call to SetDefaultPrefix. That
function, among other things, was in charge of setting up a temporary
IPv6 Unique Local Address in case the node does not have one yet. Since
now the agent does this later (specifically, while configuring IPAM),
the check introduced in 2fc52eb sees the IPv6 node address as nil and
stops the agent.

Therefore, revert the check to avoid the unintended regressions.

Related: #28953
Related: #34861
Related: #29033
Related: #32381
Fixes: #37817

[ Backporter's notes:
  - resolved conflict in: daemon/k8s/init.go
]

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request Jun 26, 2025
[ upstream commit dfd8f93 ]

This reverts commit 2fc52eb.

The commit was added with the intent to have an uniform behavior when
Cilium is started on a single stack cluster with ipv6 support enabled,
despite the KPR setting being enabled or disabled. Specifically, the
added check was stopping the agent startup in case of a missing IPv6
address on the node, even when KPR was disabled.

However, the IPv6 detection for NodePort, responsible for the startup
error in case of missing IPv6 address with KPR enabled, has been removed
in #29033. Therefore, there is no
reason anymore to stop the agent at startup when the IPv6 support is
enabled and the node is missing an IPv6 address.  Moreover, the commit
above led to some unintended consequences in environments where KPR is
disabled and the IPv6 address is set on the node after the Cilium agent
is started (see #34861).

The same unintended consequence became more frequent after
#32381, where we moved later in the
daemon startup process the first call to SetDefaultPrefix. That
function, among other things, was in charge of setting up a temporary
IPv6 Unique Local Address in case the node does not have one yet. Since
now the agent does this later (specifically, while configuring IPAM),
the check introduced in 2fc52eb sees the IPv6 node address as nil and
stops the agent.

Therefore, revert the check to avoid the unintended regressions.

Related: #28953
Related: #34861
Related: #29033
Related: #32381
Fixes: #37817

[ Backporter's notes:
  - resolved conflict in: daemon/k8s/init.go
]

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/agent Cilium agent related. area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. feature/ipv6 Relates to IPv6 protocol support ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dualstack cilium config fatal error on single stack cluster
7 participants