Skip to content

Conversation

nathanjsweet
Copy link
Member

@nathanjsweet nathanjsweet commented Dec 8, 2022

Previously if either the "0.0.0.0/0" CIDR (hereafter "world-ipv4")
or a world-ipv6 (i.e. "::/0") CIDR was present the selector creation logic
for CIDRs would also add the "reserved:world" entity to the
selector labels generated by the CIDR for the sake of the policy engine.

This does not work when Cilium is dual-stack mode as the "reserved:world"
entity covers both IPv4 and IPv6, so both CIDRs should be present before such
an addition to the policy engine logic is made. Moreover, dual-stack mode lacks
any mechanism to distinguish world-ipv4 and world-ipv6 from one another.
This bug has prevented the possibility of managing world-ipv4 and world-ipv6
separately from one another in network policies.

Separate identities (and labels, etc) for world-ipv4 and world-ipv6 now exist
so they can be distinguished.

Signed-off-by: Nate Sweet nathanjsweet@pm.me

Fixes: #22610

bug: In dual-stack mode (both IPv4 and IPv6 are enabled), Cilium incorrectly converted CIDRs that covered all possible addresses for an IP Family (e.g. 0.0.0.0/0) to the "reserved:world" entity. Both IP families must be completely covered for "reserved:world" to apply. This resulted in dual-stack mode network policies that could not distinguish between world IPv4 and IPv6 traffic, treating them as one entity instead.

@nathanjsweet nathanjsweet added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Dec 8, 2022
@nathanjsweet nathanjsweet requested a review from a team as a code owner December 8, 2022 18:27
@nathanjsweet nathanjsweet requested a review from aanm December 8, 2022 18:27
Copy link
Member

@aanm aanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @nathanjsweet just a heads-up that since we have branched v1.13, this PR will not be part of the 1.13 unless it is set with the needs-backport/1.13 label.

@nathanjsweet
Copy link
Member Author

/test

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! LGTM.

@joestringer
Copy link
Member

Smoke tests are failing with agents reporting Error: unknown command "build-config" for "cilium", probably this just means the PR is not based against a recent master version?

Other failures are non-required builds ConformanceGKE and Cilium Datapath. @nathanjsweet can you rebase this PR so we can see that the smoke tests can pass?

@joestringer joestringer force-pushed the pr/nathanjsweet/fix-world-identity-bug branch from 18fa7b3 to eca53f7 Compare December 14, 2022 02:18
@joestringer
Copy link
Member

Ooh, I found a magic "Update with rebase" button on GitHub, so I clicked it. Let's see.

@joestringer
Copy link
Member

/test

@joestringer
Copy link
Member

@nathanjsweet it looks like the Cilium Datapath tests are failing consistently on a CIDR deny policy issue. Do you think that this may be related to the PR content? I don't see any similar failures in other recent failures for the workflow.

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Dec 14, 2022
@pchaigno pchaigno removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Dec 14, 2022
@pchaigno
Copy link
Member

pchaigno commented Dec 14, 2022

@nathanjsweet it looks like the Cilium Datapath tests are failing consistently on a CIDR deny policy issue. Do you think that this may be related to the PR content? I don't see any similar failures in other recent failures for the workflow.

I checked one of the failures and it does include a policy with the 0.0.0.0/0 CIDR. Since all tests are failing with the exact same test cases, I assume they all have the same policy. Seems fairly likely to be related to the changes here.

@pchaigno pchaigno added the dont-merge/bad-bot To prevent MLH from marking ready-to-merge. label Dec 14, 2022
@nathanjsweet nathanjsweet force-pushed the pr/nathanjsweet/fix-world-identity-bug branch from eca53f7 to c600cc9 Compare December 14, 2022 21:15
@joestringer
Copy link
Member

@nathanjsweet were you able to reproduce / track down the datapath failure?

@nathanjsweet
Copy link
Member Author

@joestringer no, not yet. I was waiting to see if this fixed it.

@nathanjsweet
Copy link
Member Author

nathanjsweet commented Dec 14, 2022

/test

Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed:

Click to show.

Test Name

K8sDatapathCustomCalls Basic test with byte-counter Loads byte-counter and gets consistent values, with per-endpoint routes

Failure Output

FAIL: Byte count (980) differs from expected value (490)

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.26-kernel-net-next so I can create one.

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking on above conversation with Joe and CI failure. I suspect the two are related 🙂

@nathanjsweet
Copy link
Member Author

/test

@github-actions
Copy link

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Feb 12, 2023
@nathanjsweet
Copy link
Member Author

I meant to update this PR, but never did. This PR is also blocked on #22966 as it significantly alters the policy engine code. Making this change should happen after that PR is merged.

@nathanjsweet
Copy link
Member Author

/test

Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from proxy side

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest changes for encap look good, and there's one decap case handled, then I think there's a couple more remaining decap cases in bpf/bpf_overlay.c:handle_ipv4() and bpf/bpf_overlay.c:handle_ipv6() , both for the high scale and regular tunnel decap cases without encryption.

These suggest to me that we don't have policy coverage for an IPv4-only 0.0.0.0/0 allow for pods where the traffic is received from outside the cluster through a nodeport on a different node. That sort of test should be able to highlight a policy deny issue with this latest implementation.

@nathanjsweet nathanjsweet force-pushed the pr/nathanjsweet/fix-world-identity-bug branch from 12bb6d6 to c5928d5 Compare August 17, 2023 16:29
@nathanjsweet nathanjsweet force-pushed the pr/nathanjsweet/fix-world-identity-bug branch from c5928d5 to 9aaca8b Compare August 17, 2023 16:32
@nathanjsweet
Copy link
Member Author

/test

Previously if either the "0.0.0.0/0" CIDR (hereafter "world-ipv4")
or a world-ipv6 CIDR was present the selector creation logic
for CIDRs would also add the "reserved:world" entity to the
selector labels generated by the CIDR.

This does not work when Cilium is dual-stack mode as "reserved:world"
covers both IPv4 and IPv6, so both CIDRs should be present before such
an addition is made. Moreover, dual-stack mode lacked any mechanism to
distinguish world-ipv4 and world-ipv6 from another. This bug has prevented
the possibility of managing world-ipv4 and world-ipv6 separately from one
another in the policy engine.

This inccorect assumption goes all the way to the datapath, where SECLABEL
and WORLD_ID were part of drop notification, policy logic, and tunnel logic.
SECLABLE and WORLD_ID were both set by userspace compilation logic.

Separate identities (and labels, etc) for world-ipv4 and world-ipv6 now exist
so they can be distinguished in userspace and in the datapath.

Some utility methods have been added in userspace and datapath to check
for the necessity of using world-ipv4 and world-ipv6 identities.

Explicit tests for dual-stack mode have been added.

The generic WORLD_ID is still used as the VNI tag for all tunnel logic
in the datapath. When tunnel code is encapsulating any WORLD_IPX_ID
identity it will convert the VNI to WORLD_ID. When tunnel code is
decapsulating any WORLD_ID VNI it will convert the VNI to either
WORLD_IPV4_ID or WORLD_IPV6_ID based on the underlying packet
protocol.

Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
@nathanjsweet nathanjsweet force-pushed the pr/nathanjsweet/fix-world-identity-bug branch from 9aaca8b to 21274a7 Compare August 17, 2023 17:30
@nathanjsweet
Copy link
Member Author

/test

@nathanjsweet nathanjsweet added dont-merge/waiting-for-user-feedback Waiting for feedback from user before the PR should be merged. and removed dont-merge/waiting-for-user-feedback Waiting for feedback from user before the PR should be merged. labels Aug 17, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Aug 17, 2023
@nathanjsweet nathanjsweet merged commit a94fa56 into main Aug 18, 2023
@nathanjsweet nathanjsweet deleted the pr/nathanjsweet/fix-world-identity-bug branch August 18, 2023 01:48
squeed added a commit to squeed/cilium that referenced this pull request Dec 18, 2023
Previously, we had a single label, `reserved:world`, applied to all CIDR
(non-cluster) identities. With cilium#22625, this is no longer the case; CIDR
identities get either `reserved:world-ipv4` or `reserved:world-ipv6`
labels **in dual stack clusters**.

This PR updates the `toEntities: world` selector to select *all* world
entities, as it worked previously. It does so by expanding the set of
selectors underlying the `world` entity. No other magic in the
SelectorCache or policy engine is required.

Fixes: cilium#29666
Fixes: a94fa56

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
gandro pushed a commit that referenced this pull request Dec 18, 2023
Previously, we had a single label, `reserved:world`, applied to all CIDR
(non-cluster) identities. With #22625, this is no longer the case; CIDR
identities get either `reserved:world-ipv4` or `reserved:world-ipv6`
labels **in dual stack clusters**.

This PR updates the `toEntities: world` selector to select *all* world
entities, as it worked previously. It does so by expanding the set of
selectors underlying the `world` entity. No other magic in the
SelectorCache or policy engine is required.

Fixes: #29666
Fixes: a94fa56

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
squeed added a commit to squeed/cilium that referenced this pull request Dec 18, 2023
[ upstream commit 1288473 ]

Previously, we had a single label, `reserved:world`, applied to all CIDR
(non-cluster) identities. With cilium#22625, this is no longer the case; CIDR
identities get either `reserved:world-ipv4` or `reserved:world-ipv6`
labels **in dual stack clusters**.

This PR updates the `toEntities: world` selector to select *all* world
entities, as it worked previously. It does so by expanding the set of
selectors underlying the `world` entity. No other magic in the
SelectorCache or policy engine is required.

Fixes: cilium#29666
Fixes: a94fa56

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
julianwiedmann pushed a commit that referenced this pull request Dec 18, 2023
[ upstream commit 1288473 ]

Previously, we had a single label, `reserved:world`, applied to all CIDR
(non-cluster) identities. With #22625, this is no longer the case; CIDR
identities get either `reserved:world-ipv4` or `reserved:world-ipv6`
labels **in dual stack clusters**.

This PR updates the `toEntities: world` selector to select *all* world
entities, as it worked previously. It does so by expanding the set of
selectors underlying the `world` entity. No other magic in the
SelectorCache or policy engine is required.

Fixes: #29666
Fixes: a94fa56

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Comment on lines -348 to +352
if (*identity != WORLD_ID)
if (identity_is_world_ipv4(*identity))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it reversed the logic, breaking VTEP support, see #31023.

sayboras pushed a commit that referenced this pull request Jun 10, 2024
[ upstream commit 1288473 ]

Previously, we had a single label, `reserved:world`, applied to all CIDR
(non-cluster) identities. With #22625, this is no longer the case; CIDR
identities get either `reserved:world-ipv4` or `reserved:world-ipv6`
labels **in dual stack clusters**.

This PR updates the `toEntities: world` selector to select *all* world
entities, as it worked previously. It does so by expanding the set of
selectors underlying the `world` entity. No other magic in the
SelectorCache or policy engine is required.

Fixes: #29666
Fixes: a94fa56

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. upgrade-impact This PR has potential upgrade or downgrade impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NetworkPolicy and CiliumNetworkPolicy allow IPv6 egress if 0.0.0.0/0 is allowed