Skip to content

Conversation

jrajahalme
Copy link
Member

@jrajahalme jrajahalme commented Jul 10, 2024

Deny entries covered by another deny CIDR entry must be kept as they have different numerical identities in the datapath. Otherwise a later added allow-all entry could take effect if there is no specific match for the left-off identity. Simplest example of this is with L3-only deny rules and an Allow-all rule that wildcards also the L3. So for the sake if this example we only need to consider L3-identities of the CIDRs (10/8, and 10.1.1.1/32, first covering the latter), say X and Y (both nonzero) and an allow rule with ID 0 (ANY). This policy should allow all traffic that is not with an IP covered by 10/8. Since 10.1.1.1/32 is covered by 10/8 it would be tempting to not insert the rule for the ID assigned to CIDR 10.1.1.1/32 to the datapath policy map. The identities for both CIDRs are, however, present in the ipcache, so that traffic to (or from) 10.1.1.1 would be associated with ID Y, which would match the wildcard ID (0) allow rule, if a deny rule with the ID Y is not present in the datapath. Hence, we must both deny rules in to the policy map.

The above logic does not apply when the two rules in question have the same ID (but different coverage on port and protocol), or when one of them is a wildcard L3 rule (with ID 0). In that case we can leave the more deny rule covered by another off the policy map, but only if it was not needed to cover for a L4-only allow rule that still needs to take effect on all L3 identities not explicitly denied. In that case we add new L3/L4 deny entries to carve holes into the broader allow rule, and we must keep the deny rules we added for those holes even when they would otherwise be covered by a broader deny rule.

Note that a presence of a wider allow rule is not required at the time of policy map entry insertion for the above consideration to take effect. In this line of thought we could simply leave off all the deny entries if no allow entries exist, since the datapath is deny by default. This breaks down as soon as an allow rule is inserted to a now empty policy map (maybe incrementally due to an FQDN policy rule), as all the (non-inserted) deny rules would now be disregarded.

This leaves a possibility for a late-stage policy map optimization, however: If, when syncing the userspace policy map to the bpf datapath policy map, we notice that some deny entries are not covered by any wider allow entries, it would be safe to leave such entries off from the datapath policy map. We'd need to be ready to insert them before any new related allow entries are added, though.

Unit tests are adjusted, not only to include the previously missing entries, but also add "allow-all" rules for illustrative purposes so that it is easier to reason about the necessity of the added deny entries.

Fixes: #32675

Note: This is v1.16 version of #33953

@jrajahalme jrajahalme added kind/bug This is a bug in the Cilium logic. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. release-note/misc This PR makes changes that have no direct user impact. needs-backport/1.16 This PR / issue needs backporting to the v1.16 branch labels Jul 10, 2024
@jrajahalme jrajahalme requested a review from a team as a code owner July 10, 2024 21:36
@jrajahalme jrajahalme requested a review from derailed July 10, 2024 21:36
@jrajahalme jrajahalme marked this pull request as draft July 10, 2024 21:36
@jrajahalme
Copy link
Member Author

/test

@jrajahalme jrajahalme force-pushed the policy-deny-bug-fix branch from 9f32cb6 to 21786a9 Compare July 15, 2024 17:29
@jrajahalme
Copy link
Member Author

/test

@jrajahalme jrajahalme marked this pull request as ready for review July 15, 2024 19:54
@jrajahalme jrajahalme added the release-blocker/1.16 This issue will prevent the release of the next version of Cilium. label Jul 15, 2024
@jrajahalme
Copy link
Member Author

#33313 is merging soon, so I'll have to reintroduce the main branch fix due to the code being quite different after the merge. I'll make this PR a 1.16 specific backport instead.

@joestringer joestringer added this to the 1.16 milestone Jul 16, 2024
Copy link
Contributor

@derailed derailed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrajahalme Nice work!

@joestringer joestringer changed the base branch from main to v1.16 July 19, 2024 22:17
@joestringer joestringer requested review from a team as code owners July 19, 2024 22:17
@joestringer joestringer changed the base branch from v1.16 to main July 19, 2024 22:17
@joestringer joestringer removed request for a team July 19, 2024 22:17
@nathanjsweet
Copy link
Member

nathanjsweet commented Jul 22, 2024

@jrajahalme This is a 1.16 backport, correct? Is main unaffected because of your other PR?

Edit:
I guess it is not a backport so much as a 1.16-specific bug. The policy engine being so heavily refactored will make backports difficult going forward.

Increase test coverage by reversing insertion order of mapstate entries
in some tests.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
@jrajahalme jrajahalme force-pushed the policy-deny-bug-fix branch from f31e7bf to a70a794 Compare July 22, 2024 18:05
@jrajahalme
Copy link
Member Author

@jrajahalme This is a 1.16 backport, correct? Is main unaffected because of your other PR?

Edit: I guess it is not a backport so much as a 1.16-specific bug. The policy engine being so heavily refactored will make backports difficult going forward.

I found this bug while working on main, and had rebase this on top of 1.16 once the other PR merged. Will post another PR to fix this in main. And yes, the fixes are structurally very different now, but the unit test changes should be analogous!

@jrajahalme
Copy link
Member Author

Added more test coverage by also testing in reverse insertion order (e.g., allow before deny) and noticed the block addressing this case was missing from the deny rule insertion side:

Also, when adding an allow entry that is a subset (in L3-wildcard or CIDR
sense) of an exising deny entry and the new allow entry has a less
specific port-protocol than the deny entry (e.g., wildcard L4 allow and
TCP deny) then an additional deny entry with the identity of the new
allow entry must be added. This ensures that the deny rule takes effect
also for the subset identity at the port/proto of the deny rule.

@jrajahalme
Copy link
Member Author

/test

@jrajahalme jrajahalme added backport/1.16 This PR represents a backport for Cilium 1.16.x of a PR that was merged to main. and removed needs-backport/1.16 This PR / issue needs backporting to the v1.16 branch labels Jul 22, 2024
Change denyPreferredInsertWithChanges to check to bail out without
checking for changes first. This is based on the premise that if the
entry is not added ("bailed out") then it can have no effect on the
mapstate.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Deny entries covered by another deny CIDR entry must be kept as they have
different numerical identities in the datapath. Otherwise a later added
allow-all entry could take effect if there is no specific match for the
left-off identity. Simplest example of this is with L3-only deny rules
and an Allow-all rule that wildcards also the L3. So for the sake if this
example we only need to consider L3-identities of the CIDRs (10/8, and
10.1.1.1/32, first covering the latter), say X and Y (both nonzero) and
an allow rule with ID 0 (ANY). This policy should allow all traffic that
is not with an IP covered by 10/8. Since 10.1.1.1/32 is covered by 10/8
it would be tempting to not insert the rule for the ID assigned to CIDR
10.1.1.1/32 to the datapath policy map. The identities for both CIDRs
are, however, present in the ipcache, so that traffic to (or from)
10.1.1.1 would be associated with ID Y, which would match the wildcard ID
(0): allow rule, if a deny rule with the ID Y is not present in the
datapath. Hence, we must both deny rules in to the policy map.

The above logic does not apply when the two rules in question have the
same ID (but different coverage on port and protocol), or when one of
them is a wildcard L3 rule (with ID 0). In that case we can leave the
more specific deny rule covered by another off the policy map, but only
if it was not needed to cover for a L4-only allow rule that still needs
to take effect on all L3 identities not explicitly denied. In that case
we add new L3/L4 deny entries to carve holes into the broader allow rule,
and we must keep the deny rules we added for those holes even when they
would otherwise be covered by a broader deny rule.

Note that a presence of a wider allow rule is not required at the time of
policy map entry insertion for the above consideration to take effect. In
this line of thought we could simply leave off all the deny entries if no
allow entries exist, since the datapath is deny by default. This breaks
down as soon as an allow rule is inserted to a now empty policy map
(maybe incrementally due to an FQDN policy rule):, as all the
(non-inserted): deny rules would now be disregarded.

This leaves a possibility for a late-stage policy map optimization,
however: If, when syncing the userspace policy map to the bpf datapath
policy map, we notice that some deny entries are not covered by any wider
allow entries, it would be safe to leave such entries off from the
datapath policy map. We'd need to be ready to insert them before any new
related allow entries are added, though.

Also, when adding an allow entry that is a subset (in L3-wildcard or CIDR
sense) of an exising deny entry and the new allow entry has a less
specific port-protocol than the deny entry (e.g., wildcard L4 allow and
TCP deny) then an additional deny entry with the identity of the new
allow entry must be added. This ensures that the deny rule takes effect
also for the subset identity at the port/proto of the deny rule.

Unit tests are adjusted, not only to include the previously missing
entries, but also add "allow-all" rules for illustrative purposes so that
it is easier to reason about the necessity of the added deny entries.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
@jrajahalme jrajahalme force-pushed the policy-deny-bug-fix branch from a70a794 to e9e79ac Compare July 22, 2024 19:35
@jrajahalme
Copy link
Member Author

/test

@jrajahalme
Copy link
Member Author

/test-backport-1.16

@jrajahalme
Copy link
Member Author

main branch fix is here: #33953

@aanm aanm enabled auto-merge July 23, 2024 08:24
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 23, 2024
@aanm aanm added this pull request to the merge queue Jul 23, 2024
Merged via the queue into cilium:v1.16 with commit 317a8af Jul 23, 2024
64 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.16 This PR represents a backport for Cilium 1.16.x of a PR that was merged to main. kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-blocker/1.16 This issue will prevent the release of the next version of Cilium. release-note/misc This PR makes changes that have no direct user impact. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

7 participants