-
Notifications
You must be signed in to change notification settings - Fork 3.4k
policy: Keep deny entries when covered by another CIDR deny #33953
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Increase test coverage by reversing insertion order of mapstate entries in some tests. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Refactor denyPreferredInsertWithChanges to check for cases where we can avoid inserting any changes first. This is marginally faster, and makes the next commit diff easier to read. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Deny entries covered by another deny CIDR entry must be kept as they have different numerical identities in the datapath. Otherwise a later added allow-all entry could take effect if there is no specific match for the left-off identity. Simplest example of this is with L3-only deny rules and an Allow-all rule that wildcards also the L3. So for the sake if this example we only need to consider L3-identitites of the CIDRs (10/8, and 10.1.1.1/32, first covering the latter), say X and Y (both nonzero) and an allow rule with ID 0 (ANY). This policy should allow all traffic that is not with an IP covered by 10/8. Since 10.1.1.1/32 is covered by 10/8 it would be tempting to not insert the rule for the ID assigned to CIDR 10.1.1.1/32 to the datapath policy map. The identities for both CIDRs are, however, present in the ipcache, so that traffic to (or from) 10.1.1.1 would be associated with ID Y, which would match the wildcard ID (0) allow rule, if a deny rule with the ID Y is not present in the datapath. Hence, we must have both deny rules in to the policy map. The above logic does not apply when the two rules in question have the same ID (but different coverage on port and protocol), or when one of them is a wildcard L3 rule (with ID 0). In that case we can leave the more specific deny rule covered by another off the policy map, but only if it was not needed to cover for a L4-only allow rule that still needs to take effect on all L3 identities not explicitly denied. In that case we add new L3/L4 deny entries to carve holes into the broader allow rule, and we must keep the deny rules we added for those holes even when they would otherwise be covered by a broader deny rule. Note that a presence of a wider allow rule is not required at the time of policy map entry insertion for the above consideration to take effect. In this line of thought we could simply leave off all the deny entries if no allow entries exist, since the datapath is deny dy default. This breaks down as soon as an allow rule is inserted to a now empty policy map (maybe incrementally due to an FQDN policy rule), as all the (non-inserted) deny rules would now be disregarded. This leaves a possibility for a late-stage policy map optimization, however: If, when syncing the userspace policy map to the bpf datapath policy map, we notice that some deny entries are not covered by any wider allow entries, it would be safe to leave such entries off from the datapath policy map. We'd need to be ready to insert them before any new related allow entries are added, though. Also, when adding an allow entry that is a subset (in L3-wildcard or CIDR sense) of an exising deny entry and the new allow entry has a less specific port-protocol than the deny entry (e.g., wildcard L4 allow and TCP deny) then an additional deny entry with the identity of the new allow entry must be added. This ensures that the deny rule takes effect also for the subset identity at the port/proto of the deny rule. Unit tests are adjusted, not only to include the previously missing entries, but also add "allow-all" rules for illustrative purposes so that it is easier to reason about the necessity of the added deny entries. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Make use of the allow-all rule in TestMapState_denyPreferredInsertWithSubnets explicit and test with and without it to increase test coverage. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
/test |
@nathanjsweet Note for review: Due to the way some of the tests are structured, I did not think to consider the fact that when a CIDR covers an another CIDR or a resolved FQDN IP in the policy, the identity allocated for the covered IP is also selected by the selector of the covering CIDR. For example: Policy rules with selectors:
Reserved and allocated numeric identities with labels for CIDRs in the policy:
Numeric identities matched by each selector BEFORE DNS resolution for
Allocated numeric identities with labels for FQDNs in the policy AFTER DNS resolution via Cilium DNS proxy:
Numeric identities matched AFTER DNS resolution by each selector:
This means that when adding the deny rule, keys with both identities Once we make selector cache transactional, i.e., identity |
With the policy as shown by [1] applied, traffic to 1.1.1.2/32 would be denied but only because of the default-deny fallback, and not have a policy correlation. In Hubble, this would show up as `policy-verdict:none` in the flow instead of "Policy denied by denylist". In the denyPreferredInsertWithChanges() logic for v1.15 and below (v1.16 and main unaffected), the "deny-deny" [2] logic removed what the policy engine considered a redundant deny entry that leads to the missing policy correlation described above. The fix is to minimally cherry-pick 317a8af ("policy: Keep deny entries when covered by another CIDR deny") [from the v1.16 tree]. This cherry-picked commit is built off of a2ff24f ("policy: Fix Deny Insertion Logic"), which was found by bisecting v1.15 and v1.16 to identify when this bug was "fixed" in the later versions of Cilium. In addition, this commit contains another minimal cherry-pick from 317a8af ("policy: Keep deny entries when covered by another CIDR deny") [from the v1.16 tree] for the "deny-allow" [3] and "allow-deny" [4] case. [3] fixes a transient bug as explained by #33953 (comment). For more context on the changes to the policy engine between versions, tracing the commit history from the commits mentioned above would be the best way. [1]: ``` apiVersion: cilium.io/v2 kind: CiliumNetworkPolicy metadata: name: allow spec: egress: - toCIDR: - 1.1.1.1/32 - 1.1.1.2/32 endpointSelector: {} --- apiVersion: cilium.io/v2 kind: CiliumNetworkPolicy metadata: name: deny spec: egressDeny: - toCIDRSet: - cidr: 1.0.0.0/8 except: - 1.1.1.1/32 endpointSelector: {} ``` [2]: "Deny-deny" refers to the policy engine code path inside denyPreferredInsertWithChanges() under ``` if newEntry.IsDeny { ... ms.ForEachDeny() ... } ``` [3]: Similar to [2], "deny-allow" refers to ``` if newEntry.IsDeny { ... ms.ForEachAllow() ... } ``` [4]: Similar to [3], "allow-deny" refers to (note the ! in the condition) ``` if !newEntry.IsDeny { ... ms.ForEachDeny() ... } ``` Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
With the policy as shown by [1] applied, traffic to 1.1.1.2/32 would be denied but only because of the default-deny fallback, and not have a policy correlation. In Hubble, this would show up as `policy-verdict:none` in the flow instead of "Policy denied by denylist". In the denyPreferredInsertWithChanges() logic for v1.15 and below (v1.16 and main unaffected), the "deny-deny" [2] logic removed what the policy engine considered a redundant deny entry that leads to the missing policy correlation described above. The fix is to minimally cherry-pick 317a8af ("policy: Keep deny entries when covered by another CIDR deny") [from the v1.16 tree]. This cherry-picked commit is built off of a2ff24f ("policy: Fix Deny Insertion Logic"), which was found by bisecting v1.15 and v1.16 to identify when this bug was "fixed" in the later versions of Cilium. In addition, this commit contains another minimal cherry-pick from 317a8af ("policy: Keep deny entries when covered by another CIDR deny") [from the v1.16 tree] for the "deny-allow" [3] and "allow-deny" [4] case. [3] fixes a transient bug as explained by #33953 (comment). For more context on the changes to the policy engine between versions, tracing the commit history from the commits mentioned above would be the best way. [1]: ``` apiVersion: cilium.io/v2 kind: CiliumNetworkPolicy metadata: name: allow spec: egress: - toCIDR: - 1.1.1.1/32 - 1.1.1.2/32 endpointSelector: {} --- apiVersion: cilium.io/v2 kind: CiliumNetworkPolicy metadata: name: deny spec: egressDeny: - toCIDRSet: - cidr: 1.0.0.0/8 except: - 1.1.1.1/32 endpointSelector: {} ``` [2]: "Deny-deny" refers to the policy engine code path inside denyPreferredInsertWithChanges() under ``` if newEntry.IsDeny { ... ms.ForEachDeny() ... } ``` [3]: Similar to [2], "deny-allow" refers to ``` if newEntry.IsDeny { ... ms.ForEachAllow() ... } ``` [4]: Similar to [3], "allow-deny" refers to (note the ! in the condition) ``` if !newEntry.IsDeny { ... ms.ForEachDeny() ... } ``` Signed-off-by: Chris Tarazi <chris@isovalent.com>
Deny entries covered by another deny CIDR entry must be kept as they have different numerical identities in the datapath. Otherwise a later added allow-all entry could take effect if there is no specific match for the left-off identity. Simplest example of this is with L3-only deny rules and an Allow-all rule that wildcards also the L3. So for the sake if this
example we only need to consider L3-identities of the CIDRs (10/8, and 10.1.1.1/32, first covering the latter), say X and Y (both nonzero) and an allow rule with ID 0 (ANY). This policy should allow all traffic that is not with an IP covered by 10/8. Since 10.1.1.1/32 is covered by 10/8 it would be tempting to not insert the rule for the ID assigned to CIDR 10.1.1.1/32 to the datapath policy map. The identities for both CIDRs are, however, present in the ipcache, so that traffic to (or from)
10.1.1.1 would be associated with ID Y, which would match the wildcard ID (0) allow rule, if a deny rule with the ID Y is not present in the datapath. Hence, we must have both deny rules in to the policy map.
The above logic does not apply when the two rules in question have the same ID (but different coverage on port and protocol), or when one of them is a wildcard L3 rule (with ID 0). In that case we can leave the more specific deny rule covered by another off the policy map, but only if it was not needed to cover for a L4-only allow rule that still needs
to take effect on all L3 identities not explicitly denied. In that case we add new L3/L4 deny entries to carve holes into the broader allow rule, and we must keep the deny rules we added for those holes even when they would otherwise be covered by a broader deny rule.
Note that a presence of a wider allow rule is not required at the time of policy map entry insertion for the above consideration to take effect. In this line of thought we could simply leave off all the deny entries if no allow entries exist, since the datapath is deny by default. This breaks down as soon as an allow rule is inserted to a now empty policy map
(maybe incrementally due to an FQDN policy rule), as all the (non-inserted) deny rules would now be disregarded.
This leaves a possibility for a late-stage policy map optimization, however: If, when syncing the userspace policy map to the bpf datapath policy map, we notice that some deny entries are not covered by any wider allow entries, it would be safe to leave such entries off from the datapath policy map. We'd need to be ready to insert them before any new related allow entries are added, though.
Also, when adding an allow entry that is a subset (in L3-wildcard or CIDR sense) of an existing deny entry and the new allow entry has a less specific port-protocol than the deny entry (e.g., wildcard L4 allow and TCP deny) then an additional deny entry with the identity of the new allow entry must be added. This ensures that the deny rule takes effect also for the subset identity at the port/proto of the deny rule.
Unit tests are adjusted, not only to include the previously missing entries, but also add "allow-all" rules for illustrative purposes so that it is easier to reason about the necessity of the added deny entries.
Fixes: #32675
Note: This is the
main
version of #33719