-
Notifications
You must be signed in to change notification settings - Fork 3.4k
bpf: Policy map precedence by LPM prefix length #35150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpf: Policy map precedence by LPM prefix length #35150
Conversation
28d4fb6
to
5c8007f
Compare
5c8007f
to
7817265
Compare
/test |
7817265
to
4d46d86
Compare
/test |
I don't feel like I fully get the idea, could you clarify please?
Meaning that 1 is preferred over 2? Is that not expected according to the comment in the code?
I.e. 2 will be preferred over 1? What's the motivation of this change? |
4d46d86
to
5824824
Compare
/test |
5824824
to
6e60cfd
Compare
There is an unfortunate markdown rendering effect here, the source has "<ID>", which is rendered as an empty string. Note that the comment says "L3 specified".
1 is preferred over 2 due to this:
This happens because the condition used for 1. evaluates true due to
Datapath only ever has to choose between two rules, one with full L3 identity match, other with L3 identity wildcarded (= 0). This change makes sure the policy with more specific proto/port is chosen out of these two. The lookup for these two cases is already doing that (among the matching L3 identity and wildcard L3 identity) due to the policy map being LPM. |
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Consider following two policy map state entries: 1. Allow <ID>/TCP/80/15 -- L3 <ID> specified, TCP port 80 partially wildcarded (ports 80-81) 2. Deny */TCP/80/16 -- L3 wildcarded, TCP port 80 not wildcarded When the deny entry is added, Cilium adds an additional deny entry: 3. Deny <ID>/TCP/80/16 -- L3 <ID> specified, TCP port 80 not wildcarded This makes sure that the allow rule is not given precedence. This is currently needed due to the logic datapath uses to select between the L3-specific and L4-only entries when both are found. There is no such additional entry logic in the allow/allow case, so we rely on the datapath choosing the "right" entry. Eg., if the more specific L4-only policy entry has L7 policy, and the less specific L3 policy not, then the datapath should select the more specific entry and redirect the traffic to the L7 proxy for policy enforcement. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Policy map key can have nexthdr and destination port fields wildcarded. Cilium datapath typically performs two policy map lookups, one with the L3 identity mapped from the relevant IP address (destination or source IP for egress or ingress policy, respectively). Datapath chooses between the two policy entries based on two booleans in the policy entry: 'wildcarded_protocol' and 'wildcarded_dport'. The problem is that 'wildcarded_dport' is set only if the destination port is fully wildcarded, so partially wildcarded port is considered as "not wildcarded" for this logic. Fix this for partially wildcarded ports by always choosing between L4-only and L3 policy matches by the key prefix length so that the most specific entry is always selected, selecting the entry with a precific identity if the prefix lengths are the same. That is, choose the policy with specific L3 if the prefixes are the same length, otherwise choose the policy with the longer prefix. LPM map key already contains the prefix length, but unfortunately LPM map lookup does not return the map key. This is why we store an 8-bit version of the prefix length in the policy map entry, which is as follows: 0: both nexthdr and destination port fully wildcarded 8: nexthdr specified but port fully wildcarded 24: both nexthdr and destination port specified (no wildcarding) Values between 8 and 24 represent the less or more wildcarded destination port. Previously used wildcarded_protocol and wildcarded_dport flags are removed as redundant. Before this change the precedence between the following two rules: 1. <ID>/TCP/80/15 -- L3 <ID> specified, TCP port 80 partially wildcarded (ports 80-81) 2. */TCP/80/16 -- L3 wildcarded, TCP port 80 not wildcarded would have resulted in selecting the policy with <ID> specified even when it's destination port is partially wildcarded, while the L4-only rule has no wildcarding in it's destination port. If the entries above were allow (1) and deny (3), Cilium agent masked this behavior by adding an additional deny entry in this case: 3. <ID>/TCP/80/16 -- L3 <ID> specified, TCP port 80 not wildcarded This masks the incorrect behavior for allow/deny case, since the non-wildcarded key (3) is given precedence by the datapath. There is no such additional entry logic in the allow/allow case, so if the more specific L4-only policy entry has L7 policy, and the less specific L3 policy not, then matching traffic would not be sent to the L7 proxy for policy enforcement. After this change the policy with less L4 wildcarding (the more specific L4) is given precedence, with L3 rule being selected when prefixes are of the same length. This allows for simplifying policy map computation by avoiding the need to add additional entries in cases where deny already has the longer prefix length. This change makes no difference for keys which only have the destination port fully specified or fully wildcarded, so this only affects policies with port ranges as enabled in Cilium 1.16. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
6e60cfd
to
6861644
Compare
/test |
Move the prefix length field in the policy map value to the flags (same byte as 'deny') to a previously unused space. This cleans up padding space for future use. Skip map entries with inconsistent prefix length field in value vs. key so that the entry will be rewritten with the correct prefix length field in the value. This fixes a potential issue in upgrade where a policy map entry may remain with zero valued prefix length in value leading to incorrect policy enforcement. Fixes: cilium#35150 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Move the prefix length field in the policy map value to the flags (same byte as 'deny') to a previously unused space. This cleans up padding space for future use. Skip map entries with inconsistent prefix length field in value vs. key so that the entry will be rewritten with the correct prefix length field in the value. This fixes a potential issue in upgrade where a policy map entry may remain with zero valued prefix length in value leading to incorrect policy enforcement. Fixes: cilium#35150 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Move the prefix length field in the policy map value to the flags (same byte as 'deny') to a previously unused space. This cleans up padding space for future use. Skip map entries with inconsistent prefix length field in value vs. key so that the entry will be rewritten with the correct prefix length field in the value. This fixes a potential issue in upgrade where a policy map entry may remain with zero valued prefix length in value leading to incorrect policy enforcement. Fixes: cilium#35150 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Move the prefix length field in the policy map value to the flags (same byte as 'deny') to a previously unused space. This cleans up padding space for future use. Skip map entries with inconsistent prefix length field in value vs. key so that the entry will be rewritten with the correct prefix length field in the value. This fixes a potential issue in upgrade where a policy map entry may remain with zero valued prefix length in value leading to incorrect policy enforcement. Fixes: #35150 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
[ upstream commit 60bc8fa ] Move the prefix length field in the policy map value to the flags (same byte as 'deny') to a previously unused space. This cleans up padding space for future use. Skip map entries with inconsistent prefix length field in value vs. key so that the entry will be rewritten with the correct prefix length field in the value. This fixes a potential issue in upgrade where a policy map entry may remain with zero valued prefix length in value leading to incorrect policy enforcement. Fixes: cilium#35150 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
[ upstream commit 60bc8fa ] Move the prefix length field in the policy map value to the flags (same byte as 'deny') to a previously unused space. This cleans up padding space for future use. Skip map entries with inconsistent prefix length field in value vs. key so that the entry will be rewritten with the correct prefix length field in the value. This fixes a potential issue in upgrade where a policy map entry may remain with zero valued prefix length in value leading to incorrect policy enforcement. Fixes: #35150 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Policy map key can have nexthdr and destination port fields wildcarded. Cilium datapath typically performs two policy map lookups, one with the L3 identity mapped from the relevant IP address (destination or source IP for egress or ingress policy, respectively). Datapath chooses between the two policy entries based on two booleans in the policy entry: 'wildcarded_protocol' and 'wildcarded_dport'. The problem is that 'wildcarded_dport' is set only if the destination port is fully wildcarded, so partially wildcarded port is considered as "not wildcarded" for this logic.
Fix this for partially wildcarded ports by always choosing between L4-only and L3 policy matches by the key prefix length so that the most specific entry is always selected, selecting the entry with a specific identity if the prefix lengths are the same. That is, choose the policy with specific L3 if the prefixes are the same length, otherwise choose the policy with the longer prefix.
LPM map key already contains the prefix length, but unfortunately LPM map lookup does not return the map key. This is why we store an 8-bit version of the prefix length in the policy map entry, which is as follows:
0: both nexthdr and destination port fully wildcarded
8: nexthdr specified but port fully wildcarded
24: both nexthdr and destination port specified (no wildcarding)
Values between 8 and 24 represent the less or more wildcarded destination port.
Previously used wildcarded_protocol and wildcarded_dport flags are removed as redundant.
Before this change the precedence between the following two rules:
would have resulted in selecting the policy with <ID> specified even when it's destination port is partially wildcarded, while the L4-only rule has no wildcarding in it's destination port.
If the entries above were allow (1) and deny (3), Cilium agent masked this behavior by adding an additional deny entry in this case:
This masks the incorrect behavior for allow/deny case, since the non-wildcarded key (3) is given precedence by the datapath.
There is no such additional entry logic in the allow/allow case, so if the more specific L4-only policy entry has L7 policy, and the less specific L3 policy not, then matching traffic would not be sent to the L7 proxy for policy enforcement.
After this change the policy with less L4 wildcarding (the more specific L4) is given precedence, with L3 rule being selected when prefixes are of the same length. This allows for simplifying policy map computation by avoiding the need to add additional entries in cases where deny already has the longer prefix length.
This change makes no difference for keys which only have the destination port fully specified or fully wildcarded, so this only affects policies with port ranges as enabled in Cilium 1.16.
Removal of the additional deny entries enabled by this change make
BenchmarkRegenerateCIDRDenyPolicyRules
~10% faster due to reduced allocations.Fixes: #23885