Skip to content

Conversation

jrajahalme
Copy link
Member

@jrajahalme jrajahalme commented Feb 20, 2023

Change to use LPM trie map for endpoint policy, as this allows prefix based wildcarding of protocol and port, when placed at the end of the map key. This helps eliminate two policy map lookups, which should help offset the performance penalty of a more complex map lookup.

This involves both changing the map type, and extending the policymap key with the prefix length field that is customary to all LPM maps. At the same time we need to reorganise the key fields so that the fields we plan to wildcard are at the end of the key and in the order of the port being the last field and protocol before it, as it can be wildcarded also when the protocol is not wildcarded, but the protocol can be wildcarded only if the port is also (completely) wildcarded.

With this new LPM policy map we still need a second lookup with an explicitly wildcarded security ID, as it can be wildcarded also when protocol and/or port are being matched.

As Cilium agent can re-populate policy maps from scratch there is no need for any specific upgrade/downgrade support for the map type and key layout change. Old programs keep using the old policy maps they were started with, while new programs for all endpoints upon the start of the upgraded Cilium agent will the new maps when they are loaded.

This change is a prerequisite for port range support in network policy.

@jrajahalme jrajahalme added kind/enhancement This would improve or streamline existing functionality. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. labels Feb 20, 2023
@jrajahalme jrajahalme requested review from a team as code owners February 20, 2023 12:31
@jrajahalme jrajahalme requested a review from aditighag February 20, 2023 12:31
@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Feb 20, 2023
@jrajahalme jrajahalme requested review from thorn3r and squeed February 20, 2023 12:31
@jrajahalme jrajahalme marked this pull request as draft February 20, 2023 12:31
@jrajahalme jrajahalme force-pushed the datapth-use-lpm-policy-maps branch from 79b2706 to a862cf9 Compare February 20, 2023 12:58
@jrajahalme jrajahalme marked this pull request as ready for review February 20, 2023 12:59
@jrajahalme jrajahalme added the release-note/misc This PR makes changes that have no direct user impact. label Feb 20, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 20, 2023
@jrajahalme
Copy link
Member Author

jrajahalme commented Feb 20, 2023

/test

Job 'Cilium-PR-K8s-1.24-kernel-5.4' failed:

Click to show.

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: Found 15 k8s-app=cilium logs matching list of errors that must be investigated:

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.24-kernel-5.4 so I can create one.

Job 'Cilium-PR-K8s-1.16-kernel-4.19' failed:

Click to show.

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: Found 15 k8s-app=cilium logs matching list of errors that must be investigated:

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.16-kernel-4.19 so I can create one.

Copy link
Contributor

@thorn3r thorn3r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agent bits LGTM, but i see some CI failures that look maybe relevant to these changes:

Tests upgrade and downgrade from a Cilium stable image to master:
Key-size mismatch for BPF map" file-path=/sys/fs/bpf/tc/globals/cilium_policy_00835 new=8 old=12 subsys=bpf

Comment on lines 164 to 165
} else if prefix > 8 {
port = fmt.Sprintf("0x%x/%d/%s", dport, prefix-8, proto.String())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we care here about values > 24? i.e. are they valid, or are we using prefixlen to bound the value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understood, 24 = 16+8, i.e. it's the maximum length for the port and protocol fields.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats how it seems to me as well. i guess my question could be rephrased a bit clearer as:
is it possible to encounter a value > 24 that can cause some unexpected behavior here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added safety checks, thanks!

Copy link
Contributor

@gentoo-root gentoo-root left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are also a couple of questions, sorry in advance if they are too stupid, I'm new to this code area.

Comment on lines 164 to 165
} else if prefix > 8 {
port = fmt.Sprintf("0x%x/%d/%s", dport, prefix-8, proto.String())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understood, 24 = 16+8, i.e. it's the maximum length for the port and protocol fields.

@github-actions
Copy link

github-actions bot commented Apr 1, 2023

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

@github-actions github-actions bot added stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. and removed stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. labels Apr 1, 2023
@jrajahalme jrajahalme force-pushed the datapth-use-lpm-policy-maps branch from a862cf9 to e49e7b3 Compare April 14, 2023 01:23
@jrajahalme jrajahalme requested a review from gentoo-root April 14, 2023 01:25
@jrajahalme
Copy link
Member Author

jrajahalme commented Apr 20, 2023

Also, please mention in the code why the port and protocol fields need to be at the end of the key struct.

Added comment right before the moved protocol and dport fields:

	/*
	 * 'protocol' and 'port' can be wildcarded as indicated by 'lpm_key',
	 * hence they need to be at the end of the key. The order of these fields is
	 * significant as protocol can only be wildcarded if dport is also fully
	 * wildcarded. 'protocol' is never partially wildcarded, so it is either fully
	 * wildcarded or not wildcarded at all. 'dport' can be partially wildcarded.
	 */

(will push this change once the current CI run finishes)

@jrajahalme
Copy link
Member Author

This commit is missing a lot of context. The commit shuffles some of the fields in policy key around. So how do we ensure that the datapath is in sync with the control plane state? I would've expected to see some map versioning code where the agent would populate the v2 policy map with the new key layout, and then atomically swap the old map. If the commit takes a different approach, then it needs to be explicitly called out in the commit description and code.

A bpf program will keep on using the (policy) map it was loaded with, even when the userspace creates a new map on the same path for the upgrade. The new map is then only used by the new bpf program once it is loaded. This means that versioning does not need to be explicit, and a given version of Cilium agent does not need to understand the (policy) map layout of another version.

@jrajahalme
Copy link
Member Author

/test

@jrajahalme
Copy link
Member Author

@aditighag Elaborated on the LPM comment a bit, now it is placed before the key definition and reads:

/*
 * Longest-prefix match map lookup only matches the number of bits from the
 * beginning of the key stored in the map indicated by the 'lpm_key' field in
 * the same stored map key, not including the 'lpm_key' field itself. Note that
 * the 'lpm_key' value passed in the lookup function argument needs to be a
 * "full prefix" (POLICY_FULL_PREFIX defined below).
 * 
 * Since we need to be able to wildcard 'sec_label' independently on 'protocol'
 * and 'dport' fields, we'll need to do that explicitly with a separate lookup
 * where 'sec_label' is zero. For the 'protocol' and 'port' we can use the
 * longest-prefix match by placing them at the end ot the key in this specific
 * order, as we want to be able to wildcard those fields in a specific pattern:
 * 'protocol' can only be wildcarded if dport is also fully wildcarded.
 * 'protocol' is never partially wildcarded, so it is either fully wildcarded or
 * not wildcarded at all. 'dport' can be partially wildcarded, but only when
 * 'protocol' is fully specified. This follows the logic that the destination
 * port is a property of a transport protocol and can not be specified without
 * also specifying the protocol.
 */

I also moved the definition of POLICY_FULL_PREFIX to be right after the key definition to keep them together.

@jrajahalme jrajahalme force-pushed the datapth-use-lpm-policy-maps branch from 6916c92 to 0142baf Compare April 20, 2023 18:03
@jrajahalme jrajahalme requested a review from aditighag April 20, 2023 18:03
@jrajahalme
Copy link
Member Author

Does that mean all users will get a non-avoidable, non-actionable warning on upgrades and downgrades? 😬

For now, yes, until bpf.Map is refactored to let the caller do the warning only when needed.

@jrajahalme jrajahalme force-pushed the datapth-use-lpm-policy-maps branch from 0142baf to 04fd126 Compare April 20, 2023 19:06
@jrajahalme
Copy link
Member Author

Removed trailing whitespace.

@jrajahalme
Copy link
Member Author

/test

Copy link
Member

@aditighag aditighag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding detailed comments around the port and protocol fields.
Re: map layout change: I suppose the agent populates a new map based on the revised key/value entries, and then swaps out the old map? While this PR doesn't change the logic explicitly, it does change the map type in addition to map key layout.
(The context needs to go in the PR/commit description. )

* order, as we want to be able to wildcard those fields in a specific pattern:
* 'protocol' can only be wildcarded if dport is also fully wildcarded.
* 'protocol' is never partially wildcarded, so it is either fully wildcarded or
* not wildcarded at all. 'dport' can be partially wildcarded, but only when
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does partially wildcarded mean? Does it indicate a port range?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is a prerequisite for implementing port ranges, yes.

Reduce confusion by always keeping the policymap key and value in network
byteorder.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Use PolicyEntry level accessor for IsDeny() so that the callers do not need to
know about the flags explicitly.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Do not export flags definition, simplify via using custom type instead of
depending on conversions to uint8.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Use string for protocol in policymap key conversion to disambiguate port
and protocol numbers, and always include it in the string.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Change to use LPM trie map for endpoint policy, as this allows prefix
based wildcarding of protocol and port, when placed at the end of the map
key. This helps eliminate two policy map lookups, which should help
offset the performance penalty of a more complex map lookup.

This change is a prerequisite for port range support in network policy.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
@jrajahalme jrajahalme force-pushed the datapth-use-lpm-policy-maps branch from 04fd126 to 42fc378 Compare April 24, 2023 11:11
@jrajahalme jrajahalme requested a review from aditighag April 24, 2023 11:11
Add policymap unit testing to validate these policymap key & value invariants:
- protocol/nexthdr can only be wildcarded if destination port is wildcarded
- value has wildcardNexthdr flag set if and only if key wildcards Nexthdr
- value has wildcardDestPort flag set if and only if key wildcards DestPort
- deny entries have no redirection nor auth type

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
…rate

bpf unit tests failed with a null pointer access verifier error (R1
invalid mem access 'inv'). Apparently the assignment of 'l4policy' to
'policy' was confusing, so separate the code path for 'l4policy' to keep
verifier happy.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Do not fail tests for policy map key or value size mismatches, as in case
of policy maps the maps are regenerated from the agent without any help
from the datapath.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
@jrajahalme jrajahalme force-pushed the datapth-use-lpm-policy-maps branch from 42fc378 to 6e93afb Compare April 24, 2023 11:20
@jrajahalme
Copy link
Member Author

jrajahalme commented Apr 24, 2023

/test

Job 'Cilium-PR-K8s-1.27-kernel-net-next' failed:

Click to show.

Test Name

K8sDatapathConfig Host firewall With VXLAN and endpoint routes

Failure Output

FAIL: Failed to reach 192.168.56.12:80 from testclient-nnjzn

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.27-kernel-net-next/126/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.27-kernel-net-next so I can create one.

Then please upload the Jenkins artifacts to that issue.

Copy link
Member

@aditighag aditighag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Policy engine changes look good for my code owners. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement This would improve or streamline existing functionality. release-note/misc This PR makes changes that have no direct user impact. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants