-
Notifications
You must be signed in to change notification settings - Fork 3.4k
backport (v1.11): policy: Fix Deny Precedence Bug #25496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
backport (v1.11): policy: Fix Deny Precedence Bug #25496
Conversation
We'll need to upgrade golang (currently 1.17) for the v1.11 branch as we need the net/netip package which was released in go 1.18. |
depends on #25513. |
Let's take a step back. Couldn't we replace the |
Yeah, I thought its usage was more pervasive in this PR, but it's completely unnecessary. |
a7ce13e
to
0e8ea99
Compare
👍 Please close the corresponding backport PR to bump Go so that the backporter / triager doesn't spend time on it. |
fe0b99b
to
12c4609
Compare
Sorry for the spam. This backport is turning out to be trickier than I anticipated. |
[ upstream commit c9f0def ] - Add Tests for Deny Precedence Bug: Currently, when a broad "Deny" policy is paired with a specific "Unmanaged CIDR" policy, then the "Unmanaged CIDR" policy will still be inserted into the policy map for an endpoint. This results in "Deny" policies not always taking precedence over "Allow" policies. This test confirms the bugs existence. - Fix Deny Precedence Bug: When the policy map state is created CIDRs are now checked against one another to ensure that deny-rules that supersede allow-rules when they should. `DenyPreferredInsert` has been refactored to use utility methods that make the complex boolean logic of policy precedence more atomic. Add `NetsContainsAny` method to `pkg/ip` to compare cases where one set of networks conatins or is equal to any network in another set. - endpoint: Add policy.Identity Implementation A `policy.Identity` implementation is necessary for the incremental update to the endpoint's policy map that can occur with L7 changes. Valid deny-policy entries may prohibit these L7 changes based on CIDR rules, which are only obtainable by looking up all potentially conflicting policies' labels. Thus `l4.ToMapState` needs access to the identity allocater to lookup "random" identity labels. Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
[ upstream commit 52ace8e ] Marco Iorio reports that with previous code, Cilium could crash at runtime after importing a network policy, with the following error printed to the logs: fatal error: concurrent map read and map write The path for this issue is printed also in the logs, with the following call stack: pkg/policy.(*SelectorCache).GetLabels(...) pkg/policy.(*MapStateEntry).getNets(...) pkg/policy.entryIdentityIsSupersetOf(...) pkg/policy.MapState.denyPreferredInsertWithChanges(...) pkg/policy.MapState.DenyPreferredInsert(...) pkg/policy.(*EndpointPolicy).computeDirectionL4PolicyMapEntries(...) pkg/policy.(*EndpointPolicy).computeDesiredL4PolicyMapEntries(...) pkg/policy.(*selectorPolicy).DistillPolicy(...) pkg/policy.(*cachedSelectorPolicy).Consume(...) pkg/endpoint.(*Endpoint).regeneratePolicy(...) ... Upon further inspection, this call path is not grabbing the SelectorCache lock at any point. If we check all of the incoming calls to this function, we can see multiple higher level functions calling into this function. The following tree starts from the deepest level of the call stack and increasing indentation represents one level higher in the call stack. INCOMING CALLS - f GetLabels github.com/cilium/cilium/pkg/policy • selectorcache.go - f getNets github.com/cilium/cilium/pkg/policy • mapstate.go - f entryIdentityIsSupersetOf github.com/cilium/cilium/pkg/policy • mapstate.go - f denyPreferredInsertWithChanges github.com/cilium/cilium/pkg/policy • mapstate.go - f DenyPreferredInsert github.com/cilium/cilium/pkg/policy • mapstate.go - f computeDirectionL4PolicyMapEntries github.com/cilium/cilium/pkg/policy • resolve.go - f computeDesiredL4PolicyMapEntries github.com/cilium/cilium/pkg/policy • resolve.go + f DistillPolicy github.com/cilium/cilium/pkg/policy • resolve.go <--- No SelectorCache lock - f DetermineAllowLocalhostIngress github.com/cilium/cilium/pkg/policy • mapstate.go + f DistillPolicy github.com/cilium/cilium/pkg/policy • resolve.go <--- No SelectorCache lock - f consumeMapChanges github.com/cilium/cilium/pkg/policy • mapstate.go + f ConsumeMapChanges github.com/cilium/cilium/pkg/policy • resolve.go <--- Already locks the SelectorCache Read the above tree as "GetLabels() is called by getNets()", "getNets() is called by entryIdentityIsSupersetOf()", and so on. Siblings at the same level of indent represent alternate callers of the function that is one level of indentation less in the tree, ie DenyPreferredInsert() and consumeMapChanges() both call denyPreferredInsertWithChanges(). As annotated above, we see that calls through DistillPolicy() do not grab the SelectorCache lock. Given that ConsumeMapChanges() grabs the SelectorCache lock, we cannot introduce a new lock acquisition in any descendent function, otherwise it would introduce a deadlock in goroutines that follow that call path. This provides us the option to lock at some point from the sibling of consumeMapChanges() or higher in the call stack. Given that the ancestors of DenyPreferredInsert() are all from DistillPolicy(), we can amortize the cost of grabbing the SelectorCache lock by grabbing it once for the policy distillation phase rather than putting the lock into DenyPreferredInsert() where the SelectorCache could be locked and unlocked for each map state entry. Future work could investigate whether these call paths could make use of the IdentityAllocator's cache of local identities for the GetLabels() call rather than relying on the SelectorCache, but for now this patch should address the immediate locking issue that triggers agent crashes. CC: Nate Sweet <nathanjsweet@pm.me> Fixes: c9f0def ("policy: Fix Deny Precedence Bug") Reported-by: Marco Iorio <marco.iorio@isovalent.com> Co-authored-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
[ upstream commit bfbe5a2 ] The policyIdentityLabelLookup wrapper for Endpoint implements the GetLabels interface method. This is necessary for the constructing the MapState of the policy engine. This implementation incorrectly did not check if the identity returned by LookupIdentityByID was nil. This fixes this bug, which heretofore has not caused any issues. Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
12c4609
to
3351bc6
Compare
/test-backport-1.11 Job 'Cilium-PR-K8s-1.18-kernel-4.9' failed: Click to show.Test Name
Failure Output
Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.18-kernel-4.9/2725/ If it is a flake and a GitHub issue doesn't already exist to track it, comment Then please upload the Jenkins artifacts to that issue. |
/test-1.18-4.9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main commit looks good to me. No major diffs of concern between backport and main
version.
Add Tests for Deny Precedence Bug:
Currently, when a broad "Deny" policy is paired with a specific "Unmanaged CIDR" policy, then the "Unmanaged CIDR" policy will still be inserted into the policy map for an endpoint. This results in "Deny" policies not always taking precedence over "Allow" policies. This test confirms the bugs existence.
Fix Deny Precedence Bug:
When the policy map state is created CIDRs are now checked against one another to ensure that deny-rules that supersede allow-rules when they should.
DenyPreferredInsert
has been refactored to use utility methods that make the complex boolean logic of policy precedence more atomic.Add
NetsContainsAny
method topkg/ip
to compare cases where one set of networks conatins or is equal to any network in another set.endpoint: Add policy.Identity Implementation
A
policy.Identity
implementation is necessary for the incremental update to the endpoint's policy map that can occur with L7 changes. Valid deny-policy entries may prohibit these L7 changes based on CIDR rules, which are only obtainable by looking up all potentially conflicting policies' labels. Thusl4.ToMapState
needs access to the identity allocater to lookup "random" identity labels.Additional commit from @joestringer to fix concurrent access to the SelectorCache, a bug the original work introduced.
Additional commit from @nathanjsweet to prevent a potential nil pointer dereference in the endpoint code.