-
Notifications
You must be signed in to change notification settings - Fork 3.4k
backport (v1.12): policy: Fix Deny Precedence Bug #25491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
backport (v1.12): policy: Fix Deny Precedence Bug #25491
Conversation
e5fee2b
to
e329b3a
Compare
[ upstream commit 1402547 ] net.IP to netip.Addr conversion using netip.AddrFromSlice has a pitfall that it interprets an IPv4 net.IP created with net.ParseIP as IPv4 mapped IPv6 address (see code comment for more details). This helper safely converts net.IP to netip.Addr with preserving an original address family, but with assumption that given net.IP is not a "real" IPv4 mapped IPv6 address which practical enough for our use cases. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
[ upstream commit c9f0def ] - Add Tests for Deny Precedence Bug: Currently, when a broad "Deny" policy is paired with a specific "Unmanaged CIDR" policy, then the "Unmanaged CIDR" policy will still be inserted into the policy map for an endpoint. This results in "Deny" policies not always taking precedence over "Allow" policies. This test confirms the bugs existence. - Fix Deny Precedence Bug: When the policy map state is created CIDRs are now checked against one another to ensure that deny-rules that supersede allow-rules when they should. `DenyPreferredInsert` has been refactored to use utility methods that make the complex boolean logic of policy precedence more atomic. Add `NetsContainsAny` method to `pkg/ip` to compare cases where one set of networks conatins or is equal to any network in another set. - endpoint: Add policy.Identity Implementation A `policy.Identity` implementation is necessary for the incremental update to the endpoint's policy map that can occur with L7 changes. Valid deny-policy entries may prohibit these L7 changes based on CIDR rules, which are only obtainable by looking up all potentially conflicting policies' labels. Thus `l4.ToMapState` needs access to the identity allocater to lookup "random" identity labels. Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
[ upstream commit 52ace8e ] Marco Iorio reports that with previous code, Cilium could crash at runtime after importing a network policy, with the following error printed to the logs: fatal error: concurrent map read and map write The path for this issue is printed also in the logs, with the following call stack: pkg/policy.(*SelectorCache).GetLabels(...) pkg/policy.(*MapStateEntry).getNets(...) pkg/policy.entryIdentityIsSupersetOf(...) pkg/policy.MapState.denyPreferredInsertWithChanges(...) pkg/policy.MapState.DenyPreferredInsert(...) pkg/policy.(*EndpointPolicy).computeDirectionL4PolicyMapEntries(...) pkg/policy.(*EndpointPolicy).computeDesiredL4PolicyMapEntries(...) pkg/policy.(*selectorPolicy).DistillPolicy(...) pkg/policy.(*cachedSelectorPolicy).Consume(...) pkg/endpoint.(*Endpoint).regeneratePolicy(...) ... Upon further inspection, this call path is not grabbing the SelectorCache lock at any point. If we check all of the incoming calls to this function, we can see multiple higher level functions calling into this function. The following tree starts from the deepest level of the call stack and increasing indentation represents one level higher in the call stack. INCOMING CALLS - f GetLabels github.com/cilium/cilium/pkg/policy • selectorcache.go - f getNets github.com/cilium/cilium/pkg/policy • mapstate.go - f entryIdentityIsSupersetOf github.com/cilium/cilium/pkg/policy • mapstate.go - f denyPreferredInsertWithChanges github.com/cilium/cilium/pkg/policy • mapstate.go - f DenyPreferredInsert github.com/cilium/cilium/pkg/policy • mapstate.go - f computeDirectionL4PolicyMapEntries github.com/cilium/cilium/pkg/policy • resolve.go - f computeDesiredL4PolicyMapEntries github.com/cilium/cilium/pkg/policy • resolve.go + f DistillPolicy github.com/cilium/cilium/pkg/policy • resolve.go <--- No SelectorCache lock - f DetermineAllowLocalhostIngress github.com/cilium/cilium/pkg/policy • mapstate.go + f DistillPolicy github.com/cilium/cilium/pkg/policy • resolve.go <--- No SelectorCache lock - f consumeMapChanges github.com/cilium/cilium/pkg/policy • mapstate.go + f ConsumeMapChanges github.com/cilium/cilium/pkg/policy • resolve.go <--- Already locks the SelectorCache Read the above tree as "GetLabels() is called by getNets()", "getNets() is called by entryIdentityIsSupersetOf()", and so on. Siblings at the same level of indent represent alternate callers of the function that is one level of indentation less in the tree, ie DenyPreferredInsert() and consumeMapChanges() both call denyPreferredInsertWithChanges(). As annotated above, we see that calls through DistillPolicy() do not grab the SelectorCache lock. Given that ConsumeMapChanges() grabs the SelectorCache lock, we cannot introduce a new lock acquisition in any descendent function, otherwise it would introduce a deadlock in goroutines that follow that call path. This provides us the option to lock at some point from the sibling of consumeMapChanges() or higher in the call stack. Given that the ancestors of DenyPreferredInsert() are all from DistillPolicy(), we can amortize the cost of grabbing the SelectorCache lock by grabbing it once for the policy distillation phase rather than putting the lock into DenyPreferredInsert() where the SelectorCache could be locked and unlocked for each map state entry. Future work could investigate whether these call paths could make use of the IdentityAllocator's cache of local identities for the GetLabels() call rather than relying on the SelectorCache, but for now this patch should address the immediate locking issue that triggers agent crashes. CC: Nate Sweet <nathanjsweet@pm.me> Fixes: c9f0def ("policy: Fix Deny Precedence Bug") Reported-by: Marco Iorio <marco.iorio@isovalent.com> Co-authored-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
[ upstream commit bfbe5a2 ] The policyIdentityLabelLookup wrapper for Endpoint implements the GetLabels interface method. This is necessary for the constructing the MapState of the policy engine. This implementation incorrectly did not check if the identity returned by LookupIdentityByID was nil. This fixes this bug, which heretofore has not caused any issues. Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
e329b3a
to
71243f3
Compare
/test-backport-1.12 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed the main commit (93a198d) and diff'd it against the main
version. The diffs don't seem relevant, so LGTM.
@nathanjsweet It looks like the docs are not consistent with the release note here, as they still state that deny policies are beta: https://docs.cilium.io/en/v1.12/policy/language/#id8 Should we update the docs or adjust the release note to state that a bug was fixed with deny precedence? |
I guess we can upgrade the docs to say "stable". |
ip: Add a helper to assist net.IP -> netip.Addr conversion (from @YutaroHayakawa)
net.IP to netip.Addr conversion using netip.AddrFromSlice has a pitfall
that it interprets an IPv4 net.IP created with net.ParseIP as IPv4
mapped IPv6 address (see code comment for more details). This helper
safely converts net.IP to netip.Addr with preserving an original address
family, but with assumption that given net.IP is not a "real" IPv4
mapped IPv6 address which practical enough for our use cases.
Add Tests for Deny Precedence Bug:
Currently, when a broad "Deny" policy is paired with a specific "Unmanaged CIDR" policy, then the "Unmanaged CIDR" policy will still be inserted into the policy map for an endpoint. This results in "Deny" policies not always taking precedence over "Allow" policies. This test confirms the bugs existence.
Fix Deny Precedence Bug:
When the policy map state is created CIDRs are now checked against one another to ensure that deny-rules that supersede allow-rules when they should.
DenyPreferredInsert
has been refactored to use utility methods that make the complex boolean logic of policy precedence more atomic.Add
NetsContainsAny
method topkg/ip
to compare cases where one set of networks conatins or is equal to any network in another set.endpoint: Add policy.Identity Implementation
A
policy.Identity
implementation is necessary for the incremental update to the endpoint's policy map that can occur with L7 changes. Valid deny-policy entries may prohibit these L7 changes based on CIDR rules, which are only obtainable by looking up all potentially conflicting policies' labels. Thusl4.ToMapState
needs access to the identity allocater to lookup "random" identity labels.Additional commit from @joestringer to fix concurrent access to the SelectorCache, a bug the original work introduced.
Additional commit from @nathanjsweet to prevent a potential nil pointer dereference in the endpoint code.