Skip to content

Conversation

nathanjsweet
Copy link
Member

@nathanjsweet nathanjsweet commented May 16, 2023

  • ip: Add a helper to assist net.IP -> netip.Addr conversion (from @YutaroHayakawa)

    net.IP to netip.Addr conversion using netip.AddrFromSlice has a pitfall
    that it interprets an IPv4 net.IP created with net.ParseIP as IPv4
    mapped IPv6 address (see code comment for more details). This helper
    safely converts net.IP to netip.Addr with preserving an original address
    family, but with assumption that given net.IP is not a "real" IPv4
    mapped IPv6 address which practical enough for our use cases.

  • Add Tests for Deny Precedence Bug:

    Currently, when a broad "Deny" policy is paired with a specific "Unmanaged CIDR" policy, then the "Unmanaged CIDR" policy will still be inserted into the policy map for an endpoint. This results in "Deny" policies not always taking precedence over "Allow" policies. This test confirms the bugs existence.

  • Fix Deny Precedence Bug:

    When the policy map state is created CIDRs are now checked against one another to ensure that deny-rules that supersede allow-rules when they should.

    DenyPreferredInsert has been refactored to use utility methods that make the complex boolean logic of policy precedence more atomic.

    Add NetsContainsAny method to pkg/ip to compare cases where one set of networks conatins or is equal to any network in another set.

  • endpoint: Add policy.Identity Implementation

    A policy.Identity implementation is necessary for the incremental update to the endpoint's policy map that can occur with L7 changes. Valid deny-policy entries may prohibit these L7 changes based on CIDR rules, which are only obtainable by looking up all potentially conflicting policies' labels. Thus l4.ToMapState needs access to the identity allocater to lookup "random" identity labels.

  • Additional commit from @joestringer to fix concurrent access to the SelectorCache, a bug the original work introduced.

  • Additional commit from @nathanjsweet to prevent a potential nil pointer dereference in the endpoint code.

policy: Promote Deny Policies from Beta to Stable

@nathanjsweet nathanjsweet added kind/bug This is a bug in the Cilium logic. release-note/major This PR introduces major new functionality to Cilium. kind/backports This PR provides functionality previously merged into master. backport/1.12 labels May 16, 2023
@nathanjsweet nathanjsweet requested a review from a team as a code owner May 16, 2023 21:04
@nathanjsweet nathanjsweet force-pushed the pr/nathanjsweet/backport-deny-cidr-fix-to-1-12 branch from e5fee2b to e329b3a Compare May 16, 2023 22:17
YutaroHayakawa and others added 4 commits May 16, 2023 21:47
[ upstream commit 1402547 ]

net.IP to netip.Addr conversion using netip.AddrFromSlice has a pitfall
that it interprets an IPv4 net.IP created with net.ParseIP as IPv4
mapped IPv6 address (see code comment for more details). This helper
safely converts net.IP to netip.Addr with preserving an original address
family, but with assumption that given net.IP is not a "real" IPv4
mapped IPv6 address which practical enough for our use cases.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
[ upstream commit c9f0def ]

- Add Tests for Deny Precedence Bug:

  Currently, when a broad "Deny" policy is paired with
  a specific "Unmanaged CIDR" policy, then the "Unmanaged
  CIDR" policy will still be inserted into the policy map
  for an endpoint. This results in "Deny" policies not
  always taking precedence over "Allow" policies. This
  test confirms the bugs existence.

- Fix Deny Precedence Bug:

  When the policy map state is created CIDRs
  are now checked against one another to ensure
  that deny-rules that supersede allow-rules
  when they should.

  `DenyPreferredInsert` has been refactored to
  use utility methods that make the complex boolean
  logic of policy precedence more atomic.

  Add `NetsContainsAny` method to `pkg/ip` to
  compare cases where one set of networks conatins
  or is equal to any network in another set.

- endpoint: Add policy.Identity Implementation

  A `policy.Identity` implementation is necessary
  for the incremental update to the endpoint's policy
  map that can occur with L7 changes. Valid deny-policy
  entries may prohibit these L7 changes based on CIDR
  rules, which are only obtainable by looking up all potentially
  conflicting policies' labels. Thus `l4.ToMapState` needs
  access to the identity allocater to lookup "random"
  identity labels.

Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
[ upstream commit 52ace8e ]

Marco Iorio reports that with previous code, Cilium could crash at
runtime after importing a network policy, with the following error
printed to the logs:

    fatal error: concurrent map read and map write

The path for this issue is printed also in the logs, with the following
call stack:

    pkg/policy.(*SelectorCache).GetLabels(...)
    pkg/policy.(*MapStateEntry).getNets(...)
    pkg/policy.entryIdentityIsSupersetOf(...)
    pkg/policy.MapState.denyPreferredInsertWithChanges(...)
    pkg/policy.MapState.DenyPreferredInsert(...)
    pkg/policy.(*EndpointPolicy).computeDirectionL4PolicyMapEntries(...)
    pkg/policy.(*EndpointPolicy).computeDesiredL4PolicyMapEntries(...)
    pkg/policy.(*selectorPolicy).DistillPolicy(...)
    pkg/policy.(*cachedSelectorPolicy).Consume(...)
    pkg/endpoint.(*Endpoint).regeneratePolicy(...)
    ...

Upon further inspection, this call path is not grabbing the
SelectorCache lock at any point. If we check all of the incoming calls
to this function, we can see multiple higher level functions calling
into this function. The following tree starts from the deepest level of
the call stack and increasing indentation represents one level higher in
the call stack.

INCOMING CALLS
- f GetLabels github.com/cilium/cilium/pkg/policy • selectorcache.go
  - f getNets github.com/cilium/cilium/pkg/policy • mapstate.go
    - f entryIdentityIsSupersetOf github.com/cilium/cilium/pkg/policy • mapstate.go
      - f denyPreferredInsertWithChanges github.com/cilium/cilium/pkg/policy • mapstate.go
        - f DenyPreferredInsert github.com/cilium/cilium/pkg/policy • mapstate.go
          - f computeDirectionL4PolicyMapEntries github.com/cilium/cilium/pkg/policy • resolve.go
            - f computeDesiredL4PolicyMapEntries github.com/cilium/cilium/pkg/policy • resolve.go
              + f DistillPolicy github.com/cilium/cilium/pkg/policy • resolve.go <--- No SelectorCache lock
          - f DetermineAllowLocalhostIngress github.com/cilium/cilium/pkg/policy • mapstate.go
            + f DistillPolicy github.com/cilium/cilium/pkg/policy • resolve.go <--- No SelectorCache lock
        - f consumeMapChanges github.com/cilium/cilium/pkg/policy • mapstate.go
          + f ConsumeMapChanges github.com/cilium/cilium/pkg/policy • resolve.go <--- Already locks the SelectorCache

Read the above tree as "GetLabels() is called by getNets()",
"getNets() is called by entryIdentityIsSupersetOf()", and so on.
Siblings at the same level of indent represent alternate callers of the
function that is one level of indentation less in the tree, ie
DenyPreferredInsert() and consumeMapChanges() both call
denyPreferredInsertWithChanges().

As annotated above, we see that calls through DistillPolicy() do not
grab the SelectorCache lock. Given that ConsumeMapChanges() grabs the
SelectorCache lock, we cannot introduce a new lock acquisition in any
descendent function, otherwise it would introduce a deadlock in
goroutines that follow that call path. This provides us the option to
lock at some point from the sibling of consumeMapChanges() or higher in
the call stack.

Given that the ancestors of DenyPreferredInsert() are all from
DistillPolicy(), we can amortize the cost of grabbing the SelectorCache
lock by grabbing it once for the policy distillation phase rather than
putting the lock into DenyPreferredInsert() where the SelectorCache
could be locked and unlocked for each map state entry.

Future work could investigate whether these call paths could make use of
the IdentityAllocator's cache of local identities for the GetLabels()
call rather than relying on the SelectorCache, but for now this patch
should address the immediate locking issue that triggers agent crashes.

CC: Nate Sweet <nathanjsweet@pm.me>
Fixes: c9f0def ("policy: Fix Deny Precedence Bug")
Reported-by: Marco Iorio <marco.iorio@isovalent.com>
Co-authored-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
[ upstream commit bfbe5a2 ]

The policyIdentityLabelLookup wrapper for Endpoint implements
the GetLabels interface method. This is necessary for the
constructing the MapState of the policy engine. This implementation
incorrectly did not check if the identity returned by LookupIdentityByID
was nil. This fixes this bug, which heretofore has not caused any issues.

Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
@nathanjsweet nathanjsweet force-pushed the pr/nathanjsweet/backport-deny-cidr-fix-to-1-12 branch from e329b3a to 71243f3 Compare May 17, 2023 02:55
@nathanjsweet
Copy link
Member Author

/test-backport-1.12

@nathanjsweet nathanjsweet added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. and removed ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels May 18, 2023
@nathanjsweet nathanjsweet added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 26, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 30, 2023
Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the main commit (93a198d) and diff'd it against the main version. The diffs don't seem relevant, so LGTM.

@nathanjsweet nathanjsweet added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 31, 2023
@julianwiedmann julianwiedmann merged commit a2d126f into v1.12 Jun 1, 2023
@julianwiedmann julianwiedmann deleted the pr/nathanjsweet/backport-deny-cidr-fix-to-1-12 branch June 1, 2023 10:12
@joestringer
Copy link
Member

@nathanjsweet It looks like the docs are not consistent with the release note here, as they still state that deny policies are beta:

https://docs.cilium.io/en/v1.12/policy/language/#id8

Should we update the docs or adjust the release note to state that a bug was fixed with deny precedence?

@nathanjsweet
Copy link
Member Author

I guess we can upgrade the docs to say "stable".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/backports This PR provides functionality previously merged into master. kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/major This PR introduces major new functionality to Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants