Skip to content

Conversation

gandro
Copy link
Member

@gandro gandro commented Feb 29, 2024

This PR moves the logic which translates ToServices rules to ToCIDRSet rules into pkg/policy/k8s. In doing so, we also refactor how translation is performed:

Previously, translation was split over multiple packages: Initial policy ingestion (in pkg/policy) invoked k8s.PreprocessRules, whichpopulated the initial version of the policy with ToCIDRSet rules. Service updates were handled incrementally by the service watcher, which invoked a k8s.RuleTranslator for each update and allocating CIDR
identities (via IPCache) accordingly.

The new approach centralizes all the logic into pkg/policy/k8s. We mirror the approach from the existing CiliumCIDRGroup logic which also translates references to CiliumCIDRGroups to ToCIDRSets. Because
the new approach modifies the CNP/CCNP before it is added to the policy repository, we no longer perform incremental updates and no longer need manage CIDR identities ourselves. This simplifies the code and allows us to remove one of the last users of the "synchronous IPCache API".

Note: This PR removes the last users of the synchronous CIDR allocation API. I will remove those functions (AllocateCIDRs/ReleaseCIDRIdentitiesByCIDR) in a separate PR since this PR is already on the larger side.

As a side-effect of this change, ToServices rules for policies ingested via the non-persistent API (i.e. cilium-dbg policy import) are no longer supported. We assume that this was not a commonly used feature for two reasons: First, importing non-K8s policies that refer K8s resources (i.e. services) is likely rather rare. And secondly, the previously implementation was slightly broken: Because policies imported via API were not preprocessed, the policy was defunct until one of the
matched services generated an update event.

Review per commit.

Fixes: #30322

@gandro gandro added sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. release-note/misc This PR makes changes that have no direct user impact. labels Feb 29, 2024
@gandro gandro force-pushed the pr/gandro/k8s-policy-refactor-to-services branch 2 times, most recently from 21cb64f to bff4c69 Compare February 29, 2024 14:11
@gandro
Copy link
Member Author

gandro commented Feb 29, 2024

/test

@gandro gandro marked this pull request as ready for review February 29, 2024 15:54
@gandro gandro requested review from a team as code owners February 29, 2024 15:54
@gandro gandro requested review from squeed and learnitall February 29, 2024 15:54
Copy link
Contributor

@learnitall learnitall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing! Thank you so much for all of the detail in your commits, it made reviewing this PR a lot easier. I left two suggestions for you, but otherwise LGTM.

@squeed
Copy link
Contributor

squeed commented Mar 4, 2024

As an additional feature (not necessary in this PR), I'd like to see toServices: work with "normal", k8s-managed services. It would just copy the service's selector in to the network policy selector. Does this code make such a change easy? I think so, but if not, we should rethink it.

@squeed
Copy link
Contributor

squeed commented Mar 4, 2024

This looks really nice. I'd like to make sure we don't accidentally open ourselves up for performance issues on service updates, since services are churny and policies are not. Other than that, this looks great.

@gandro gandro force-pushed the pr/gandro/k8s-policy-refactor-to-services branch 2 times, most recently from e665ac2 to dcc9b45 Compare March 6, 2024 10:37
@gandro
Copy link
Member Author

gandro commented Mar 6, 2024

I've reworked the PR quite a bit. Please re-review. In particular:

  • (probably the most "controversial"): We no longer deduplicate and sort the generated ToCIDRSet rules and leave it up to the policy engine to deal with this. This simplifies the logic quite a bit and shouldn't impact correctness. The old ToServices implementation also did sort the rules (but did deduplicate) and the existing CiliumCIDRGroup implementation also does not deduplicate. Therefore, neither sorting nor deduplicating should still yield correct results. But if people feel strongly about this, it's easily added back in.
  • The "which policies are affected by this service update?" logic has been reworked based on this comment here. We now keep track both of all policies with a ToServices rule (which is used if the update is for a new service or service which a label change), as well as a mapping from service ID to policy. The latter allows us to quickly extract which policies we want to update for a service endpoint update.
  • Addressed the minor feedback as suggested.

@gandro gandro force-pushed the pr/gandro/k8s-policy-refactor-to-services branch 2 times, most recently from 1afd749 to d157d59 Compare March 6, 2024 10:47
@gandro gandro requested review from squeed and learnitall March 6, 2024 10:47
@gandro
Copy link
Member Author

gandro commented Mar 6, 2024

/test

@gandro
Copy link
Member Author

gandro commented Mar 6, 2024

As an additional feature (not necessary in this PR), I'd like to see toServices: work with "normal", k8s-managed services. It would just copy the service's selector in to the network policy selector. Does this code make such a change easy? I think so, but if not, we should rethink it.

I would think so. Especially know with the unified processRule function which does the rewrite. I've also been careful to make sure that the current "which CNPs need to be updated" logic works if service selectors change. I might do a PoC next week based on this branch, I expect it to be fairly straightforward.

Copy link
Contributor

@learnitall learnitall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I have just have a couple more nits and comments for you this round.

gandro added 2 commits March 11, 2024 13:03
This commit removes the `updateCiliumNetworkPolicyV2` and uses the renamed
`addCiliumNetworkPolicyV2` (which is now called
`upsertCiliumNetworkPolicyV2`) in its place.

The removed `updateCiliumNetworkPolicyV2` did not provide any value:

- It validated the old version of the CNP object, even though that old
  version was always coming from the `PolicyWatcher.cnpCache`, to which
  CNPs are only added after they previously passed validation. So the
  `oldRuleCpy.Parse()` call could never fail in practise.
- It called `newRuleCpy.Parse()` followed by a call to
  `addCiliumNetworkPolicyV2` which also would then call `Parse()` again,
  thereby performing the validation twice.

The only meaningful logic was the "Modified CiliumNetworkPolicy" debug
log line, which this commit moves to the `onUpsert` call site. The log
line is also simplified, as there is no way where the name or namespace
of the CNP can change (as we're retrying the old object from the
`cnpCache` which is indexed by name and namespace).

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit moves all code which resolves CIDR group references into a
single `resolveCiliumNetworkPolicyRefs` function. This is done in
preparation for the upcoming sequence of commits, which will resolve
services references in the CNP. By having a single point where we
resolve references, we ensure that all different kinds of references
are updated together and consistently.

This commit does not contain any functional changes.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro added 7 commits March 11, 2024 13:03
This introduces a new notification mechanism to the K8s service cache.
In contrast to the existing static single-consumer `Events` channel,
this new `Notifications` mechanism allows multiple dynamic subscribers
to be informed when the service cache changes.

Ideally, we would need two different mechanisms and replace the `Events`
consumer with a `Notifications` consumer. The difficulty comes
from the fact that the current `Events` channel is guaranteed to
deliver all events, including events which occurred before the consumer
started listening on the channel. Where as with the new dynamic
`Notifications` mechanisms, we have to deal with consumers that register
late. Guaranteeing delivery of early events in such a scenario much
harder to achieve, i.e. we would have to replay past events based on the
current cache state.

Since replaying past events would significantly complicate the
implementation, this commit opts for a simpler solution and instead
requires subscribers to register early, i.e. pushing the complexity to
consumers of the new `Notifications` mechanisms. The static consumers of
the existing `Events` mechanism are however unaffected.

There is an upcoming CFP to re-implement the ServiceCache on top of
StateDB. StateDB does have a "watch" mechanism which will give late
subscribers a full review of the cache, thus making custom event
mechanisms obsolete. Until then, the approach implemented by this commit
should serve as a stop-gap solution.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit introduces a new helper function to the service cache which
allows users outside of the package to search for services based on
arbitrary conditions (such as e.g. matching labels). The helper will be
used in a subsequent commit.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit relaxes the CNP/CCNP policy validation to allow `ToCIDRSet`
rules to coexist with other L3 members, such as `ToServices`. This has
no impact on user-submitted policies, since it is not possible for
policies imported via K8s/API to set the `Generated` field. The
`Generated` field is currently only set by the `ToServices`
implementation, which extracts service endpoints and translates them as
`ToCIDRSet` rules. Currently, this is done after CNP validation, however
a subsequent commit will perform that translation step earlier.

For this reason, we need to relax the rule validation to allow policy
rules to contain both a `ToServices` and generated `ToCIDRSet` rules.

This change has no user-facing impact.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit changes the `CiliumCIDRGroup` translation logic to not
create a copy of the CNP/CCNP before it is translated. Instead, the
passed in parameter is mutated in place.

While it would arguably be nice to have a functional approach where we
have a input parameter and one output result, creating two copies for
every CNP change is expensive. In particular, since an upcoming change
will add additional translation steps and we want to avoid having to
create unnecessary copies for every translation step.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit moves the logic to generate an IPCache resource ID into a
separate function. It will be used by additional call sites in a
subsequent commit.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit moves the logic which translates `ToServices` rules to
`ToCIDRSet` rules into `pkg/policy/k8s`. In doing so, we also change how
translation is performed:

Previously, translation was split over multiple packages: Initial policy
ingestion (in `pkg/policy`) invoked `k8s.PreprocessRules`, which
populated the initial version of the policy with `ToCIDRSet` rules.
Service updates were handled incrementally by the service watcher, which
invoked a `k8s.RuleTranslator` for each update and allocating CIDR
identities (via IPCache) accordingly.

The new approach centralizes all the logic into `pkg/policy/k8s`. We
mirror the approach from the existing `CiliumCIDRGroup` logic which also
translates references to `CiliumCIDRGroups` to `ToSericeSet`s. Because
the new approach modifies the CNP/CCNP before it is added to the policy
repository, we no longer perform incremental updates and no longer need
manage CIDR identities ourselves. This simplifies the code and allows us
to remove one of the last users of the "synchronous IPCache API".

As a side-effect of this change, `ToServices` rules for policies
ingested via the non-persistent API (i.e. `cilium-dbg policy import`)
are no longer supported. We assume that this was not a commonly used
feature for two reasons: First, importing non-K8s policies that refer
K8s resources (i.e. services) is likely rather rare. And secondly, the
previously implementation was slightly broken: Because policies imported
via API were not preprocessed, the policy was defunct until one of the
matched services generated an update event.

Note that this commit only removes the conflicting parts the old
infrastructure to keep this commit reviewable. The remaining code of the
old infrastructure is removed in a subsequent commit.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
With the previous commit, this logic is no longer needed.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@gandro gandro force-pushed the pr/gandro/k8s-policy-refactor-to-services branch from d157d59 to 0e41366 Compare March 11, 2024 13:19
@gandro
Copy link
Member Author

gandro commented Mar 11, 2024

Addressed all the feedback and extended the unit test a bit.

@gandro gandro requested a review from learnitall March 11, 2024 13:21
@gandro
Copy link
Member Author

gandro commented Mar 11, 2024

/test

@gandro
Copy link
Member Author

gandro commented Mar 11, 2024

Once this is merged, I can open a PR removing some more code: gandro/cilium@pr/gandro/k8s-policy-refactor-to-services...gandro:cilium:pr/gandro/ipcache-remove-sync-alloc-cidr

Copy link
Contributor

@learnitall learnitall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 11, 2024
@gandro gandro requested a review from squeed March 12, 2024 09:38
@gandro gandro removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 12, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 12, 2024
@gandro gandro added this pull request to the merge queue Mar 12, 2024
Merged via the queue into cilium:main with commit a5824bd Mar 12, 2024
@gandro gandro deleted the pr/gandro/k8s-policy-refactor-to-services branch March 12, 2024 14:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants