-
Notifications
You must be signed in to change notification settings - Fork 3.4k
policy/k8s: Refactor and move ToServices
translation to policy package
#31062
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
policy/k8s: Refactor and move ToServices
translation to policy package
#31062
Conversation
21cb64f
to
bff4c69
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing! Thank you so much for all of the detail in your commits, it made reviewing this PR a lot easier. I left two suggestions for you, but otherwise LGTM.
As an additional feature (not necessary in this PR), I'd like to see |
This looks really nice. I'd like to make sure we don't accidentally open ourselves up for performance issues on service updates, since services are churny and policies are not. Other than that, this looks great. |
e665ac2
to
dcc9b45
Compare
I've reworked the PR quite a bit. Please re-review. In particular:
|
1afd749
to
d157d59
Compare
/test |
I would think so. Especially know with the unified |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I have just have a couple more nits and comments for you this round.
This commit removes the `updateCiliumNetworkPolicyV2` and uses the renamed `addCiliumNetworkPolicyV2` (which is now called `upsertCiliumNetworkPolicyV2`) in its place. The removed `updateCiliumNetworkPolicyV2` did not provide any value: - It validated the old version of the CNP object, even though that old version was always coming from the `PolicyWatcher.cnpCache`, to which CNPs are only added after they previously passed validation. So the `oldRuleCpy.Parse()` call could never fail in practise. - It called `newRuleCpy.Parse()` followed by a call to `addCiliumNetworkPolicyV2` which also would then call `Parse()` again, thereby performing the validation twice. The only meaningful logic was the "Modified CiliumNetworkPolicy" debug log line, which this commit moves to the `onUpsert` call site. The log line is also simplified, as there is no way where the name or namespace of the CNP can change (as we're retrying the old object from the `cnpCache` which is indexed by name and namespace). Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit moves all code which resolves CIDR group references into a single `resolveCiliumNetworkPolicyRefs` function. This is done in preparation for the upcoming sequence of commits, which will resolve services references in the CNP. By having a single point where we resolve references, we ensure that all different kinds of references are updated together and consistently. This commit does not contain any functional changes. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This introduces a new notification mechanism to the K8s service cache. In contrast to the existing static single-consumer `Events` channel, this new `Notifications` mechanism allows multiple dynamic subscribers to be informed when the service cache changes. Ideally, we would need two different mechanisms and replace the `Events` consumer with a `Notifications` consumer. The difficulty comes from the fact that the current `Events` channel is guaranteed to deliver all events, including events which occurred before the consumer started listening on the channel. Where as with the new dynamic `Notifications` mechanisms, we have to deal with consumers that register late. Guaranteeing delivery of early events in such a scenario much harder to achieve, i.e. we would have to replay past events based on the current cache state. Since replaying past events would significantly complicate the implementation, this commit opts for a simpler solution and instead requires subscribers to register early, i.e. pushing the complexity to consumers of the new `Notifications` mechanisms. The static consumers of the existing `Events` mechanism are however unaffected. There is an upcoming CFP to re-implement the ServiceCache on top of StateDB. StateDB does have a "watch" mechanism which will give late subscribers a full review of the cache, thus making custom event mechanisms obsolete. Until then, the approach implemented by this commit should serve as a stop-gap solution. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit introduces a new helper function to the service cache which allows users outside of the package to search for services based on arbitrary conditions (such as e.g. matching labels). The helper will be used in a subsequent commit. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit relaxes the CNP/CCNP policy validation to allow `ToCIDRSet` rules to coexist with other L3 members, such as `ToServices`. This has no impact on user-submitted policies, since it is not possible for policies imported via K8s/API to set the `Generated` field. The `Generated` field is currently only set by the `ToServices` implementation, which extracts service endpoints and translates them as `ToCIDRSet` rules. Currently, this is done after CNP validation, however a subsequent commit will perform that translation step earlier. For this reason, we need to relax the rule validation to allow policy rules to contain both a `ToServices` and generated `ToCIDRSet` rules. This change has no user-facing impact. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit changes the `CiliumCIDRGroup` translation logic to not create a copy of the CNP/CCNP before it is translated. Instead, the passed in parameter is mutated in place. While it would arguably be nice to have a functional approach where we have a input parameter and one output result, creating two copies for every CNP change is expensive. In particular, since an upcoming change will add additional translation steps and we want to avoid having to create unnecessary copies for every translation step. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit moves the logic to generate an IPCache resource ID into a separate function. It will be used by additional call sites in a subsequent commit. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit moves the logic which translates `ToServices` rules to `ToCIDRSet` rules into `pkg/policy/k8s`. In doing so, we also change how translation is performed: Previously, translation was split over multiple packages: Initial policy ingestion (in `pkg/policy`) invoked `k8s.PreprocessRules`, which populated the initial version of the policy with `ToCIDRSet` rules. Service updates were handled incrementally by the service watcher, which invoked a `k8s.RuleTranslator` for each update and allocating CIDR identities (via IPCache) accordingly. The new approach centralizes all the logic into `pkg/policy/k8s`. We mirror the approach from the existing `CiliumCIDRGroup` logic which also translates references to `CiliumCIDRGroups` to `ToSericeSet`s. Because the new approach modifies the CNP/CCNP before it is added to the policy repository, we no longer perform incremental updates and no longer need manage CIDR identities ourselves. This simplifies the code and allows us to remove one of the last users of the "synchronous IPCache API". As a side-effect of this change, `ToServices` rules for policies ingested via the non-persistent API (i.e. `cilium-dbg policy import`) are no longer supported. We assume that this was not a commonly used feature for two reasons: First, importing non-K8s policies that refer K8s resources (i.e. services) is likely rather rare. And secondly, the previously implementation was slightly broken: Because policies imported via API were not preprocessed, the policy was defunct until one of the matched services generated an update event. Note that this commit only removes the conflicting parts the old infrastructure to keep this commit reviewable. The remaining code of the old infrastructure is removed in a subsequent commit. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
With the previous commit, this logic is no longer needed. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
d157d59
to
0e41366
Compare
Addressed all the feedback and extended the unit test a bit. |
/test |
Once this is merged, I can open a PR removing some more code: gandro/cilium@pr/gandro/k8s-policy-refactor-to-services...gandro:cilium:pr/gandro/ipcache-remove-sync-alloc-cidr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
This PR moves the logic which translates
ToServices
rules toToCIDRSet
rules intopkg/policy/k8s
. In doing so, we also refactor how translation is performed:Previously, translation was split over multiple packages: Initial policy ingestion (in
pkg/policy
) invokedk8s.PreprocessRules
, whichpopulated the initial version of the policy withToCIDRSet
rules. Service updates were handled incrementally by the service watcher, which invoked ak8s.RuleTranslator
for each update and allocating CIDRidentities (via IPCache) accordingly.
The new approach centralizes all the logic into
pkg/policy/k8s
. We mirror the approach from the existingCiliumCIDRGroup
logic which also translates references toCiliumCIDRGroups
toToCIDRSet
s. Becausethe new approach modifies the CNP/CCNP before it is added to the policy repository, we no longer perform incremental updates and no longer need manage CIDR identities ourselves. This simplifies the code and allows us to remove one of the last users of the "synchronous IPCache API".
Note: This PR removes the last users of the synchronous CIDR allocation API. I will remove those functions (
AllocateCIDRs
/ReleaseCIDRIdentitiesByCIDR
) in a separate PR since this PR is already on the larger side.As a side-effect of this change,
ToServices
rules for policies ingested via the non-persistent API (i.e.cilium-dbg policy import
) are no longer supported. We assume that this was not a commonly used feature for two reasons: First, importing non-K8s policies that refer K8s resources (i.e. services) is likely rather rare. And secondly, the previously implementation was slightly broken: Because policies imported via API were not preprocessed, the policy was defunct until one of thematched services generated an update event.
Review per commit.
Fixes: #30322