policy: consistent enablement in agent and operator #36167
Could you move the detailed explanation for safe upgrade into the docs? The release notes in the PR description are expected to be formatted as a single-line notification that users can then click to learn more from the PR. Consider that the release note in the PR description ends up as just one bullet point in a list like this one: https://github.com/cilium/cilium/releases/tag/v1.17.0-pre.2

One reason I suggest this is that the tooling doesn't properly format multiline release notes, so it will likely render a bit broken unless the release manager catches and fixes it. The other reason is that release notes are generally a bit less accessible than a dedicated writeup in the main docs pages.

EDIT: Actually, on second thought, I'm not sure we want to be documenting this for users. There are too many footguns with this configuration and not enough tests; see also my comments here. If a developer follows a PR like this to figure out how to make this configuration work for them, then great. What I don't want is for mainstream Cilium users to see the release note, think this is some great optimization, enable it, break a bunch of common functionality, and then come to Slack asking why everything is broken. Maybe let's avoid documenting this for users via release note or docs, and just leave that content as regular text in the PR description?
Do these settings apply only to CRD-based mode? Can users enable network policy with etcd as the backend and still keep
A few nits.
Response to comment (#36167 (comment))
Yes, it would be safer to do so. I updated the PR. We will not advertise this specific configuration to all users; rather, those who come with an interest in scaling Cilium for other features, beyond the scale that network policy supports, can learn about it.
Response to comment (#36167 (comment))
Currently, it works only with CRD-based mode, because the path is simpler there: IPCache is populated via the CRDs that are disabled based on the existing configuration. If there is interest in extending this to the etcd backend, we need to ensure we have configuration that sets up the system in a similar way to get the same benefits (stop syncing IPCache entries for all pods to all nodes).
Thanks! Left a comment about operator options binding.
The network policy enforcement system for K8s, Cilium and Cilium Clusterwide network policies can be disabled when not used, to reduce the resource footprint of Cilium and improve scalability and performance.

Conditions for policy disabled:
- EnablePolicy=never
- DisableCiliumEndpointCRD=true
- EnableK8sNetworkPolicy=false
- EnableCiliumNetworkPolicy=false
- EnableCiliumClusterwideNetworkPolicy=false
- IdentityAllocationMode=crd

**Important note**

When enabling, or re-enabling, network policies (going from a disabled to an enabled configuration), avoid any traffic disruption by following one of these two-step processes:
- First enable the network policy configuration and update / restart the cilium-agent pods, before creating any network policies in the cluster.
- If network policies are already created in the cluster, avoid enabling all the configuration at once. First change DisableCiliumEndpointCRD to false and update / restart the cilium-agent pods, before applying the rest of the configuration. Do not change EnablePolicy, EnableK8sNetworkPolicy, EnableCiliumNetworkPolicy or EnableCiliumClusterwideNetworkPolicy before first changing DisableCiliumEndpointCRD to false and updating / restarting the cilium-agent pods.

This is especially important for larger clusters, which can take longer to update the cilium-agent daemonset and reconcile the system. The reason is that the network policy enforcement system in cilium-agent requires the Cilium Endpoint CRD to enforce policies for pods on remote nodes. Therefore, if network policies and the Cilium Endpoint CRD are enabled at the same time, some traffic between nodes can be disrupted (dropped), because the rolling update / restart of cilium-agent takes time, and a remote node's Cilium Endpoint might not yet have been created while the local node is already enforcing a policy that requires it.

Signed-off-by: Dorde Lapcevic <dordel@google.com>
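The six conditions in the commit message act as a single conjunction: policy enforcement is fully disabled only when every one of them holds, and flipping any single option keeps enforcement on. A minimal Go sketch of that predicate is below; the `Config` struct and field names mirror the option names quoted above for illustration, and are not Cilium's actual option package types.

```go
package main

import "fmt"

// Config mirrors the agent/operator options named in the PR description.
// Illustrative only; not the real Cilium option types.
type Config struct {
	EnablePolicy                         string // "never", "default", "always"
	DisableCiliumEndpointCRD             bool
	EnableK8sNetworkPolicy               bool
	EnableCiliumNetworkPolicy            bool
	EnableCiliumClusterwideNetworkPolicy bool
	IdentityAllocationMode               string // "crd" or "kvstore"
}

// PolicyDisabled reports whether the network policy enforcement system
// is fully disabled: every listed condition must hold simultaneously.
func PolicyDisabled(c Config) bool {
	return c.EnablePolicy == "never" &&
		c.DisableCiliumEndpointCRD &&
		!c.EnableK8sNetworkPolicy &&
		!c.EnableCiliumNetworkPolicy &&
		!c.EnableCiliumClusterwideNetworkPolicy &&
		c.IdentityAllocationMode == "crd"
}

func main() {
	off := Config{
		EnablePolicy:             "never",
		DisableCiliumEndpointCRD: true,
		IdentityAllocationMode:   "crd",
	}
	fmt.Println(PolicyDisabled(off)) // true: all six conditions hold

	// Flipping any single condition keeps policy enforcement enabled.
	off.EnableCiliumNetworkPolicy = true
	fmt.Println(PolicyDisabled(off)) // false
}
```

This all-or-nothing shape is also why the two-step re-enable procedure matters: changing the options piecemeal (in particular, re-enabling policies before `DisableCiliumEndpointCRD` is set back to false cluster-wide) leaves the agents in a mixed state during the rollout.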
/test
🚀
Code looks OK to me. I have no idea whether this is a direction the project wants to take, but I'll happily defer to a maintainer.
Just to chime in a bit here from one perspective: there is a developing set of use cases from some heavy users of Cilium who may choose to implement network policy in ways external to Cilium, so we're exploring how to enable those deployments to be more resource efficient.

Right now I'm relying on folks like @dlapcevic and @tamilmani1989 to explore these options and help maintain Cilium in this configuration, and I consider it to be a power-user configuration that we avoid explicitly documenting. I don't personally have a good picture of exactly how we will ensure that the core of Cilium continues to be maintainable in this mode, but if we don't try then we won't build that knowledge and expertise.

However, if folks are using this mode, I expect them to be heavily involved in its development and to actively improve Cilium with these configuration parameters. This is a bit of a cautious approach, but it seems like a reasonable middle ground for now.
Ref: #33360