-
Notifications
You must be signed in to change notification settings - Fork 3.4k
bgpv2: Add a knob to disable CRD status reporting #35976
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b4a17f1
to
f732c49
Compare
Status reporting (status.conditions in the ClusterConfig and PeerConfig or status in the NodeConfig) are useful for troubleshooting. Thus, it is recommended to enable it. However, if there's an issue like high API server load, we may want to disable it. This commit introduces a Helm value to control it. The default is enabled. We'll introduce an actual agent/operator options in the subsequent commits. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Add a new agent/operator option enable-bgp-control-plane-status-report that enables/disables status reporting (status.conditions on the CiliumBGPClusterConfig and CiliumBGPPeerConfig, status on the CiliumBGPNodeConfig). It is enabled by default, but users can disable it to reduce the API server load. So far, we only provide a knob to enable/disable all status reporting. In the future, we may want to provide more fine-grained knob such as enabling/disabling status reporting per resource. However, as a starting point, we'll provide it with the simplest way. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
When users disable the status reporting on the running cluster, the previously reported status won't be updated anymore. That means the stale status report persists. To avoid this, cleanup the status fields when the status reporting is disabled. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
f732c49
to
ece2183
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code-wise, this looks good to me.
We used to have a similar feature for CNPs, that we eventually turned off by default and then removed completely because it suffered from the same scalability issues.
I do wonder if we should also disable this feature by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline. Turns out the scalability issues are less pronounced with the BGP status compared to the old CNP status I mentioned. So keeping this enabled by default seems like a reasonable tradeoff.
In the CNP case, we used to edit global CNP from multiple nodes while they are watched by all nodes. On the other hand, the BGP resources only have a single writer.
We recently introduced several CRD status reporting features (#33411, #35517, #35527, #35650). They are useful for troubleshooting, but there's a trade-off between API server load. For the large cluster with many nodes or many BGP resources may experience a high API server load. In this PR, we introduce a knob to disable BGPv2 CRD status reporting for such an environment.