-
Notifications
You must be signed in to change notification settings - Fork 3.4k
clustermesh: Consolidate hive #37620
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
6f022a2
to
ae64045
Compare
/test |
Perhaps I'm missing some context about this change, but it is intentional that the clustermesh/kvstoremesh hives are separate, as each is run by its own individual sub-command (in turn executed in two separate containers, if both are enabled at the same time). I don't think there's the risk that any option is not responsive, because they are registered to the corresponding sub-command, as well as only one instance of gops/pprof and alike is guaranteed to be executed at any time depending on which command is executed. It could be theoretically possible to consolidate everything into a single command, but I'm not sure that would have a lot of benefits, at the cost of worse separation of concerns (e.g., only one needs to access the k8s api-server, etcd clients are separate so that rate limiting of one doesn't impact the other and so on). It would also require a quite significant refactoring to allow enabling/disabling the different components independently, while that currently achieved via the separate sub-commands. |
OK thanks a lot for that context. I haven't fully formed my thoughts on what this should look like, but here's the background I'm coming from:
I think that largely leads us to this PR. Looking forward:
More than anything, this gave me an excuse to poke around in the top level of this command to try to understand the functionality better, so that goal of mine is also achieved. Another alternative option going forward is just for me to abandon this PR. |
Thanks a lot for the context, makes sense! And obviously feel free to ask if you have questions about this binary or clustermesh in general.
The commands to inspect the hive are already available, although as part of the individual sub-commands (not at the root level):
Essentially, the root of the clustermesh-apiserver is just a container for the other commands, and it does nothing by itself (besides printing the help message).
Yes, the clustermesh-apiserver is a bit special in this respect as it bundles multiple independent functionalities as separate sub-commands, essentially to avoid requiring separate containers/binaries that have already proved problematic in the past. The hive configuration is still static as for e.g., the Cilium agent, but there are actually two, one for each core command (the other commands are instead more trivial and implemented the old way).
Yep, these could be grouped together in a dedicated module and imported by both. It wouldn't change anything from the functional point of view, but I agree it would remove a bit of code duplication. |
OK, I'll refocus this PR on just creating the common Cell in that case - I think we'll get most of the cleanup benefits from that alone. I'm still not 100% confident about my statement "[there's no] problem with overlapping flags in this binary." given the way Hive is used in these binaries, but I have an idea about how to prove that out. |
ae64045
to
69ed03f
Compare
/ci-clustermesh |
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking care of this. A couple of comments inline.
Pull request was converted to draft
69ed03f
to
5759727
Compare
/test |
There are several components which rely on the pprof Cell, with different configuration parameters for each. In order to improve code reuse, require the pprof configuration as a parameter for the Cell and expect the caller to define the default configuration for those flags. Signed-off-by: Joe Stringer <joe@cilium.io>
This function has been unused since the introduction of the CRD sync promise. Remove it as it is no longer needed. Fixes: d8102b9 ("k8s/synced: Add CRDSync promise") Signed-off-by: Joe Stringer <joe@cilium.io>
This flag was previously declared in two places for the agent and clustermesh-apiserver to define the configuration, however it's used only from this synced package. Move the flag configuration into this package to consolidate the declaration into one place for better code reuse. The agent hive command reference here presumably changes because the flag was migrated from the traditional global flags reference over into the Hive, so the Hive infrastructure is now aware of the flag. Signed-off-by: Joe Stringer <joe@cilium.io>
These "legacy" configurations are common across both clustermesh/kvstoremesh binaries, so we can reuse the same flag declarations for both. While we're at it, explicitly define the default configuration for the log options. Signed-off-by: Joe Stringer <joe@cilium.io>
An upcoming change will move this into a common package. In preparation, move this out to the top level so that the diff in the subsequent commit is clearer to follow. No logical changes here. Signed-off-by: Joe Stringer <joe@cilium.io>
Several of the core Cells that are reused by clustermesh and kvstoremesh are declared separately. In order to simplify the logic, this commit moves those common Cells into a single declaration at the top level and directly passes that common Cell into each Hive when it is created on startup. The gops,pprof cells are kept separately declared in order to retain the different port configuration to avoid conflicts when running on the same IP. The Health API server is centralized, but the health endpoints APIs are kept separate in order to allow them to diverge in future (though they are currently identical). Signed-off-by: Joe Stringer <joe@cilium.io>
5759727
to
15eb083
Compare
/ci-clustermesh |
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a nice change. Can't find anything wrong with this
Since merging clustermesh and kvstoremesh together, the code was still
quite separate, registering multiple Hives and duplicate Cells (+
duplicate configuration). This could cause some options to be not
responsive to user configuration, as well as just unnecessarily running
extra instances of core logic that is not necessary to run - for
instance, two pprof or two gops servers. This may also fix issues where
some clustermesh controllers do not appear in debug status output.
Consolidate these into a single Hive with all the common cells, and
expose a new 'clustermesh-apiserver hive' command to inspect the
hierarchy.