Skip to content

Conversation

giorio94
Copy link
Member

@giorio94 giorio94 commented Aug 30, 2023

Currently, the clustermesh package already leverages a ClusterIDName struct to wrap the ClusterID and ClusterName information. Let's use it also to register the associated flags, so that we can directly depend on it on the different components, and reuse it for the kvstoremesh and clustermesh-apiserver binaries uniformly.

To avoid the need for modifying all usages, we preserve the current fields inside DaemonConfig, which continue to be populated as before.

@giorio94 giorio94 added kind/cleanup This includes no functional changes. area/clustermesh Relates to multi-cluster routing functionality in Cilium. release-note/misc This PR makes changes that have no direct user impact. labels Aug 30, 2023
@giorio94
Copy link
Member Author

/test

@giorio94 giorio94 force-pushed the mio/clusteridname-flags branch from 0bd7bc2 to 438811b Compare August 30, 2023 12:39
@giorio94
Copy link
Member Author

/test

@giorio94
Copy link
Member Author

Rebased onto main to pick CI changes

@giorio94 giorio94 force-pushed the mio/clusteridname-flags branch from 438811b to dac2456 Compare August 30, 2023 13:19
@giorio94
Copy link
Member Author

/test

@giorio94 giorio94 marked this pull request as ready for review August 30, 2023 13:20
@giorio94 giorio94 requested review from a team as code owners August 30, 2023 13:20
@giorio94 giorio94 force-pushed the mio/clusteridname-flags branch from dac2456 to 1101132 Compare August 30, 2023 14:28
@giorio94
Copy link
Member Author

/test

Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code-wise looks good to me.

I'd be interested to know more about the motivation behind this change, and what the plan is going forward. The cluster name is something that is needed by many subsystems to namespace global resources: Node names, EC2 network interface tags in ENI IPAM, service labels, identity labels.

Is the goal that all those subsystems (node discovery, IPAM, service, identity allocation, IPCache) eventually rely on the clustermesh cell? Will that not cause potentially circular cell dependencies?

While I agree we should reduce global configuration state, the cluster name is something that is inherently global and relevant to a lot of subsystems. If we want to make the clustername and cluster id a dependency to be injected into other subsystems, it should probably be a leaf dependency that depends on nothing else, so I'm not sure if the clustermesh cell is the right place (since it itself has lots of dependencies already). Did you consider introducing a separate cluster-id cell instead?

@giorio94
Copy link
Member Author

Code-wise looks good to me.

I'd be interested to know more about the motivation behind this change, and what the plan is going forward. The cluster name is something that is needed by many subsystems to namespace global resources: Node names, EC2 network interface tags in ENI IPAM, service labels, identity labels.

Essentially this came up while refactoring the clustermesh-apiserver and kvstoremesh in preparation to merging them into a single binary (as part of a larger effort to unify all clustermesh-apiserver components into a single image for simplicity: #23770).

The idea is to extract these two flags into a separate object injected through Hive, so that we can reuse them uniformly in all the different components, as well as remove the hacky conversions from the global DaemonConfig, which were for instance used before in the clustermesh cell. Over time, it would be great if also the other subsystems requiring these parameter could depend directly on the ClusterIDName struct, rather than reading the global DaemonConfig.

Is the goal that all those subsystems (node discovery, IPAM, service, identity allocation, IPCache) eventually rely on the clustermesh cell? Will that not cause potentially circular cell dependencies?

While I agree we should reduce global configuration state, the cluster name is something that is inherently global and relevant to a lot of subsystems. If we want to make the clustername and cluster id a dependency to be injected into other subsystems, it should probably be a leaf dependency that depends on nothing else, so I'm not sure if the clustermesh cell is the right place (since it itself has lots of dependencies already). Did you consider introducing a separate cluster-id cell instead?

The different subsystems can depend on the ClusterIDName struct alone, without needing to depend on the whole ClusterMesh object (which would pose circular dependency issues as you mentioned). For the same reason, the struct is defined in a separate subpackage, and registered at the top of the cell hierarchy, not as part of the clustermesh cell (which only depends on it).

@gandro
Copy link
Member

gandro commented Aug 30, 2023

For the same reason, the struct is defined in a separate subpackage, and registered at the top of the cell hierarchy, not as part of the clustermesh cell (which only depends on it).

Awesome, thanks for clarifying. I'm still not super familiar with the hive framwork, so I didn't know one could depend on struct alone this way. Sounds good to me!

Copy link
Contributor

@derailed derailed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@giorio94 Very nice work Marco and nice clean up! I think there might be an opportunity to further improve, but I am a bit out of context here, so feel free to disregard..

@@ -3036,8 +3030,8 @@ func (c *DaemonConfig) Populate(vp *viper.Viper) {
c.AutoCreateCiliumNodeResource = vp.GetBool(AutoCreateCiliumNodeResource)
c.BPFRoot = vp.GetString(BPFRoot)
c.CGroupRoot = vp.GetString(CGroupRoot)
c.ClusterID = vp.GetUint32(ClusterIDName)
c.ClusterName = vp.GetString(ClusterName)
c.ClusterID = vp.GetUint32(clustermeshTypes.OptClusterID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use the new type here ie c.ClusterIDName = NewClusterIDName(...)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tricky, because the only way to keep the current names of the fields (to avoid codebase-wide changes) would be to embed the ClusterInfo struct into DaemonConfig, which feels weird as it would also inherit the associated methods. Although suboptimal, I'd keep the current approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Marco! I see what you are saying. Figured it might be an opportunity to begin cleaning this insanely large struct in favor of segmenting based on config domain. Using named association would scope the delegation vs inheriting the behaviors... But in this case clusterInfo validation when it comes to configuration would be a good thing anyway. But we would need to establish pattens across the board which is a big undertaking at this stage ;(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely second the idea of removing configurations from the DaemonConfig struct. I took another pass at removing a few more usages of the ClusterID/ClusterName fields when possible (mainly the ones which would also be useful in the context of #27520, but I wouldn't go further as not to excessively increase the scope of this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about instead of pulling them from viper like this, we would instead fill them into DaemonConfig from an Invoke that is part of the config cell (see my earlier comment)? Or alternatively leave it as-is, if these are going away soon anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main issue of pulling it from an invoke hook is that we would postpone a bit the initialization, which would be an issue if any (legacy) component not directly depending on it would try to read these values. The current solution seems a bit less error prone to me from that point of view.

@giorio94 giorio94 force-pushed the mio/clusteridname-flags branch from 1101132 to be8e582 Compare August 31, 2023 08:26
@giorio94 giorio94 force-pushed the mio/clusteridname-flags branch from be8e582 to 0efc7e0 Compare August 31, 2023 08:40
@giorio94
Copy link
Member Author

/test

@giorio94 giorio94 marked this pull request as draft August 31, 2023 11:14
@giorio94
Copy link
Member Author

giorio94 commented Sep 1, 2023

/test

@giorio94 giorio94 force-pushed the mio/clusteridname-flags branch from 6be4f09 to ee2d117 Compare September 1, 2023 13:19
@giorio94 giorio94 requested review from a team as code owners September 1, 2023 13:19
@giorio94 giorio94 requested a review from brlbil September 1, 2023 13:19
@giorio94
Copy link
Member Author

giorio94 commented Sep 1, 2023

/test

@giorio94 giorio94 force-pushed the mio/clusteridname-flags branch from ee2d117 to 0895dae Compare September 1, 2023 13:44
@giorio94 giorio94 removed request for a team and brlbil September 1, 2023 13:45
@giorio94
Copy link
Member Author

giorio94 commented Sep 1, 2023

Removed CI-related reviewers because that commit has been dropped.

@giorio94
Copy link
Member Author

giorio94 commented Sep 1, 2023

/test

@giorio94 giorio94 force-pushed the mio/clusteridname-flags branch from 0895dae to 0a06061 Compare September 1, 2023 14:13
@giorio94
Copy link
Member Author

giorio94 commented Sep 1, 2023

I've reverted the changes affecting external workloads, as the scope of this PR was increasing too much. Let's see if now it passes all tests.

@giorio94
Copy link
Member Author

giorio94 commented Sep 1, 2023

/test

Currently, the clustermesh package already leverages a ClusterIDName
struct to wrap the ClusterID and ClusterName information. Let's use it
also to register the associated flags, so that we can directly depend
on it on the different components, and reuse it for the kvstoremesh
and clustermesh-apiserver binaries uniformly.

To avoid the need for modifying all usages, we preserve the current
fields inside DaemonConfig, which continue to be populated as before.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 force-pushed the mio/clusteridname-flags branch from 0a06061 to b325485 Compare September 1, 2023 14:18
@giorio94
Copy link
Member Author

giorio94 commented Sep 1, 2023

/test

@giorio94
Copy link
Member Author

giorio94 commented Sep 1, 2023

@derailed PTAL

@@ -141,7 +140,6 @@ func NewCachingIdentityAllocator(owner cache.IdentityAllocatorOwner) fakeIdentit
func (s *EndpointSuite) SetUpTest(c *C) {
/* Required to test endpoint CEP policy model */
kvstore.SetupDummy(c, "etcd")
identity.InitWellKnownIdentities(&fakeConfig.Config{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we not need these here anymore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WellKnownIdentities are used only in very specific cases, mainly to bootstrap Cilium with etcd running in pod network (to break the chicken egg dependency). Given that they were not necessary here (as well as in the clustermesh case), I just removed the line altogether instead of adapting the parameters.

@@ -100,6 +101,9 @@ var (
"operator-controlplane",
"Operator Control Plane",

cell.Config(cmtypes.DefaultClusterInfo),
cell.Invoke(func(cinfo cmtypes.ClusterInfo) error { return cinfo.Validate() }),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you don't need to wrap it, but rather you can just do: cell.Invoke(cmtypes.ClusterInfo.Validate). Debatable whether this is less or more confusing, but it is shorter.. :D

Why do we need Validate here, but not in daemon/cmd/cells.go? If it's missing from there, how about doing cmtypes.ConfigCell that's cell.Group(cell.Config(DefaultClusterInfo), cell.Invoke(ClusterInfo.Validate))? Preferably this config would be part of some module and not referenced naked here like this, but perhaps that's to come in the future?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you don't need to wrap it, but rather you can just do: cell.Invoke(cmtypes.ClusterInfo.Validate). Debatable whether this is less or more confusing, but it is shorter.. :D

Uh, that's right. I didn't realize it.

Why do we need Validate here, but not in daemon/cmd/cells.go? If it's missing from there, how about doing cmtypes.ConfigCell that's cell.Group(cell.Config(DefaultClusterInfo), cell.Invoke(ClusterInfo.Validate))? Preferably this config would be part of some module and not referenced naked here like this, but perhaps that's to come in the future?

So, the thing is that depending on the component we need to perform a slightly different validation, mainly because for kvstoremesh we don't allow the cluster ID to be zero (which also means that the check needs to be performed from a start hook, as it is the default value). The validation in the agent is currently still part of Config.Validate mainly because that is performed earlier to prevent possible issues for components which do not directly depend on the ClusterInfo struct.

@@ -3036,8 +3030,8 @@ func (c *DaemonConfig) Populate(vp *viper.Viper) {
c.AutoCreateCiliumNodeResource = vp.GetBool(AutoCreateCiliumNodeResource)
c.BPFRoot = vp.GetString(BPFRoot)
c.CGroupRoot = vp.GetString(CGroupRoot)
c.ClusterID = vp.GetUint32(ClusterIDName)
c.ClusterName = vp.GetString(ClusterName)
c.ClusterID = vp.GetUint32(clustermeshTypes.OptClusterID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about instead of pulling them from viper like this, we would instead fill them into DaemonConfig from an Invoke that is part of the config cell (see my earlier comment)? Or alternatively leave it as-is, if these are going away soon anyway.

@thorn3r thorn3r requested a review from derailed September 5, 2023 15:05
Copy link
Contributor

@derailed derailed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@giorio94 LGTM

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Sep 6, 2023
@youngnick youngnick merged commit eeb4f68 into cilium:main Sep 6, 2023
@giorio94 giorio94 deleted the mio/clusteridname-flags branch September 19, 2023 07:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/clustermesh Relates to multi-cluster routing functionality in Cilium. kind/cleanup This includes no functional changes. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants