-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Use custom container to initialise clustermesh etcd #26352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Replace the existing shell script that is used to initialise etcd with a custom container running a Go binary. This binary sets up an embedded etcd server to configure the settings. Fixes cilium#23770 Signed-off-by: James Laverack <james@isovalent.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super exciting work! I took a quick look and so far your implementation looks great. I have a couple of follow-up questions for you though:
- Do we need pprof, gops or completion? Not a deal-breaker, but It does add some overhead. I'm not familiar with our other images and if this is standard.
- Also not a deal breaker, but it might be a good idea to incorporate the hive framework here, as we might migrate this code over to it in the future anyways. This would also be helpful for potential testing on clustermesh in the future, as it would fit into the testing methodology we are working towards. For example, one could import the etcd-init Cell and use it to init a new etcd database within another integration test.
- Any ideas or plans on how we want to write tests? I think starting with some kind of integration test where we compare an etcd database that has been initialized with this code to an etcd database that has been initialized using the shell script it will replace. We could even write a test that performs an init with this code and just checks the state of the database to make sure everything we expect is there.
@@ -0,0 +1,121 @@ | |||
package init |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a couple of variables in here that I think could be moved to the defaults.go
file. See [1]
pkg/etcd/init/init.go
Outdated
) | ||
|
||
func InitEtcd(ctx context.Context, client *clientv3.Client, clusterName string) error { | ||
_, err := client.Put(ctx, "cilium/.has-cluster-config", "true") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it could make sense to add a constant for this key value. Also, I'm assuming that the goal is to eventually use this to stop init if this key is found?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I'm assuming that the goal is to eventually use this to stop init if this key is found?
There's no specific future goal, this is just matching existing behaviour which sets this key.
It is used in the kvstore code though.
Speaking of, it's probably better to read this key from there.
Thank you so much for the review!
Hrm, possibly not. I copied another binary to form this one, and in general I've tried to keep to existing style wherever possible. I'm indifferent on if we keep them or remove them.
I must admit I'm not very familiar with hive. I'll do some research and see what can be done.
Yeah, there's a few levels of test we can write
|
I haven't yet had an in-depth look at the code. One aspect that worries me a bit, though, is that this binary will use a fixed version of etcd (the one specified in the go.mod file), which will be likely different from the one used for the real etcd container configured by the user, possibly leading to subtle bugs in case of incompatibilities. Moreover, we should also make multiple versions of this binary available (for different etcd versions), which is also problematic. WDYT? |
It's a good point. Do we have a documented set of etcd versions that we support? Building multiple versions is of course possible. I'm not certian if we can import multiple different etcd versions in the same Go source tree — but I think it's possible. Failing that some makefile magic might be required. I did consider not useing embedded etcd, and building the init binary as a client-only thing and layering it ontop of etcd. But we'd still need to build, test, and distribute multiple versions for different etcd versions so that doesn't really help. In a way this problem already exists — we're just presuming the etcdctl API wouldn't change in the existing shell script. |
I don't think it is explicitly documented somewhere, but maybe I just don't know about that. Client side we are checking that the etcd server version is >=3.1.0 though.
Yeah, I also haven't found yet any good solution which does not require having to build multiple versions (unless etcd supported configuration files). Yet, the etcd docs mention that minor version downgrade is not supported, and minor version upgrades shall only performed one version at a time. Both of which are potential issues in case of mismatching etcd versions between the custom init container and the real etcd one. |
This pull request has been automatically marked as stale because it |
I'm still working on this, just been on PTO recently. 😅 |
We should have control of that though? We set the etcd image version in our install YAML right? |
Kinda. We configure those as part of the helm values, which means that users can modify them if they wish. |
This pull request has been automatically marked as stale because it |
This pull request has not seen any activity since it was marked stale. |
/reopen |
This is an early draft of a fix for #23770
Please ensure your pull request adheres to the following guidelines:
description and a
Fixes: #XXX
line if the commit addresses a particularGitHub issue.
Fixes: <commit-id>
tag, thenplease add the commit author[s] as reviewer[s] to this issue.
Fixes: #23770