Skip to content

Conversation

JamesLaverack
Copy link
Member

This is an early draft of a fix for #23770


Please ensure your pull request adheres to the following guidelines:

  • For first time contributors, read Submitting a pull request
  • All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • If your commit description contains a Fixes: <commit-id> tag, then
    please add the commit author[s] as reviewer[s] to this issue.
  • All commits are signed off. See the section Developer’s Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.
  • Are you a user of Cilium? Please add yourself to the Users doc
  • Thanks for contributing!

Fixes: #23770

<!-- Enter the release note text here if needed or remove this section! -->

Replace the existing shell script that is used to initialise etcd with a
custom container running a Go binary. This binary sets up an embedded
etcd server to configure the settings.

Fixes cilium#23770

Signed-off-by: James Laverack <james@isovalent.com>
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jun 19, 2023
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Jun 19, 2023
Copy link
Contributor

@learnitall learnitall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super exciting work! I took a quick look and so far your implementation looks great. I have a couple of follow-up questions for you though:

  1. Do we need pprof, gops or completion? Not a deal-breaker, but It does add some overhead. I'm not familiar with our other images and if this is standard.
  2. Also not a deal breaker, but it might be a good idea to incorporate the hive framework here, as we might migrate this code over to it in the future anyways. This would also be helpful for potential testing on clustermesh in the future, as it would fit into the testing methodology we are working towards. For example, one could import the etcd-init Cell and use it to init a new etcd database within another integration test.
  3. Any ideas or plans on how we want to write tests? I think starting with some kind of integration test where we compare an etcd database that has been initialized with this code to an etcd database that has been initialized using the shell script it will replace. We could even write a test that performs an init with this code and just checks the state of the database to make sure everything we expect is there.

@@ -0,0 +1,121 @@
package init
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a couple of variables in here that I think could be moved to the defaults.go file. See [1]

)

func InitEtcd(ctx context.Context, client *clientv3.Client, clusterName string) error {
_, err := client.Put(ctx, "cilium/.has-cluster-config", "true")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could make sense to add a constant for this key value. Also, I'm assuming that the goal is to eventually use this to stop init if this key is found?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm assuming that the goal is to eventually use this to stop init if this key is found?

There's no specific future goal, this is just matching existing behaviour which sets this key.

It is used in the kvstore code though.

Speaking of, it's probably better to read this key from there.

@JamesLaverack
Copy link
Member Author

Super exciting work! I took a quick look and so far your implementation looks great. I have a couple of follow-up questions for you though:

Thank you so much for the review!

  1. Do we need pprof, gops or completion? Not a deal-breaker, but It does add some overhead. I'm not familiar with our other images and if this is standard.

Hrm, possibly not. I copied another binary to form this one, and in general I've tried to keep to existing style wherever possible. I'm indifferent on if we keep them or remove them.

  1. Also not a deal breaker, but it might be a good idea to incorporate the hive framework here, as we might migrate this code over to it in the future anyways. This would also be helpful for potential testing on clustermesh in the future, as it would fit into the testing methodology we are working towards. For example, one could import the etcd-init Cell and use it to init a new etcd database within another integration test.

I must admit I'm not very familiar with hive. I'll do some research and see what can be done.

  1. Any ideas or plans on how we want to write tests? I think starting with some kind of integration test where we compare an etcd database that has been initialized with this code to an etcd database that has been initialized using the shell script it will replace. We could even write a test that performs an init with this code and just checks the state of the database to make sure everything we expect is there.

Yeah, there's a few levels of test we can write

  • A script comparison test is great for this implementation — but that means that the script will need to be mantained as future changes are made in order to keep the test passing. Is that an acceptable level of matinance?
  • I assume there are existing end-to-end tests for cluster mesh? I presume they would fail if a mistake was made here?
  • We could try to mock the etcd client for the init code, but I think that'd be rather brittle as a test, and not really actually test half the code here. So. I'd err away from that option, and certinally not as a replacement for the integration test.

@giorio94
Copy link
Member

I haven't yet had an in-depth look at the code. One aspect that worries me a bit, though, is that this binary will use a fixed version of etcd (the one specified in the go.mod file), which will be likely different from the one used for the real etcd container configured by the user, possibly leading to subtle bugs in case of incompatibilities. Moreover, we should also make multiple versions of this binary available (for different etcd versions), which is also problematic. WDYT?

@JamesLaverack
Copy link
Member Author

I haven't yet had an in-depth look at the code. One aspect that worries me a bit, though, is that this binary will use a fixed version of etcd (the one specified in the go.mod file), which will be likely different from the one used for the real etcd container configured by the user, possibly leading to subtle bugs in case of incompatibilities. Moreover, we should also make multiple versions of this binary available (for different etcd versions), which is also problematic. WDYT?

It's a good point. Do we have a documented set of etcd versions that we support?

Building multiple versions is of course possible. I'm not certian if we can import multiple different etcd versions in the same Go source tree — but I think it's possible. Failing that some makefile magic might be required.

I did consider not useing embedded etcd, and building the init binary as a client-only thing and layering it ontop of etcd. But we'd still need to build, test, and distribute multiple versions for different etcd versions so that doesn't really help. In a way this problem already exists — we're just presuming the etcdctl API wouldn't change in the existing shell script.

@giorio94
Copy link
Member

It's a good point. Do we have a documented set of etcd versions that we support?

I don't think it is explicitly documented somewhere, but maybe I just don't know about that. Client side we are checking that the etcd server version is >=3.1.0 though.

Building multiple versions is of course possible. I'm not certian if we can import multiple different etcd versions in the same Go source tree — but I think it's possible. Failing that some makefile magic might be required.

I did consider not useing embedded etcd, and building the init binary as a client-only thing and layering it ontop of etcd. But we'd still need to build, test, and distribute multiple versions for different etcd versions so that doesn't really help. In a way this problem already exists — we're just presuming the etcdctl API wouldn't change in the existing shell script.

Yeah, I also haven't found yet any good solution which does not require having to build multiple versions (unless etcd supported configuration files). Yet, the etcd docs mention that minor version downgrade is not supported, and minor version upgrades shall only performed one version at a time. Both of which are potential issues in case of mismatching etcd versions between the custom init container and the real etcd one.

@github-actions
Copy link

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jul 28, 2023
@JamesLaverack
Copy link
Member Author

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

I'm still working on this, just been on PTO recently. 😅

@giorio94 giorio94 removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jul 28, 2023
@maintainer-s-little-helper
Copy link

Commits 848145a, 6f50e1b, 35b345a, 7156600, 4d02e7e do not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Aug 8, 2023
@JamesLaverack
Copy link
Member Author

Both of which are potential issues in case of mismatching etcd versions between the custom init container and the real etcd one.

We should have control of that though? We set the etcd image version in our install YAML right?

@giorio94
Copy link
Member

giorio94 commented Aug 8, 2023

Both of which are potential issues in case of mismatching etcd versions between the custom init container and the real etcd one.

We should have control of that though? We set the etcd image version in our install YAML right?

Kinda. We configure those as part of the helm values, which means that users can modify them if they wish.

@github-actions
Copy link

github-actions bot commented Sep 8, 2023

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Sep 8, 2023
@github-actions
Copy link

This pull request has not seen any activity since it was marked stale.
Closing.

@github-actions github-actions bot closed this Sep 22, 2023
@JamesLaverack
Copy link
Member Author

/reopen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. kind/community-contribution This was a contribution made by a community member. stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update clustermesh-apiserver initContainer to configure etcd for v3.5.7 or later
3 participants