Skip to content

Conversation

unmarshall
Copy link
Contributor

@unmarshall unmarshall commented Feb 28, 2025

How to categorize this PR?

/area control-plane
/kind technical-debt

What this PR does / why we need it:

This PR introduces the following changes:

  • Removes the current dependency on github.com/gardener/etcd-druid and replaces it with github.com/gardener/etcd-druid/api dependency.
  • Removes the Etcd and EtcdCopyBackupsTask CRDs from this repository and instead uses github.com/gardener/etcd-druid/api/core/v1alpha1/crds to get these CRDs directly from etcd-druid.
    • etcd-druid PR#995 introduces CR validations via CEL expressions that are baked into the CRDs. This avoids a round-trip from KAPI to the component hosting the admission webhook. Since this is only available for clusters with k8s version >= 1.29 therefore we will provide a way to fetch relevant CRDs (with or without the CEL expressions) via etcd-druid CRD APIs.
  • Removes the CRD generation for etcd-druid from hacks/generate-crds.sh

Which issue(s) this PR fixes:
Fixes #11544

Special notes for your reviewer:

Release note:

Remove the double maintenance of CRDs defined in etcd-druid. Now, gardenlet is using the etcd-druid CRD APIs to get the CRDs ensuring one single source of truth.
Remove the dependency on the `github.com/gardener/etcd-druid` module and instead introduce dependency on `github.com/gardener/etcd-druid/api` module.
The following dependencies have been updated:
- `gardener/etcd-druid` from `v0.27.0` to `v0.28.0`. [Release Notes](https://redirect.github.com/gardener/etcd-druid/releases/tag/v0.28.0)
- `github.com/gardener/etcd-druid` from `v0.27.0` to `v0.28.0`. 
`/hack/generate-crds.sh` will no longer generate any CRDs with `group=druid.gardener.cloud`. One must use [etcd-druid API](https://github.com/gardener/etcd-druid/blob/v0.28.0/api/core/v1alpha1/crds/crd.go#L35) to get the CRDs that serve as a single source of truth for all etcd-druid CRDs.
The etcd-druid's GA-ed `UseEtcdWrapper` feature gate is removed. It is now unconditionally enabled. It should no longer be passed in gardenlet configuration. Before upgrading to this version of Gardener, check your gardenlet configuration and make sure that it does not specify the etcd-druid's `UseEtcdWrapper` feature gate.
The default etcd-main storage is increased from `10Gi` to `25Gi`. The etcd-main storage capacity is mutated by provider extensions. Before upgrading to this version of Gardener, make sure the provider extensions which you use mutate the etcd-main capacity. Otherwise, the default storage capacity change in Gardener could be unexpected or breaking.

@gardener-prow gardener-prow bot added area/control-plane Control plane related kind/technical-debt Something that is only solved on the surface, but requires more (re)work to be done properly cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. labels Feb 28, 2025
@unmarshall unmarshall marked this pull request as draft February 28, 2025 08:02
@gardener-prow gardener-prow bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 28, 2025
@gardener-prow gardener-prow bot requested review from Kostov6 and tobschli February 28, 2025 08:02
@gardener-prow gardener-prow bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 7, 2025
@unmarshall unmarshall changed the title Adopt etcd-druid API go module Adapt etcd-druid API go module Mar 10, 2025
@gardener-prow gardener-prow bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 10, 2025
@unmarshall unmarshall marked this pull request as ready for review March 11, 2025 06:16
@gardener-prow gardener-prow bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 11, 2025
Copy link
Member

@LucaBernstein LucaBernstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thank you!
/lgtm

@gardener-prow gardener-prow bot added lgtm Indicates that a PR is ready to be merged. and removed lgtm Indicates that a PR is ready to be merged. labels Mar 12, 2025
@gardener-prow gardener-prow bot requested a review from LucaBernstein March 12, 2025 08:42
@unmarshall
Copy link
Contributor Author

/test pull-gardener-integration

@gardener-prow gardener-prow bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 12, 2025
@gardener-prow gardener-prow bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 13, 2025
@unmarshall
Copy link
Contributor Author

/test pull-gardener-integration

@unmarshall
Copy link
Contributor Author

/test pull-gardener-e2e-kind-operator-seed

@gardener-prow gardener-prow bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 27, 2025
@unmarshall unmarshall requested a review from ialidzhikov March 27, 2025 06:51
@unmarshall
Copy link
Contributor Author

/test pull-gardener-integration

@unmarshall
Copy link
Contributor Author

/test pull-gardener-integration

@unmarshall unmarshall requested a review from rfranzke March 30, 2025 06:21
@unmarshall unmarshall requested a review from rfranzke March 31, 2025 08:43
Copy link
Member

@rfranzke rfranzke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Mar 31, 2025
Copy link
Contributor

gardener-prow bot commented Mar 31, 2025

LGTM label has been added.

Git tree hash: b6ba5a80d657d9b94a19ec6d8ee893c13f2be72a

@ialidzhikov
Copy link
Member

I will have a final look.

Copy link
Member

@ialidzhikov ialidzhikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

Copy link
Contributor

gardener-prow bot commented Mar 31, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ialidzhikov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gardener-prow gardener-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 31, 2025
@ialidzhikov
Copy link
Member

ialidzhikov commented Mar 31, 2025

In retrospective, reviewing this PR was not the best experience (at least for me). There were too many changes coupled to the new version of etcd-druid and one cannot upgrade to it without adapting to the breaking changes:

  • There were breaking changes which were not documented as such:
    • The removal of the UseEtcdWrapper feature gate. The feature gate was GA since longtime and could be cleaned up since longtime from Gardener code-base with other unrelated PRs.
    • The default etcd-main storage was increased from 10Gi to 25Gi which might be an unexpected change if provider extension implementation does not mutate the etcd-main storage capacity.
  • I am also not very sure about moving the etcd-druid CRDs to the etcd-druid module as it is not aligned with the way how we consume other CRDs. And it won't be aligned never, for example we cannot influence upstream projects like fluent-bit operator, VPA and others to provide similar APIs for fetching the CRDs. We could have aligned in advance whether to do this or not.
  • Many no-op changes caused by changing the import alias which made hard to spot the meaningful changes from the renaming changes.
  • The commits were not very well structured. 40 commits in total and not possible to review the PR going over them one by one.

I am providing this feedback for future etcd-druid upgrade PRs. It would be better to do not have that many breaking changes in a single release. I would be much better if a sinle release of etcd-druid switched to the API module. Other release, switched to using the etcds CRDs from the etcd-druid API module. Other release that does other breaking changes like changing the default capacity, introducing the CEL validations, etc.

@gardener-prow gardener-prow bot merged commit 679123c into gardener:master Mar 31, 2025
19 checks passed
@unmarshall
Copy link
Contributor Author

unmarshall commented Mar 31, 2025

In retrospective, reviewing this PR was not the best experience (at least for me). There were too many changes coupled to the new version of etcd-druid and one cannot upgrade to it without adapting to the breaking changes

Thanks for the feedback @ialidzhikov.

The removal of the UseEtcdWrapper feature gate. The feature gate was GA since longtime and could be cleaned up since longtime from Gardener code-base with other unrelated PRs.

We usually wait for 3 releases before moving to GA. It happened to be the same release as druid API module changes. Now we try and follow regular release cycles instead of doing it adhoc. This was a team decision. Having said that I only realised it much later that this breaking change was included in the PR. This was a clear miss from our side.

The default etcd-main storage was increased from 10Gi to 25Gi which might be an unexpected change if provider extension implementation does not mutate the etcd-main storage capacity.

Unfortunately till now we have had very limited validation for Etcd resources. This release we re-looked at it. The change in default size from 10Gi to 25Gi is not new and has been a requirement from etcd-backup-restore for long. As explained if one looks at any provider the size for a volume for any etcd-cluster will be > 25Gi (which is at least 3 times the max DB size - which is usually 8Gi). 10Gi is used for KIND based tests. This has worked so far because the load on etcd DB is kept to a very low value in e2e tests. So everything works. However as soon you load the etcd and become > 3Gi and run recovery/restore then you will start to run into issues. So this check/validation was always required (missing before and only introduced now). The need to have a comprehensive validating webhook was pending since 2022 (see gardener/etcd-druid#409). We have gathered pace and are now addressing gaps.

I am also not very sure about moving the etcd-druid CRDs to the etcd-druid module as it is not aligned with the way how we consume other CRDs

Gardener has a specific way to manage/consume CRDs. Since we are also positioning etcd-druid to be used outside Gardener we wanted to provide both ways (via Helm charts) and also programmatic access to CRDs. We wanted to ensure that etcd-druid is the single source for all CRDs. I am aware that it creates a difference in consumption pattern as far as g/g is concerned and we are open to discussions and improving g/g usage as that is the primary consumer for etcd-druid.

Many no-op changes caused by changing the import alias which made hard to spot the meaningful changes from the renaming changes.

Since CRD apis were introduced that caused the import alias to be relooked at.

The commits were not very well structured. 40 commits in total and not possible to review the PR going over them one by one

I will take this as a feedback for any PR that is raised on g/g.

Thanks for the time spent on reviewing this PR and getting this in. In hindsight we can perhaps relook at how we manage breaking changes and spread it across releases. Sometimes it gets difficult as development activity in etcd-druid has picked up and there are a lot of refactoring activities that are currently on-going. This causes churn and with that comes changes (some of which will be breaking changes). Overall the goal is to improve the code+design quality and make it easier to add features (which we have a long list of).

ialidzhikov added a commit to ialidzhikov/gardener that referenced this pull request Jul 29, 2025
Introduced with gardener#11545. The correct pkg is `github.com/gardener/etcd-druid/api/core/v1alpha1/crds`, not `github.com/gardener/etcd-druid/api/core/crds`.
ialidzhikov added a commit to ialidzhikov/gardener that referenced this pull request Jul 29, 2025
Introduced with gardener#11545. The correct pkg is `github.com/gardener/etcd-druid/api/core/v1alpha1/crds`, not `github.com/gardener/etcd-druid/api/core/crds`.
gardener-prow bot pushed a commit that referenced this pull request Aug 13, 2025
* Clean up exclusion of non-existing `pkg/client/extensions` path

Extension clients are no longer generated after #4592.

* Clean up exclusion of non-existing `builtin$` path

I don't find in the project a path matching `builtin$`.

* Clean up exclusion of non-existing `examples$` path

I don't find in the project a path matching `examples$`. We have a dir named `example/`, not `examples/`.

* Update fluent-operator import alias

With #11673, the fluent-operator imports were updated from `github.com/fluent/fluent-operator/v2` to `github.com/fluent/fluent-operator/v3`.

* Clean up import alias for non-existing pkg

`github.com/gardener/gardener/extensions/pkg/apis/config` was removed with #11056.

* Clean up import alias for non-existing pkg(s)

Internal component config APIs were removed. See #11043

* Add `pkg/client/security` to the list of excluded dirs

`pkg/client/security` contains generated clientset, informers and listers. Similar to the other generated dirs, it should be excluded.

* Update the import alias for etcd-druid crds pkg

Introduced with #11545. The correct pkg is `github.com/gardener/etcd-druid/api/core/v1alpha1/crds`, not `github.com/gardener/etcd-druid/api/core/crds`.

* Clean up paths with generated code from `linters.exclusions.paths`

We already use `linters.exclusions.generated=lax`. `lax` mode should automatically detect and ignore generated files.
Ref https://golangci-lint.run/usage/configuration/#linters-configuration

* Clean up the exclusion of the `third_party` path

In the Makefile, we explicitly pass the paths to be checked. `third_party` is not among them. Hence, there is no need to exclude it as we don't check for it.
https://github.com/gardener/gardener/blob/v1.124.0/Makefile#L165

* Don't check for imports under `third_party`

`third_party` currently contains a forked dependency and generated mocks. Hence, I don't see much value for checking for imports there as it does not contain a pkg which is not generated or not copied.

* Address PR review feedback from dimitar-kostadinov
Duciwuci pushed a commit to stackitcloud/gardener that referenced this pull request Sep 1, 2025
* Clean up exclusion of non-existing `pkg/client/extensions` path

Extension clients are no longer generated after gardener#4592.

* Clean up exclusion of non-existing `builtin$` path

I don't find in the project a path matching `builtin$`.

* Clean up exclusion of non-existing `examples$` path

I don't find in the project a path matching `examples$`. We have a dir named `example/`, not `examples/`.

* Update fluent-operator import alias

With gardener#11673, the fluent-operator imports were updated from `github.com/fluent/fluent-operator/v2` to `github.com/fluent/fluent-operator/v3`.

* Clean up import alias for non-existing pkg

`github.com/gardener/gardener/extensions/pkg/apis/config` was removed with gardener#11056.

* Clean up import alias for non-existing pkg(s)

Internal component config APIs were removed. See gardener#11043

* Add `pkg/client/security` to the list of excluded dirs

`pkg/client/security` contains generated clientset, informers and listers. Similar to the other generated dirs, it should be excluded.

* Update the import alias for etcd-druid crds pkg

Introduced with gardener#11545. The correct pkg is `github.com/gardener/etcd-druid/api/core/v1alpha1/crds`, not `github.com/gardener/etcd-druid/api/core/crds`.

* Clean up paths with generated code from `linters.exclusions.paths`

We already use `linters.exclusions.generated=lax`. `lax` mode should automatically detect and ignore generated files.
Ref https://golangci-lint.run/usage/configuration/#linters-configuration

* Clean up the exclusion of the `third_party` path

In the Makefile, we explicitly pass the paths to be checked. `third_party` is not among them. Hence, there is no need to exclude it as we don't check for it.
https://github.com/gardener/gardener/blob/v1.124.0/Makefile#L165

* Don't check for imports under `third_party`

`third_party` currently contains a forked dependency and generated mocks. Hence, I don't see much value for checking for imports there as it does not contain a pkg which is not generated or not copied.

* Address PR review feedback from dimitar-kostadinov
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/control-plane Control plane related cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. kind/technical-debt Something that is only solved on the surface, but requires more (re)work to be done properly lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use etcd-druid API go module instead of depending upon etcd-druid
5 participants