Incorrect computation of initial-cluster-state during single member restoration which can lead to cluster ID mismatch errors #847

**How to categorize this issue?**

/area control-plane
/kind bug

**What happened**:
A specific Gardener e2e kind test fails frequently: `Shoot Tests Hibernated Shoot [It] Create, Migrate and Delete [Shoot, control-plane-migration, hibernated]`

The creation, migration and hibernation steps succeed. Deleting the migrated shoot, which is currently hibernated, requires waking up the etcd cluster. At this stage the etcd cluster does not become ready.

In one such occurrence we see the following logs in etcd-events-2 (backup-restore container):
```
2025-02-17T12:45:52.969873914Z stderr F 2025-02-17 12:45:52.968607 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[6fdaf30df04c0245]=4ffa550a92b87675, local=39b1e34c77b1db7a)
2025-02-17T12:45:52.970531124Z stderr F 2025-02-17 12:45:52.970317 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[6fdaf30df04c0245]=4ffa550a92b87675, local=39b1e34c77b1db7a)
2025-02-17T12:45:53.055124837Z stderr F 2025-02-17 12:45:53.054945 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[6fdaf30df04c0245]=4ffa550a92b87675, local=39b1e34c77b1db7a)
2025-02-17T12:45:53.062374513Z stderr F 2025-02-17 12:45:53.062106 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[6fdaf30df04c0245]=4ffa550a92b87675, local=39b1e34c77b1db7a)
2025-02-17T12:45:53.153435731Z stderr F 2025-02-17 12:45:53.153314 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[6fdaf30df04c0245]=4ffa550a92b87675, local=39b1e34c77b1db7a)
2025-02-17T12:45:53.160917167Z stderr F 2025-02-17 12:45:53.160807 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[6fdaf30df04c0245]=4ffa550a92b87675, local=39b1e34c77b1db7a)
2025-02-17T12:45:53.251792044Z stderr F 2025-02-17 12:45:53.251680 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[6fdaf30df04c0245]=4ffa550a92b87675, local=39b1e34c77b1db7a)
2025-02-17T12:45:53.264667024Z stderr F 2025-02-17 12:45:53.264552 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[6fdaf30df04c0245]=4ffa550a92b87675, local=39b1e34c77b1db7a)
```

For complete logs see: [etcd-events-2-backup-restore.log](https://github.com/user-attachments/files/18903489/etcd-events-2-backup-restore.log)

A `cluster ID mismatch` error typically occurs in one of the three scenarios documented [here](https://github.com/ahrtr/etcd-issues/blob/master/docs/cluster_id_mismatch.md).

Before the embedded etcd process is started, etcd-wrapper triggers initialization. Once initialization succeeds, etcd-wrapper requests the etcd config, which etcd-backup-restore computes [here](https://github.com/gardener/etcd-backup-restore/blob/44b7d1b228c4d2aa92e8104957a1812302d8c4b9/pkg/server/httpAPI.go#L412). One of the key parameters in the etcd config is `initial-cluster-state`, which is determined [here](https://github.com/gardener/etcd-backup-restore/blob/44b7d1b228c4d2aa92e8104957a1812302d8c4b9/pkg/server/httpAPI.go#L501) to distinguish whether this member bootstraps (or joins) a new cluster or joins an existing cluster.
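To make the role of this parameter concrete, here is a minimal Go sketch of the part of the served config that matters for this issue. `ClusterState`, `configFor` and the map layout are hypothetical names used only for illustration, not the etcd-backup-restore types; only the key `initial-cluster-state` and its two values `new` and `existing` come from etcd's own configuration.

```go
package main

import "fmt"

// ClusterState holds the value passed to etcd as initial-cluster-state.
// Hypothetical type for illustration only.
type ClusterState string

const (
	// StateNew: the member bootstraps (or joins) a brand-new cluster.
	StateNew ClusterState = "new"
	// StateExisting: the member joins a cluster that is already running.
	StateExisting ClusterState = "existing"
)

// configFor is a hypothetical helper that returns only the config keys
// relevant here; the real config is assembled in httpAPI.go.
func configFor(name string, joinsExisting bool) map[string]string {
	state := StateNew
	if joinsExisting {
		state = StateExisting
	}
	return map[string]string{
		"name":                  name,
		"initial-cluster-state": string(state),
	}
}

func main() {
	cfg := configFor("etcd-events-2", true)
	fmt.Printf("name=%s initial-cluster-state=%s\n", cfg["name"], cfg["initial-cluster-state"])
}
```

Getting this single value wrong is enough to make a member start with a different notion of which cluster it belongs to.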

If the member list API call (see [IsLearnerPresent](https://github.com/gardener/etcd-backup-restore/blob/44b7d1b228c4d2aa92e8104957a1812302d8c4b9/pkg/member/member_control.go#L301)) fails for any reason, this function correctly returns an error, but the calling function swallows it (see [here](https://github.com/gardener/etcd-backup-restore/blob/44b7d1b228c4d2aa92e8104957a1812302d8c4b9/pkg/server/httpAPI.go#L519-L522)) and assumes `initial-cluster-state=new`. This is intentional for the `0->3 replicas` bootstrap case: while a new cluster is being bootstrapped, etcd member API calls cannot succeed, so even when the call errors the config has to be served with `initial-cluster-state=new` to let the bootstrap succeed.

However, this code flow also has a negative consequence. Consider the following case:
 - The data directory of one of the etcd members gets corrupted while the cluster is being brought up from `0->3`.
 - etcd-backup-restore validates the data directory, finds it corrupt, and triggers single-member restoration (see [this](https://github.com/gardener/etcd-druid/blob/v0.22.7/docs/operations/restoring-single-member-in-multi-node-etcd-cluster.md) for more information).
 - As part of single-member restoration, it adds this member as a learner and then triggers initialization. Once initialization succeeds, it serves an etcd config.
 - If the etcd member API call fails while `initial-cluster-state` is being computed (for example due to a transient quorum loss, which can be caused by a VPA eviction), it assumes `initial-cluster-state=new`. This is not the correct initial cluster state for a `learner`, and it results in a `cluster ID mismatch`.
 - This forces the member to create a new member ID that never matches the member IDs known to the other 2 members of the etcd cluster. When it dials the other 2 members, they reject the call with a `cluster ID mismatch` response.

**What you expected to happen**:
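`initial-cluster-state` should always be computed correctly. In particular, a member that was added as a learner during single-member restoration should not fall back to `initial-cluster-state=new` because of a transient etcd member API error.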
