Description
To remove a Restate node from the cluster, we need to support removing metadata servers from a replicated metadata cluster. Right now, nodes that run the `Role::MetadataServer` will try to join the metadata cluster if their node has the state `MetadataServerState::Member`, which every node is assigned by default. If nodes are started with `MetadataServerState::Standby`, they won't try to join the metadata cluster.
One way to add support for adding/removing metadata servers is to extend the `MetadataServerState` enum:
enum MetadataServerState {
    /// The node tries to join a metadata cluster. It is not safe to stop this node since it might already be part of the metadata cluster.
    Joining,
    /// The node is an active member of the metadata cluster.
    Member,
    /// The node is an active member of the metadata cluster but should be removed from it.
    Leaving,
    /// The node is not part of any metadata cluster and can be safely stopped.
    Standby,
}
In state `Joining`, a node tries to join an existing metadata cluster by sending its member id to the current leader. Once the leader applies the configuration change, it will also update the state of the joining node to `Member` in the `NodesConfiguration`. At this point, the joining node can run as a `Member`.
In state `Member`, a node knows that it is part of the metadata cluster and might be needed to reach write quorum.
When the leader of the metadata cluster sees a node in state `Leaving`, it removes that node from the current cluster configuration. Until this configuration change has been applied, the leaving node is supposed to act as a normal member. Once the configuration change is applied, the node's state is updated to `Standby` in the `NodesConfiguration`. At this point, the leaving node will stop running as a `Member`.
In state `Standby`, a node will monitor the `NodesConfiguration` until its state switches to `Joining`.
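As a rough illustration of the per-state behavior described above, the following sketch maps each state to what the node does locally. `NodeAction`, `action_for`, and `is_safe_to_stop` are hypothetical names used only for this example and are not existing Restate APIs.

```rust
/// Mirrors the enum proposed in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MetadataServerState {
    Joining,
    Member,
    Leaving,
    Standby,
}

/// Hypothetical summary of what a node does locally for a given state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NodeAction {
    /// Send the member id to the current leader and wait to be promoted.
    SendJoinRequest,
    /// Take part in the metadata cluster as a regular member.
    RunAsMember,
    /// Stay out of the metadata cluster and watch the NodesConfiguration.
    WatchNodesConfiguration,
}

fn action_for(state: MetadataServerState) -> NodeAction {
    match state {
        MetadataServerState::Joining => NodeAction::SendJoinRequest,
        // A leaving node keeps acting as a member until the removal is applied.
        MetadataServerState::Member | MetadataServerState::Leaving => NodeAction::RunAsMember,
        MetadataServerState::Standby => NodeAction::WatchNodesConfiguration,
    }
}

/// Only a standby node is guaranteed not to be part of a metadata cluster.
fn is_safe_to_stop(state: MetadataServerState) -> bool {
    matches!(state, MetadataServerState::Standby)
}
```

Note that `Joining` is deliberately not safe to stop in this sketch, since the node might already have been added to the cluster without having observed it yet.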
When adding/removing nodes, a controller is allowed to change `Standby -> Joining` and `Member -> Leaving` in the `NodesConfiguration`. The transitions `Joining -> Member` and `Leaving -> Standby` happen as part of the reconfiguration protocol. It is also allowed to switch back from `Joining -> Standby` and `Leaving -> Member` if the cluster reconfiguration has not happened yet.
stateDiagram-v2
Joining --> Member: Reconfiguration with added node
Member --> Leaving: Remove node
Leaving --> Member: Add node
Leaving --> Standby: Reconfiguration with removed node
Standby --> Joining: Add node
Joining --> Standby: Remove node
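The transitions in the diagram could be checked with a small table like the one below. This is only a sketch: `Initiator` and `allowed_initiator` are hypothetical names, and attributing the two "switch back" transitions to the controller is an assumption based on the "Add node" / "Remove node" labels in the diagram.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MetadataServerState {
    Joining,
    Member,
    Leaving,
    Standby,
}

/// Hypothetical: who is allowed to perform a transition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Initiator {
    /// An operator/controller editing the NodesConfiguration.
    Controller,
    /// The leader, as part of applying a membership change.
    Reconfiguration,
}

/// Returns who may perform `from -> to`, or `None` if the transition is not allowed.
fn allowed_initiator(from: MetadataServerState, to: MetadataServerState) -> Option<Initiator> {
    use MetadataServerState::*;
    match (from, to) {
        // Requests to add or remove a node.
        (Standby, Joining) | (Member, Leaving) => Some(Initiator::Controller),
        // Cancelling a request before the reconfiguration happened (assumed to be
        // done by the controller as well).
        (Joining, Standby) | (Leaving, Member) => Some(Initiator::Controller),
        // Completed by the reconfiguration protocol.
        (Joining, Member) | (Leaving, Standby) => Some(Initiator::Reconfiguration),
        // Everything else, e.g. Standby -> Member, is rejected.
        _ => None,
    }
}
```

In particular, `Standby -> Member` and `Member -> Standby` return `None`, which matches the requirement that both directions go through an intermediate state.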
Why aren't `MetadataServerState::Member` and `MetadataServerState::Standby` enough?
We need to distinguish between a node "leaving" a metadata cluster and a node that has left or been removed, because the leaving node might still be needed to commit the configuration change. Therefore, we have the distinction between `Leaving` and `Standby`. If a node sees that its state is `Leaving`, it knows that it should still act as a member of the cluster until its state becomes `Standby`. Once it sees the state `Standby` in the `NodesConfiguration`, it knows that it is safe to stop running as a member of the cluster because the configuration change which removed the node has been committed. Note that a metadata cluster peer is not guaranteed to see all configuration changes as part of the Raft protocol (e.g. if it was down or simply not needed to reach write quorum for the configuration change).
Likewise, when adding a node to the cluster, we cannot go directly from `Standby` to `Member`, because a node might otherwise never send a join request to the current cluster to trigger a configuration change. This problem can occur when a node is removed but misses the removal configuration change and then looks only at the `MetadataServerState` to determine whether to keep running as a member or as a standby. With the explicit `Joining` state, a previous member knows that it has to rejoin the metadata cluster instead of assuming it is still a `Member`.
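To make the ordering concrete, here is a minimal sketch of the leader-side step in which node states only change after a membership change has been applied. It models the `NodesConfiguration` as a plain map from node id to state, which is a simplification for illustration. `NodeId` and `on_reconfiguration_applied` are hypothetical names.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MetadataServerState {
    Joining,
    Member,
    Leaving,
    Standby,
}

/// Hypothetical node identifier used only in this sketch.
type NodeId = u32;

/// Called once a membership change has been applied. `members` is the
/// membership of the metadata cluster after that change.
fn on_reconfiguration_applied(
    states: &mut BTreeMap<NodeId, MetadataServerState>,
    members: &[NodeId],
) {
    for (node, state) in states.iter_mut() {
        let is_member = members.contains(node);
        *state = match (*state, is_member) {
            // The join was applied: the node may now run as a member.
            (MetadataServerState::Joining, true) => MetadataServerState::Member,
            // The removal was applied: the node may stop acting as a member.
            (MetadataServerState::Leaving, false) => MetadataServerState::Standby,
            // Pending or unrelated nodes keep their state.
            (other, _) => other,
        };
    }
}
```

A node whose state is still `Leaving` or `Joining` after this call has a pending change, so it must keep acting as a member or keep trying to join, respectively.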