
Add support for removing replicated metadata servers from a metadata cluster #2994

@tillrohrmann

To remove a Restate node from the cluster, we need to support removing metadata servers from a replicated metadata cluster. Right now, nodes that run Role::MetadataServer try to join the metadata cluster if their MetadataServerState is Member, which every node is assigned by default. Nodes that are started with MetadataServerState::Standby don't try to join the metadata cluster.

One way to add support for adding/removing metadata servers is to extend the MetadataServerState enum:

enum MetadataServerState {
  /// The node tries to join a metadata cluster. It is not safe to stop this node since it might already be part of the metadata cluster.
  Joining,
  /// The node is an active member of the metadata cluster.
  Member,
  /// The node is an active member of the metadata cluster but should be removed from it.
  Leaving,
  /// The node is not part of any metadata cluster and can be safely stopped.
  Standby,
}

In state Joining, a node tries to join an existing metadata cluster by sending its member id to the current leader. Once the leader applies the configuration change, it will also update the state of the joining node to Member in the NodesConfiguration. At this point, the joining node can run as a Member.

In state Member, a node knows that it is part of the metadata cluster and might be needed to reach write quorum.

When the leader of the metadata cluster sees a node in state Leaving, it removes that node from the current cluster configuration. Until this configuration change has been applied, the leaving node is supposed to act as a normal member. When the configuration change is applied, the node's state is updated to Standby in the NodesConfiguration. At this point, the leaving node stops running as a Member.

In state Standby, a node will monitor the NodesConfiguration until its state switches to Joining.
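
The per-state behavior described above can be summarized in a small sketch. The `NodeAction` enum and the `action_for` function are hypothetical names introduced only for illustration; they are not part of the Restate code base, and the real implementation would drive async tasks rather than return a value.

```rust
/// The four states proposed in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MetadataServerState {
    Joining,
    Member,
    Leaving,
    Standby,
}

/// Hypothetical summary of what a node does while in a given state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NodeAction {
    /// Send the member id to the current leader and wait to be promoted.
    SendJoinRequest,
    /// Take part in the metadata cluster; the node may be needed for write quorum.
    RunAsMember,
    /// Stay out of the cluster and watch the NodesConfiguration for a switch to Joining.
    WatchNodesConfiguration,
}

/// Maps the state found in the NodesConfiguration to the node's behavior.
fn action_for(state: MetadataServerState) -> NodeAction {
    match state {
        MetadataServerState::Joining => NodeAction::SendJoinRequest,
        // A leaving node keeps acting as a normal member until its removal is applied.
        MetadataServerState::Member | MetadataServerState::Leaving => NodeAction::RunAsMember,
        MetadataServerState::Standby => NodeAction::WatchNodesConfiguration,
    }
}
```

Note that Member and Leaving map to the same action: from the node's point of view, nothing changes until the NodesConfiguration reports Standby.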

When adding/removing nodes, a controller is allowed to change Standby -> Joining and Member -> Leaving in the NodesConfiguration. The transitions Joining -> Member and Leaving -> Standby happen as part of the reconfiguration protocol. Switching back from Joining -> Standby and Leaving -> Member is also allowed as long as the cluster reconfiguration has not happened yet.

stateDiagram-v2
      Joining --> Member: Reconfiguration with added node
      Member --> Leaving: Remove node
      Leaving --> Member: Add node
      Leaving --> Standby: Reconfiguration with removed node
      Standby --> Joining: Add node
      Joining --> Standby: Remove node
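
The transitions in the diagram could be checked with a function along these lines. This is a minimal sketch: `Initiator` and `transition_initiator` are hypothetical names, and attributing the two "switch back" transitions to the controller is my reading of the "Add node" / "Remove node" labels, not something the issue states explicitly.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MetadataServerState {
    Joining,
    Member,
    Leaving,
    Standby,
}

/// Hypothetical classification of who performs a state transition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Initiator {
    /// A controller editing the NodesConfiguration (add/remove node).
    Controller,
    /// The metadata cluster's reconfiguration protocol.
    Reconfiguration,
}

/// Returns who may perform the transition, or `None` if it is not allowed.
fn transition_initiator(
    from: MetadataServerState,
    to: MetadataServerState,
) -> Option<Initiator> {
    use MetadataServerState::*;
    match (from, to) {
        // Requests to add or remove a node.
        (Standby, Joining) | (Member, Leaving) => Some(Initiator::Controller),
        // Reverting a request; only valid while the reconfiguration has not happened yet
        // (that precondition is not modeled here).
        (Joining, Standby) | (Leaving, Member) => Some(Initiator::Controller),
        // Completed by the reconfiguration protocol.
        (Joining, Member) | (Leaving, Standby) => Some(Initiator::Reconfiguration),
        // Everything else, including Standby -> Member and Member -> Standby, is rejected.
        _ => None,
    }
}
```

The two direct shortcuts, Standby -> Member and Member -> Standby, are deliberately absent from the allowed set.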

Why aren't MetadataServerState::Member and MetadataServerState::Standby enough?

We need to distinguish between a node "leaving" a metadata cluster and a node that has left/been removed, because the leaving node might still be needed to commit the configuration change. Therefore, we have the distinction between Leaving and Standby. If a node sees that its state is Leaving, it knows that it should still act as a member of the cluster until its state becomes Standby. Once it sees the state Standby in the NodesConfiguration, it knows that it is safe to stop running as a member of the cluster because the configuration change which removed the node has been committed. Note that a metadata cluster peer is not guaranteed to see all configuration changes as part of the Raft protocol (e.g. if it was down or simply not needed to reach write quorum for the configuration change).

Likewise, when adding a node to the cluster, we cannot go directly from Standby to Member, because a node might otherwise miss sending a join request to the current cluster to trigger a configuration change. This problem can occur when a removed node misses the removal configuration change, is later re-added, and then only looks at the MetadataServerState to decide whether to keep running as a member or as a standby. With the explicit Joining state, a previous member knows that it should rejoin the metadata cluster instead of assuming it is still a Member.
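
The leader's side of the proposal can be sketched as follows, under heavy simplification. `NodeId`, `apply_reconfiguration`, and the `join_requests` set are hypothetical; the sketch collapses proposing, committing, and applying a Raft configuration change into one function call, and models the NodesConfiguration as a plain map.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MetadataServerState {
    Joining,
    Member,
    Leaving,
    Standby,
}

/// Hypothetical node identifier type.
type NodeId = u32;

/// Applies one reconfiguration round on the leader.
///
/// `states` stands in for the NodesConfiguration, `members` for the current
/// cluster configuration, and `join_requests` for the Joining nodes whose
/// join request has reached the leader.
fn apply_reconfiguration(
    states: &mut BTreeMap<NodeId, MetadataServerState>,
    members: &mut BTreeSet<NodeId>,
    join_requests: &BTreeSet<NodeId>,
) {
    for (id, state) in states.iter_mut() {
        match *state {
            // Leaving nodes are removed; only then do they become Standby.
            MetadataServerState::Leaving => {
                members.remove(id);
                *state = MetadataServerState::Standby;
            }
            // Joining nodes are added only after they asked to join.
            MetadataServerState::Joining if join_requests.contains(id) => {
                members.insert(*id);
                *state = MetadataServerState::Member;
            }
            // Member, Standby, and Joining nodes without a request are left alone.
            _ => {}
        }
    }
}
```

The state in `states` changes only together with the membership change, so a node that reads Standby can rely on its removal having been applied, and a node that reads Joining knows it still has to send a join request.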

cc @AhmedSoliman
