-
Notifications
You must be signed in to change notification settings - Fork 98
Add support for removing/adding metadata server nodes #3138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work. I can't be 100% sure re. correctness but with the help of adding those scenarios to our jepsen test suite, we'll get more confident.
info!( | ||
"Asked to leave the metadata store cluster as of NodesConfiguration '{}' while \ | ||
still being a member of the configuration. This indicates that I missed the \ | ||
configuration change to remove me.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything we can tell the user about the outcome of the situation, for instance if we'll attempt to correct it, or if they need to do something about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is purely informational for the user and he doesn't have to do anything. I'll add that the member is now leaving the metadata store cluster.
); | ||
self.configuration = new_configuration; | ||
let mut config_change_rejections = Vec::default(); | ||
let nodes_config = self.kv_storage.last_seen_nodes_configuration(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be sure: We are guaranteed to be in a lock-step between nodes configuration changes and Entry
we are processing at all times, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for all deterministic operations we must use the kv_storage.last_seen_nodes_configuration
which is the current NodesConfiguration
based on the applied raft log. The node itself might have heard about a newer metadata version that is stored in Metadata
and which is used for operations where we are not applying a Raft entry (e.g. before proposing a new command/configuration change).
Thanks for the review @AhmedSoliman. I'll add a test case for reconfiguring a metadata cluster and check with Pavel whether we can include it in the Jepsen tests. |
797ec85
to
16b50cc
Compare
This commit adds restatectl metadata-server add-node/remove-node to add and remove nodes to and from the metadata cluster. Internally, an add command instructs the joining node to reach out to the leader to ask for a reconfiguration. A remove command is sent to the leader where it triggers a reconfiguration to remove the given node. To also stop removed metadata cluster members that have missed the configuration change, we are using the MetadataServerState stored in the NodesConfiguration as an additional signal. If a node detects that its MetadataServerState is set to Standby while it is running as a Member, then it will step down. Likewise, if a Standby node detects that its MetadataServerState is Member, then it will try to join the cluster because it might mean that it tried joining before but failed just before receiving the response. This fixes restatedev#2994.
16b50cc
to
eb64ff6
Compare
This commit adds restatectl metadata-server add-node/remove-node to add and remove nodes to and from the metadata cluster. Internally, an add command instructs the joining node to reach out to the leader to ask for a reconfiguration. A remove command is sent to the leader where it triggers a reconfiguration to remove the given node. To also stop removed metadata cluster members that have missed the configuration change, we are using the MetadataServerState stored in the NodesConfiguration as an additional signal. If a node detects that its MetadataServerState is set to Standby while it is running as a Member, then it will step down. Likewise, if a Standby node detects that its MetadataServerState is Member, then it will try to join the cluster because it might mean that it tried joining before but failed just before receiving the response.
This fixes #2994.