Description
Looks like #2455 is a symptom of this issue.
Under some circumstances, MemberRemoved is not correctly propagated from the leader to its children.
Some context from an end-user in our Gitter chat earlier today:
When a node gets into this state it doesn't leave cleanly, although it tries to. I have a monitor running in all service discovery nodes (2 of them). They both report the cluster status that they see. When this problem happens, the cluster status is everything UP and everything Seen. The leader gets the request that the node is exiting, and it logs every second that it is moving the node to exiting, but the node never exits. After 15 seconds my Windows service will kill the service and failure detection will kick in. I manually down the node, although just starting it again causes the cluster to see the new node and remove the old one. The cluster monitors both report the node is removed. The node rejoins and gets stuck.
It's not entirely clear what the exact issue is, but this should give us enough information to go looking for it.