ignore PubSub Status message from unknown node, #20846 #20847
Reproducer:

1. Start with an old cluster of node1, node2 and node3.
2. Shut down node3 and start it again with the same host:port, letting it join itself and not the old cluster.
3. node1 and node2 continue to gossip to the node3 address, and the Status message is accepted and replied to (Delta is ignored from an unknown node).

Solution:

* Ignore the Status message from an unknown node.
* Also added a reply flag in the Status message to break the back-and-forth replies in case the deltas are not accepted. This is not needed for fixing this bug, but it adds an extra level of safety.
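The two points in the Solution can be illustrated with a small, Akka-free model. This is only a sketch under stated assumptions, not the code in this PR: `Address`, `Status`, `Outcome`, `StatusHandling.onStatus` and the field name `isReplyToStatus` are hypothetical names chosen for illustration, and versions are modelled as a plain `Map[Address, Long]`.

```scala
// Simplified model of the Status handling described above.
// All names are hypothetical; this is not the actual mediator code.
final case class Address(host: String, port: Int)

// versions: the sender's view of each node's bucket version.
// isReplyToStatus: true when this Status was itself sent in reply to a Status.
final case class Status(versions: Map[Address, Long], isReplyToStatus: Boolean)

sealed trait Outcome
case object Ignored extends Outcome
final case class Handled(sendDelta: Boolean, sendStatusBack: Boolean) extends Outcome

object StatusHandling {
  def onStatus(
      known: Set[Address],      // nodes this mediator considers cluster members
      own: Map[Address, Long],  // versions held locally
      from: Address,
      msg: Status): Outcome =
    if (!known.contains(from)) {
      // First point: drop Status from a node that is not a known member.
      Ignored
    } else {
      // We hold something newer than the sender, so a Delta is worth sending.
      val hasDelta =
        own.exists { case (node, v) => v > msg.versions.getOrElse(node, 0L) }
      // The sender holds something newer than we do.
      val otherIsNewer =
        msg.versions.exists { case (node, v) => v > own.getOrElse(node, 0L) }
      // Second point: never answer a reply with another Status.
      Handled(
        sendDelta = hasDelta,
        sendStatusBack = !msg.isReplyToStatus && otherIsNewer)
    }
}
```

In the reproducer, the restarted node3 knows only itself, so a Status arriving from node1 falls into the `Ignored` branch. Between known nodes, a Status that is already a reply can still trigger a Delta but never a further Status, which is what stops the back-and-forth.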
enterBarrier("after-1")
}

"handle restart of nodes with same address" in within(30 seconds) {
Not easy to test. I had to be somewhat "creative" to write a test for the issue. This was failing before the fix.
Refs #20846
Test FAILed.
val delta = collectDelta(otherVersions)
if (delta.nonEmpty)
  sender() ! Delta(delta)
if (!reply && otherHasNewerVersions(otherVersions))
`reply` is probably a misnomer, as some kind of reply is clearly sent. Can you replace it with a more descriptive name?
ok, will do
LGTM
bin compat failed, yeah I changed an internal message
Test PASSed.