Skip to content

Conversation

patriknw
Copy link
Contributor

Reproducer:

  1. old cluster of node1, node2 and node3
  2. shutdown node3 and start it again with same host:port, let it
    join itself and not the old cluster
  3. node1 and node2 will continue to gossip to the node3 address and
    Status message is accepted and replied to (Delta is ignored from
    unknown node)

Solution:

  • ignore status message from unknown node
  • also added a reply flag in the Status message to break the
    back-and-forth replies in case the deltas are not accepted,
    this is not needed for fixing this bug, but it adds an extra
    level of safety

Reproducer:
1. old cluster of node1, node2 and node3
2. shutdown node3 and start it again with same host:port, let it
   join itself and not the old cluster
3. node1 and node2 will continue to gossip to the node3 address and
   Status message is accepted and replied to (Delta is ignored from
   unknown node)

Solution:
* ignore status message from unknown node
* also added a reply flag in the Status message to break the
  back-and-forth replies in case the deltas are not accepted,
  this is not needed for fixing this bug, but it adds an extra
  level of safety
enterBarrier("after-1")
}

"handle restart of nodes with same address" in within(30 seconds) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not easy to test. I had to be somewhat "creative" to write a test for the issue. This was failing before the fix.

@akka-ci akka-ci added the validating PR is currently being validated by Jenkins label Jun 28, 2016
@patriknw
Copy link
Contributor Author

Refs #20846

@akka-ci akka-ci added needs-attention Indicates a PR validation failure (set by CI infrastructure) and removed validating PR is currently being validated by Jenkins labels Jun 28, 2016
@akka-ci
Copy link

akka-ci commented Jun 28, 2016

Test FAILed.

val delta = collectDelta(otherVersions)
if (delta.nonEmpty)
sender() ! Delta(delta)
if (!reply && otherHasNewerVersions(otherVersions))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reply is probably a misnomer as some kind of reply is clearly sent. Can you replace it with a more descriptive name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, will do

@drewhk
Copy link
Contributor

drewhk commented Jun 28, 2016

LGTM

@patriknw
Copy link
Contributor Author

bin compat failed, yeah I changed an internal message

@patriknw
Copy link
Contributor Author

@drewhk I have incorporated your feedback 41fab9b

@akka-ci akka-ci added validating PR is currently being validated by Jenkins tested PR that was successfully built and tested by Jenkins and removed needs-attention Indicates a PR validation failure (set by CI infrastructure) validating PR is currently being validated by Jenkins labels Jun 29, 2016
@akka-ci
Copy link

akka-ci commented Jun 29, 2016

Test PASSed.

@patriknw patriknw merged commit 8b274eb into master Jul 1, 2016
@patriknw patriknw deleted the wip-20846-pubsub-gossip-patriknw branch July 1, 2016 10:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tested PR that was successfully built and tested by Jenkins
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants