Cluster singleton manager: don't send member events to FSM during shutdown #24236

chbatey · 2018-01-03T15:00:47Z

There is a race in which a cluster node that is being downed sees
itself as the oldest node (because the other nodes have been removed
from its view) and takes over the singleton manager. This sends the real
oldest node into the End state, so cluster singletons never work again.

This fix simply prevents Member events from being given to the Cluster
Manager FSM during a shutdown, instead relying on SelfExiting.
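
As a rough illustration of the idea (not the code in this PR), the gating can be sketched in plain Scala without any Akka dependency. The names `ChangeBuffer`, `isShuttingDown`, `deliverToFsm`, and the simplified `OldestChanged` and `SelfExiting` types below are hypothetical stand-ins chosen for this sketch; the real change lives inside the singleton manager.

```scala
import scala.collection.immutable.Queue

// Hypothetical stand-ins for the cluster events; these are not Akka's real classes.
sealed trait Event
final case class OldestChanged(oldest: Option[String]) extends Event
case object SelfExiting extends Event

// Buffers membership changes and forwards them to the FSM one at a time,
// unless the local node has already started shutting down.
final class ChangeBuffer(isShuttingDown: () => Boolean, deliverToFsm: Event => Unit) {
  private var changes = Queue.empty[Event]

  // Record a membership change for later delivery.
  def enqueue(e: Event): Unit = changes = changes.enqueue(e)

  // Forward the oldest buffered change, but only while the node is still running.
  // During shutdown the change stays buffered and the FSM never sees it.
  def sendFirstChange(): Unit =
    if (!isShuttingDown() && changes.nonEmpty) {
      val (event, rest) = changes.dequeue
      changes = rest
      deliverToFsm(event)
    }

  // SelfExiting bypasses the gate, so the FSM can still hand over cleanly.
  def selfExiting(): Unit = deliverToFsm(SelfExiting)
}
```

In this sketch a node that is shutting down can no longer conclude from a shrinking member list that it has become the oldest, because those changes are never delivered; only the explicit SelfExiting signal reaches the FSM.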

This also hardens the test by not downing the node that the current
sharding coordinator is running on as well as fixing a bug in the
probes.

Refs #24113

chbatey · 2018-01-03T15:02:07Z

The test fails when the downed node has the other members removed. This happens locally in roughly 1 out of 10 runs.

chbatey · 2018-01-03T15:20:00Z

Going to run the multi-node jobs on the repeat job for this.

akka-ci · 2018-01-03T15:59:25Z

Test PASSed.

patriknw

good catch

wonder if the original issue should be kept and this is another thing?
I wrote:

The real issue that should be fixed is that there seems to be a race between the CS and the ClusterSingleton observing OldestChanged and terminating coordinator singleton before the graceful sharding stop is done

chbatey · 2018-01-04T07:37:40Z

Yes, as I don't think it'll fix that one ^. This just fixed one test issue and one actual bug, so let's keep #24113 open for changing the test back to shutting down a node that has the coordinator.

chbatey · 2018-01-05T07:49:20Z

This one is causing a lot of failures. Would someone else from @akka/akka-team mind reviewing this?

raboof · 2018-01-05T08:47:24Z

akka-cluster/src/main/scala/akka/cluster/Cluster.scala

@@ -409,7 +410,7 @@ class Cluster(val system: ExtendedActorSystem) extends Extension {
   * Should not called by the user. The user can issue a LEAVE command which will tell the node
   * to go through graceful handoff process `LEAVE -> EXITING -> REMOVED -> SHUTDOWN`.
   */
-  private[cluster] def shutdown(): Unit = {
+  @InternalApi private[cluster] def shutdown(): Unit = {


akka-ci added the "validating" label (PR is currently being validated by Jenkins) Jan 3, 2018

akka-ci added the "tested" label (PR that was successfully built and tested by Jenkins) and removed the "validating" label (PR is currently being validated by Jenkins) Jan 3, 2018

patriknw reviewed Jan 3, 2018

johanandren approved these changes Jan 4, 2018

raboof approved these changes Jan 5, 2018

raboof merged commit 0380cc5 into akka:master Jan 5, 2018

Aaronontheweb mentioned this pull request Jan 11, 2018

ClusterSingletonManager should ignore FSM events during shutdown akkadotnet/akka.net#3263

Closed

zbynek001 mentioned this pull request Jun 24, 2018

Sharding update akkadotnet/akka.net#3524

Merged
