ClusterSingletonManager does not handle event TakeOver in state Younger #25639

@412b

Description

What happens
Especially when using CoordinatedShutdown, I've observed a lot of "Ignoring TakeOver request in [Younger] from [...]" messages while the oldest node is shutting down. As a result, the hand-over takes far longer than it should. The reason is that the new singleton holder becomes aware that the oldest node is shutting down later than the oldest node starts the hand-over.
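The race comes down to message ordering on the younger node. The sketch below is a hypothetical, heavily simplified model in plain Python, not Akka code. The function name, event tuples and shortened addresses are made up for illustration. TakeOverFromMe is dropped while the node is still Younger, and only OldestChanged moves it forward. In this model, that transition is also when the node asks the previous oldest to hand over.

```python
def younger_node(events, previous_oldest):
    """Hypothetical model of the younger node's state handling (not Akka code)."""
    state = "Younger"
    log = []
    for event, sender in events:
        if event == "TakeOverFromMe" and state == "Younger":
            # Current behaviour: unhandled in Younger, so the request is dropped.
            log.append(f"Ignoring TakeOver request in [Younger] from [{sender}].")
        elif event == "OldestChanged" and state == "Younger":
            # Only the membership-driven OldestChanged moves the node forward;
            # in this model it then asks the previous oldest to hand over.
            state = "BecomingOldest"
            log.append(f"HandOverToMe -> [{previous_oldest}]")
    return state, log


old = "127.0.0.1:45553"
# Ordering seen in the logs: TakeOverFromMe arrives before OldestChanged.
state, log = younger_node([("TakeOverFromMe", old), ("OldestChanged", None)], old)
print(state)   # BecomingOldest
print(log[0])  # Ignoring TakeOver request in [Younger] from [127.0.0.1:45553].
```

Because the first request is dropped, the hand-over cannot start until the younger node learns about the membership change on its own.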

What happens (detailed)
starting up:

[INFO] [09/18/2018 15:38:56.744] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-19] [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553/user/echo] Singleton manager starting singleton actor [akka://ClusterSingletonRestartSpec/user/echo/singleton]
[INFO] [09/18/2018 15:38:56.745] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-19] [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553/user/echo] ClusterSingletonManager state change [Start -> Oldest]
[INFO] [09/18/2018 15:38:57.584] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-15] [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45471/user/echo] ClusterSingletonManager state change [Start -> Younger]
[INFO] [09/18/2018 15:38:57.618] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-15] [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45471/user/proxy2] Singleton identified at [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553/user/echo/singleton]

shutting down:

[INFO] [09/18/2018 15:38:58.532] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-18] [akka.cluster.Cluster(akka://ClusterSingletonRestartSpec)] Cluster Node [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553] - Cluster Node [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553] dc [default] is no longer the leader
[INFO] [09/18/2018 15:38:58.566] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-15] [akka.cluster.Cluster(akka://ClusterSingletonRestartSpec)] Cluster Node [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45471] - Cluster Node [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45471] dc [default] is the new leader
[INFO] [09/18/2018 15:38:58.637] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-19] [akka.cluster.Cluster(akka://ClusterSingletonRestartSpec)] Cluster Node [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553] - Marked address [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553] as [Leaving]
[INFO] [09/18/2018 15:38:58.644] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-15] [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553/user/echo] Exited [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553]

start of the handover:

[INFO] [09/18/2018 15:38:58.647] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-17] [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553/user/echo] Oldest observed OldestChanged: [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553 -> Some(akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45471)]
[INFO] [09/18/2018 15:38:58.649] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-17] [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553/user/echo] ClusterSingletonManager state change [Oldest -> WasOldest]

the new singleton holder hasn't yet seen the change of oldest:

[INFO] [09/18/2018 15:38:58.651] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-16] [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45471/user/echo] Ignoring TakeOver request in [Younger] from [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553].
[INFO] [09/18/2018 15:38:59.567] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-16] [akka.cluster.Cluster(akka://ClusterSingletonRestartSpec)] Cluster Node [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45471] - Leader is moving node [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553] to [Exiting]

and now the new holder is aware:

[INFO] [09/18/2018 15:38:59.573] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-3] [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45471/user/echo] Younger observed OldestChanged: [Some(akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553) -> myself]
[INFO] [09/18/2018 15:38:59.575] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-3] [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45471/user/echo] ClusterSingletonManager state change [Younger -> BecomingOldest]
[INFO] [09/18/2018 15:38:59.577] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-15] [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553/user/echo] ClusterSingletonManager state change [WasOldest -> HandingOver]
[INFO] [09/18/2018 15:38:59.578] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-15] [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553/user/echo] Singleton terminated, hand-over done [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553 -> Some(akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45471)]
[INFO] [09/18/2018 15:38:59.579] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-15] [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553/user/echo] ClusterSingletonManager state change [HandingOver -> End]
[INFO] [09/18/2018 15:38:59.579] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-18] [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45471/user/echo] Hand-over in progress at [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553]
[INFO] [09/18/2018 15:38:59.581] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-18] [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45471/user/echo] Singleton manager starting singleton actor [akka://ClusterSingletonRestartSpec/user/echo/singleton]
[INFO] [09/18/2018 15:38:59.581] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-18] [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45471/user/echo] ClusterSingletonManager state change [BecomingOldest -> Oldest]
[INFO] [09/18/2018 15:39:00.541] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-19] [akka.cluster.Cluster(akka://ClusterSingletonRestartSpec)] Cluster Node [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553] - Exiting, starting coordinated shutdown
[INFO] [09/18/2018 15:39:00.542] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-18] [akka.cluster.Cluster(akka://ClusterSingletonRestartSpec)] Cluster Node [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553] - Exiting completed
[INFO] [09/18/2018 15:39:00.546] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-18] [akka.cluster.Cluster(akka://ClusterSingletonRestartSpec)] Cluster Node [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553] - Shutting down...
[INFO] [09/18/2018 15:39:00.550] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-18] [akka.cluster.Cluster(akka://ClusterSingletonRestartSpec)] Cluster Node [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553] - Successfully shut down
[INFO] [09/18/2018 15:39:00.551] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-18] [akka.cluster.Cluster(akka://ClusterSingletonRestartSpec)] Cluster Node [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45471] - Exiting confirmed [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553]
[INFO] [09/18/2018 15:39:00.558] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-16] [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553/user/echo] Self removed, stopping ClusterSingletonManager
[INFO] [09/18/2018 15:39:00.566] [ClusterSingletonRestartSpec-akka.actor.default-dispatcher-19] [akka.cluster.Cluster(akka://ClusterSingletonRestartSpec)] Cluster Node [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45471] - Leader is removing confirmed Exiting node [akka.tcp://ClusterSingletonRestartSpec@127.0.0.1:45553]
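The timestamps above show how long the ignored request delayed things, even in this small test. The snippet below just does the arithmetic on timestamps copied from the log lines. The helper name is made up for illustration.

```python
from datetime import datetime

# Timestamp format used in the log lines above.
FMT = "%m/%d/%Y %H:%M:%S.%f"


def gap_ms(start: str, end: str) -> float:
    """Milliseconds between two log timestamps (hypothetical helper)."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() * 1000


was_oldest = "09/18/2018 15:38:58.649"      # Oldest -> WasOldest
ignored = "09/18/2018 15:38:58.651"         # Ignoring TakeOver request in [Younger]
younger_aware = "09/18/2018 15:38:59.573"   # Younger observed OldestChanged
hand_over_done = "09/18/2018 15:38:59.578"  # Singleton terminated, hand-over done

print(round(gap_ms(ignored, younger_aware)))      # 922
print(round(gap_ms(was_oldest, hand_over_done)))  # 929
```

So roughly 0.9 s passes between the oldest starting the hand-over and the hand-over completing, nearly all of it spent waiting for the younger node to observe OldestChanged.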

The problem is that in a bigger real-life cluster (like ours), the hand-over only happens after many TakeOverFromMe retries, and it can even give up retrying altogether.
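A back-of-the-envelope model shows why a larger cluster makes this worse. The longer it takes for OldestChanged to reach the younger node, the more retries are dropped, and past some limit every attempt is dropped. This is a hypothetical sketch, not Akka code. The retry interval and attempt limit are illustrative parameters, not Akka's actual defaults, and the model assumes evenly spaced retries.

```python
import math


def dropped_take_overs(aware_after_ms, retry_interval_ms, max_attempts):
    """Hypothetical model (not Akka code) of the current behaviour.

    The leaving oldest sends TakeOverFromMe at t = 0, interval, 2*interval, ...
    (max_attempts sends in total). Every send made before the younger node has
    observed OldestChanged (at t = aware_after_ms) is ignored in state Younger.
    Returns (number of ignored sends, whether every send was ignored).
    """
    ignored = min(math.ceil(aware_after_ms / retry_interval_ms), max_attempts)
    return ignored, ignored == max_attempts


# Small test cluster: the younger node learned of the change after ~922 ms.
print(dropped_take_overs(922, 1000, 15))    # (1, False)
# Larger cluster where the membership change takes ~20 s to be observed.
print(dropped_take_overs(20000, 1000, 15))  # (15, True)
```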

The fix can be pretty trivial, and I can submit a PR right away. The main question is whether going from Younger to BecomingOldest on a TakeOverFromMe request, without having seen OldestChanged, is acceptable. WDYT?
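The proposed change can be expressed as a single extra transition. The sketch below is hypothetical and not Akka code. In particular, the guard that only accepts the request from the node this one still believes is oldest is an assumption added here as one way to limit the risk raised in the question. The issue itself does not specify it.

```python
def younger_on_take_over(state, sender, believed_oldest):
    """Hypothetical sketch (not Akka code) of the proposed Younger handler.

    Returns (new_state, reply). While Younger, a TakeOverFromMe from the node
    this one still believes is oldest is treated as an early signal that the
    oldest is leaving, instead of being ignored.
    """
    if state != "Younger":
        # Other states are out of scope for this sketch; nothing changes.
        return state, None
    if sender == believed_oldest:
        # Proposed: don't wait for OldestChanged; start the hand-over now.
        return "BecomingOldest", "HandOverToMe"
    # Assumed guard: requests from any other node are still ignored.
    return "Younger", None


print(younger_on_take_over("Younger", "127.0.0.1:45553", "127.0.0.1:45553"))
# ('BecomingOldest', 'HandOverToMe')
```

The guard keeps a stale or unexpected sender from promoting a node. The open question is whether acting on this message alone, before membership has converged, is safe in all cases.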
