
deadlock when initializing Akka cluster: document minimum thread pool size #17253

@clockfly

Description


Problem: a timeout occurs when Akka Cluster tries to create the Cluster extension.

Frequency: random

java.util.concurrent.TimeoutException: Futures timed out after [60000 milliseconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:116)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:116)
        at akka.cluster.Cluster.liftedTree1$1(Cluster.scala:172)
        at akka.cluster.Cluster.<init>(Cluster.scala:171)
        at akka.cluster.Cluster$.createExtension(Cluster.scala:42)
        at akka.cluster.Cluster$.createExtension(Cluster.scala:37)
        at akka.actor.ActorSystemImpl.registerExtension(ActorSystem.scala:711)
        at akka.actor.ExtensionId$class.apply(Extension.scala:79)
        at akka.cluster.Cluster$.apply(Cluster.scala:37)
        ......
        at akka.cluster.ClusterActorRefProvider.createRemoteWatcher(ClusterActorRefProvider.scala:66)
        at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:186)
        at akka.cluster.ClusterActorRefProvider.init(ClusterActorRefProvider.scala:58)
        at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:618)
        at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:615)
        at akka.actor.ActorSystemImpl._start(ActorSystem.scala:615)
        at akka.actor.ActorSystemImpl.start(ActorSystem.scala:632)
        at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
        at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)

Investigation

The timeout happens in class Cluster, while it asks the cluster daemons for GetClusterCoreRef:

  private[cluster] val clusterCore: ActorRef = {
    implicit val timeout = system.settings.CreationTimeout
    try {
      Await.result((clusterDaemons ? InternalClusterAction.GetClusterCoreRef).mapTo[ActorRef], timeout.duration)
    } catch {
      ...
    }
  }

That request is handled by ClusterCoreSupervisor:

private[cluster] final class ClusterCoreSupervisor extends Actor with ActorLogging
  with RequiresMessageQueue[UnboundedMessageQueueSemantics] {

  ...
  val coreDaemon = context.watch(context.actorOf(Props(classOf[ClusterCoreDaemon], publisher).
    withDispatcher(context.props.dispatcher), name = "daemon"))
  ...
  def receive = {
    case InternalClusterAction.GetClusterCoreRef ⇒ sender() ! coreDaemon
  }
  ...
}

ClusterCoreSupervisor replies with the ActorRef of its ClusterCoreDaemon child.

The problem is that ClusterCoreDaemon itself requires the Cluster extension to be initialized.
Check the definition of ClusterCoreDaemon:

private[cluster] class ClusterCoreDaemon(publisher: ActorRef) extends Actor with ActorLogging
  with RequiresMessageQueue[UnboundedMessageQueueSemantics] {
  import InternalClusterAction._

  val cluster = Cluster(context.system)

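The line val cluster = Cluster(context.system) calls system.registerExtension, the same method the stack trace above is waiting in. Because the Cluster extension is still being created on the calling thread, this second registerExtension call blocks until the first one finishes, while the first one is itself blocked in Await.result waiting for the GetClusterCoreRef reply.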
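To make the waiting behaviour concrete, here is a simplified model of a "register once, other callers wait" registry, assuming the CountDownLatch-as-in-progress-marker idea that ActorSystemImpl.registerExtension uses. ExtensionRegistry and register are hypothetical names for illustration; this is a sketch, not Akka's code or API.

```scala
import java.util.concurrent.{ConcurrentHashMap, CountDownLatch}

// Hypothetical, simplified model of registerExtension's "in progress" handling.
// Not Akka's implementation; names are illustrative only.
final class ExtensionRegistry {
  private val extensions = new ConcurrentHashMap[String, AnyRef]()

  def register(id: String)(create: => AnyRef): AnyRef = {
    val inProgress = new CountDownLatch(1)
    extensions.putIfAbsent(id, inProgress) match {
      case null =>
        // First caller: builds the extension while the latch marks it "in progress".
        try {
          val ext = create
          extensions.replace(id, inProgress, ext)
          ext
        } catch {
          case t: Throwable =>
            extensions.remove(id, inProgress) // let a later caller retry
            throw t
        } finally inProgress.countDown() // wake every caller parked on the latch
      case latch: CountDownLatch =>
        // Another caller is still inside create: block until it finishes, then look again.
        latch.await()
        register(id)(create)
      case existing =>
        existing // already registered
    }
  }
}
```

A caller that arrives while the first one is still inside create parks on the latch indefinitely, which is the position ClusterCoreDaemon's constructor ends up in while Cluster.<init> is waiting for its reply.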

Update on 4/22

The updated analysis is in a later comment on #17253.
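The issue title attributes the deadlock to the dispatcher's thread pool size. The following is a hedged, Akka-free model of that reading, assuming the dispatcher behaves like a fixed-size pool on which a task blocked in registerExtension (the ClusterCoreDaemon constructor) and the task that answers GetClusterCoreRef (ClusterCoreSupervisor) compete for threads. StartupStarvationModel, startCluster and its parameters are hypothetical names.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeoutException}
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration.FiniteDuration

// Hypothetical model (plain JDK threads, no Akka) of the startup cycle:
//  - "daemon" task: stands in for ClusterCoreDaemon blocked in registerExtension
//  - "supervisor" task: stands in for ClusterCoreSupervisor replying to GetClusterCoreRef
//  - calling thread: stands in for Cluster.<init> blocked in Await.result
object StartupStarvationModel {

  /** True if the caller got its reply before `timeout`, false if it timed out. */
  def startCluster(poolSize: Int, timeout: FiniteDuration): Boolean = {
    val pool = Executors.newFixedThreadPool(poolSize)
    val registrationDone = new CountDownLatch(1) // released when "Cluster.<init>" is over
    val coreRef = Promise[String]()              // the GetClusterCoreRef reply
    try {
      // Submitted first: holds a pool thread until the registration completes.
      pool.execute(new Runnable { def run(): Unit = registrationDone.await() })
      // Can only run if a second pool thread is available.
      pool.execute(new Runnable { def run(): Unit = { coreRef.trySuccess("coreDaemon"); () } })
      Await.result(coreRef.future, timeout)
      true
    } catch {
      case _: TimeoutException => false
    } finally {
      registrationDone.countDown() // unblock the parked task so the pool can drain
      pool.shutdown()
    }
  }
}
```

With poolSize = 1 the blocked task holds the only thread, the reply task stays queued, and startCluster returns false once the timeout expires (the analogue of the TimeoutException in the stack trace); with poolSize = 2 the reply runs on the second thread and startCluster returns true.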
