Skip to content

Akka.Remote - proper logging when remote node abruptly goes down #2460

@maxcherednik

Description

@maxcherednik

When 2 nodes are Associated successfully and one of the nodes goes down abruptly we get this:

2017-01-19 17:01:13.400 [1] [(null)] INFO  Akka.RiskEngine.RiskEngineService - Starting risk engine...
2017-01-19 17:01:13.724 [8] [(null)] INFO  Akka.Event.DummyClassForStringSources - Starting remoting
2017-01-19 17:01:13.830 [5] [(null)] INFO  Akka.Event.DummyClassForStringSources - Remoting started; listening on addresses : [akka.tcp://riskengine@127.0.0.1:4054]
2017-01-19 17:01:13.832 [7] [(null)] INFO  Akka.Event.DummyClassForStringSources - Remoting now listens on addresses: [akka.tcp://riskengine@127.0.0.1:4054]
2017-01-19 17:01:13.863 [8] [(null)] INFO  Akka.Event.DummyClassForStringSources - Cluster Node [akka.tcp://riskengine@127.0.0.1:4054] - Starting up...
2017-01-19 17:01:13.881 [19] [(null)] INFO  Akka.Event.DummyClassForStringSources - Cluster Node [akka.tcp://riskengine@127.0.0.1:4054] - Started up successfully
2017-01-19 17:01:13.887 [1] [(null)] INFO  Akka.RiskEngine.RiskEngineService - Started!
2017-01-19 17:01:14.893 [22] [(null)] INFO  Akka.Cluster.ClusterCoreDaemon - Welcome from [akka.tcp://riskengine@127.0.0.1:4053]
2017-01-19 17:02:39.191 [8] [(null)] ERROR Akka.Remote.Transport.Helios.TcpServerHandler - Error caught channel [[::ffff:127.0.0.1]:4054->[::ffff:127.0.0.1]:60957](Id=ChannelId(-717517600))
System.Net.Sockets.SocketException (0x80004005): An existing connection was forcibly closed by the remote host
   at Helios.Channels.Sockets.SocketChannelAsyncOperation.Validate()
   at Helios.Channels.Sockets.AbstractSocketByteChannel.SocketByteChannelUnsafe.FinishRead(SocketChannelAsyncOperation operation)
2017-01-19 17:02:39.195 [8] [(null)] ERROR Akka.Remote.Transport.Helios.TcpClientHandler - Error caught channel [[::ffff:127.0.0.1]:60956->[::ffff:127.0.0.1]:4053](Id=ChannelId(1909747648))
System.Net.Sockets.SocketException (0x80004005): An existing connection was forcibly closed by the remote host
   at Helios.Channels.Sockets.SocketChannelAsyncOperation.Validate()
   at Helios.Channels.Sockets.AbstractSocketByteChannel.SocketByteChannelUnsafe.FinishRead(SocketChannelAsyncOperation operation)
2017-01-19 17:02:39.199 [8] [(null)] ERROR Akka.Remote.Transport.Helios.TcpClientHandler - Error caught channel [[::ffff:127.0.0.1]:60956->[::ffff:127.0.0.1]:4053](Id=ChannelId(1909747648))
System.Net.Sockets.SocketException (0x80004005): An existing connection was forcibly closed by the remote host
   at Helios.Channels.Sockets.SocketChannelAsyncOperation.Validate()
   at Helios.Channels.Sockets.AbstractSocketByteChannel.SocketByteChannelUnsafe.FinishRead(SocketChannelAsyncOperation operation)
2017-01-19 17:02:39.199 [8] [(null)] ERROR Akka.Remote.Transport.Helios.TcpServerHandler - Error caught channel [[::ffff:127.0.0.1]:4054->[::ffff:127.0.0.1]:60957](Id=ChannelId(-717517600))
System.Net.Sockets.SocketException (0x80004005): An existing connection was forcibly closed by the remote host
   at Helios.Channels.Sockets.SocketChannelAsyncOperation.Validate()
   at Helios.Channels.Sockets.AbstractSocketByteChannel.SocketByteChannelUnsafe.FinishRead(SocketChannelAsyncOperation operation)
2017-01-19 17:02:39.213 [22] [(null)] WARN  Akka.Remote.ReliableDeliverySupervisor - Association with remote system akka.tcp://riskengine@127.0.0.1:4053 has failed; address is now gated for 5000 ms. Reason is: [Akka.Remote.EndpointDisassociatedException: Disassociated
   at Akka.Remote.EndpointWriter.PublishAndThrow(Exception reason, LogLevel level, Boolean needToThrow)
   at Akka.Actor.ReceiveActor.ExecutePartialMessageHandler(Object message, PartialAction`1 partialAction)
   at Akka.Actor.ActorCell.<>c__DisplayClass114_0.<Akka.Actor.IUntypedActorContext.Become>b__0(Object m)
   at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)
   at Akka.Actor.ActorCell.ReceiveMessage(Object message)
   at Akka.Actor.ActorCell.AutoReceiveMessage(Envelope envelope)
   at Akka.Actor.ActorCell.Invoke(Envelope envelope)
--- End of stack trace from previous location where exception was thrown ---
   at Akka.Actor.ActorCell.HandleFailed(Failed f)
   at Akka.Actor.ActorCell.SysMsgInvokeAll(EarliestFirstSystemMessageList messages, Int32 currentState)]

Proposition:
On the transport level - no need to print the error and the stack trace - it's totally normal that sometimes socket can do this:

System.Net.Sockets.SocketException (0x80004005): An existing connection was forcibly closed by the remote host

We can propagate this information to the upper level that node is not available because of this: An existing connection was forcibly closed by the remote host

Hence on the higher level we don't need to print a strange exception:

2017-01-19 17:02:39.213 [22] [(null)] WARN  Akka.Remote.ReliableDeliverySupervisor - Association with remote system akka.tcp://riskengine@127.0.0.1:4053 has failed; address is now gated for 5000 ms. Reason is: [Akka.Remote.EndpointDisassociatedException: Disassociated
   at Akka.Remote.EndpointWriter.PublishAndThrow(Exception reason, LogLevel level, Boolean needToThrow)
   at Akka.Actor.ReceiveActor.ExecutePartialMessageHandler(Object message, PartialAction`1 partialAction)
   at Akka.Actor.ActorCell.<>c__DisplayClass114_0.<Akka.Actor.IUntypedActorContext.Become>b__0(Object m)
   at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)
   at Akka.Actor.ActorCell.ReceiveMessage(Object message)
   at Akka.Actor.ActorCell.AutoReceiveMessage(Envelope envelope)
   at Akka.Actor.ActorCell.Invoke(Envelope envelope)
--- End of stack trace from previous location where exception was thrown ---
   at Akka.Actor.ActorCell.HandleFailed(Failed f)
   at Akka.Actor.ActorCell.SysMsgInvokeAll(EarliestFirstSystemMessageList messages, Int32 currentState)]

... cause this is not what happened in reality.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions