Tests Fail with Timeout - System.Net.Sockets.SocketException (61): Connection refused #106

@edubsi

Description

Hi,

We noticed that our builds fail from time to time with a timeout exception on our GitLab runners.
The weird thing is that we see no correlation to specific runners or servers: the same runner will succeed three times and then fail on the next run.

I was trying to reproduce this error on my machine (a MacBook) and found that the tests always succeed when run from my IDE (Rider), but they fail if I run "dotnet test" from the console.

We found that our tests have significant overhead if we create a runner for every test, as recommended by the docs.
So in our case a fixture worked great in terms of unit test speed, but it is now causing trouble when multiple test projects contain the fixture.

using System;
using Mongo2Go;

public class MongoDbFixture : IDisposable
{
    private readonly MongoDbRunner mongoRunner;

    public MongoDbFixture()
    {
        // Workaround to fix the problem :shrug:
        //using (MongoDbRunner.Start()) { }

        // Starts mongod on the first free port Mongo2Go finds.
        mongoRunner = MongoDbRunner.Start();
    }

    public void ClearAll()
    {
        // clean db before each test
    }

    public void Dispose()
    {
        mongoRunner?.Dispose();
    }
}
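
For context, this is roughly how the fixture is consumed in the test classes (the consuming class name below is illustrative, and whether IClassFixture or ICollectionFixture is used makes no difference to the symptom):

using Xunit;

// Illustrative consumer of the fixture above: xUnit creates one MongoDbFixture
// (and therefore one MongoDbRunner / mongod process) per test class instead of
// one per test, which is where the speedup comes from.
public class PublishTests : IClassFixture<MongoDbFixture>
{
    private readonly MongoDbFixture fixture;

    public PublishTests(MongoDbFixture fixture)
    {
        this.fixture = fixture;
        this.fixture.ClearAll(); // clean db before each test
    }

    [Fact]
    public void ShouldPassPublishExceptions()
    {
        // test body working against the shared mongod instance
    }
}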

The issue we run into is the following, and we see this pattern for every failing test. The "Address already in use" error does not show up in successful runs.

...
2020-11-13T11:20:34.585+0100 E STORAGE  [initandlisten] Failed to set up listener: SocketException: Address already in use
...
[xUnit.net 00:00:36.26]     ... ShouldPassPublishExceptions [FAIL]
  Failed  ... ShouldPassPublishExceptions [1 ms]
  Error Message:
   System.TimeoutException : A timeout occured after 30000ms selecting a server using CompositeServerSelector{ Selectors = MongoDB.Driver.MongoClient+AreSessionsSupportedServerSelector, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 } }. Client view of cluster state is { ClusterId : "1", ConnectionMode : "Automatic", Type : "Unknown", State : "Disconnected", Servers : [{ ServerId: "{ ClusterId : 1, EndPoint : "127.0.0.1:27018" }", EndPoint: "127.0.0.1:27018", ReasonChanged: "Heartbeat", State: "Disconnected", ServerVersion: , TopologyVersion: , Type: "Unknown", HeartbeatException: "MongoDB.Driver.MongoConnectionException: An exception occurred while opening a connection to the server.
 ---> System.Net.Sockets.SocketException (61): Connection refused
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
   at System.Threading.Tasks.ValueTask.ValueTaskSourceAsTask.<>c.<.cctor>b__4_0(Object state)
--- End of stack trace from previous location ---
   at MongoDB.Driver.Core.Connections.TcpStreamFactory.ConnectAsync(Socket socket, EndPoint endPoint, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Connections.TcpStreamFactory.CreateStreamAsync(EndPoint endPoint, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Connections.BinaryConnection.OpenHelperAsync(CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at MongoDB.Driver.Core.Connections.BinaryConnection.OpenHelperAsync(CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Servers.ServerMonitor.InitializeConnectionAsync(CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Servers.ServerMonitor.HeartbeatAsync(CancellationToken cancellationToken)", LastHeartbeatTimestamp: "2020-11-13T10:21:08.3182250Z", LastUpdateTimestamp: "2020-11-13T10:21:08.3182250Z" }] }.
  Stack Trace:
     at MongoDB.Driver.Core.Clusters.Cluster.ThrowTimeoutException(IServerSelector selector, ClusterDescription description)
   at MongoDB.Driver.Core.Clusters.Cluster.WaitForDescriptionChangedHelper.HandleCompletedTask(Task completedTask)
   at MongoDB.Driver.Core.Clusters.Cluster.WaitForDescriptionChanged(IServerSelector selector, ClusterDescription description, Task descriptionChangedTask, TimeSpan timeout, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Clusters.Cluster.SelectServer(IServerSelector selector, CancellationToken cancellationToken)
   at MongoDB.Driver.MongoClient.AreSessionsSupportedAfterServerSelection(CancellationToken cancellationToken)
   at MongoDB.Driver.MongoClient.AreSessionsSupported(CancellationToken cancellationToken)
   at MongoDB.Driver.MongoClient.StartImplicitSession(CancellationToken cancellationToken)
   at MongoDB.Driver.OperationExecutor.StartImplicitSession(CancellationToken cancellationToken)
   at MongoDB.Driver.MongoCollectionImpl`1.UsingImplicitSession[TResult](Func`2 func, CancellationToken cancellationToken)
   at MongoDB.Driver.MongoCollectionImpl`1.BulkWrite(IEnumerable`1 requests, BulkWriteOptions options, CancellationToken cancellationToken)
   at MongoDB.Driver.MongoCollectionBase`1.<>c__DisplayClass31_0.<DeleteMany>b__0(IEnumerable`1 requests)
   at MongoDB.Driver.MongoCollectionBase`1.DeleteMany(FilterDefinition`1 filter, DeleteOptions options, Func`2 bulkWriteFunc)
   at MongoDB.Driver.MongoCollectionBase`1.DeleteMany(FilterDefinition`1 filter, DeleteOptions options, CancellationToken cancellationToken)
   at MongoDB.Driver.MongoCollectionBase`1.DeleteMany(FilterDefinition`1 filter, CancellationToken cancellationToken)
   ...

To me it looks like the "is the port already in use" check fails if multiple projects contain the fixture. I am not sure why (maybe the dispose pattern is implemented wrong?).
I tried to fix the issue, and the following things kinda "worked":

  • using (MongoDbRunner.Start()) { } // before starting the actual runner. Not sure why this fixes it, but it does (for us)
  • using MongoDbRunner.StartUnitTest() and passing an IPortPool implementation that uses a random port (27020-27999), as sketched after this list
    • but it is marked as obsolete
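
Rough sketch of the second workaround, assuming IPortPool lives in Mongo2Go.Helper and exposes a single GetNextOpenPort() member (I did not verify the exact interface shape, so treat the namespace and member name as assumptions):

using System;
using Mongo2Go.Helper;

// Hands out a random port from 27020-27999 so that parallel test processes
// are very unlikely to ask mongod to bind the same port.
public class RandomPortPool : IPortPool
{
    private readonly Random random = new Random();

    public int GetNextOpenPort()
    {
        // No availability probe here; we simply rely on the range being large
        // enough for the handful of test projects we run in parallel.
        return random.Next(27020, 28000);
    }
}

This pool is then handed to MongoDbRunner.StartUnitTest() as described above; I am leaving out the exact overload here since the method is marked obsolete anyway.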

So I am wondering whether the "IsPortAvailable" check works correctly or whether we run into a timing issue: do both test processes start at (almost) the same time and try to claim the same port? A sketch of what I suspect is happening follows below.
It would help us fix the issue if we could decide which port the runner tries to use, for example by allowing us to set "_startPort" or to pass a custom IPortPool implementation. A "start probing at a random port" option would do it too.
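
To illustrate what I mean by a timing issue (this is not Mongo2Go's actual code, just a sketch of the suspected check-then-bind race):

using System.Net;
using System.Net.Sockets;

public static class PortRaceDemo
{
    // A "check now, bind later" availability test leaves a window in which a
    // second test process can pass the same check for the same port.
    public static bool LooksAvailable(int port)
    {
        try
        {
            var probe = new TcpListener(IPAddress.Loopback, port);
            probe.Start();
            probe.Stop();   // the port is released again here...
            return true;    // ...so another process can still grab it first
        }
        catch (SocketException)
        {
            return false;
        }
    }
}

// Process A: PortRaceDemo.LooksAvailable(27018) -> true
// Process B: PortRaceDemo.LooksAvailable(27018) -> true   (both checks pass)
// Process A: mongod binds 27018
// Process B: mongod fails with "Address already in use"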
Edit: we use the latest version 2.2.14 on .NET 5.0. We saw the same problem on .NET Core 3.1 as well.
