Skip to content

Conversation

Aaronontheweb
Copy link
Member

@Aaronontheweb Aaronontheweb commented Jun 2, 2017

I've looked through some of these and the reasons for the raciness in most of these specs are due to issues where sockets aren't being cleaned up quickly enough or the timeouts are too narrowly defined. There might be some legitimate bugs with the specs here too, so I'm going to go through each one and submit some patches for each in the name of making our build process run more smoothly.

Current list:

  • Akka.Cluster.Tests.ActorProviderConfig_should_resolve_remote_alias
  • Akka.Cluster.Tests.ActorProviderConfig_should_resolve_cluster_alias
  • Akka.Cluster.Tests.ClusterSpec.A_cluster_must_cancel_LeaveAsync_task_if_CancellationToken_fired_before_node_left
  • Akka.Cluster.Tests.DowningProviderSpec.Downing_provider_should_stop_the_cluster_if_the_downing_provider_throws_exception_in_props
  • Akka.Cluster.Tests.ClusterSpec.A_cluster_must_return_completed_LeaveAsync_task_if_member_already_removed
  • Akka.Tests.Routing.TailChoppingSpec.Tail_chopping_group_router_must_throw_exception_if_no_result_will_arrive_within_the_given_time

The common thread with most of these issues is that they use the real socket system underneath Akka.Remote, so I suspect it's a problem with how that's being used / not giving it enough time.

I'm also going to see if these failures might be related to #2575 - which I was planning on fixing anyway. Stay tuned.

@@ -40,11 +40,13 @@ public void ActorProviderConfig_should_resolve_cluster_alias()

private void ConfigureAndVerify(string alias, Type actorProviderType)
{
var config = ConfigurationFactory.ParseString(@"akka.actor.provider = " + alias);
var config = ConfigurationFactory.ParseString(@"akka.actor.provider = " + alias)
.WithFallback(ConfigurationFactory.ParseString("akka.remote.dot-netty.tcp.port = 0")); // use a random port to avoid issues with async and parallelization
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved via port 0 binding (won't have any side-effects on using a local ActorSystem since the configuration never gets read.)

using (var system = ActorSystem.Create(nameof(ActorRefProvidersConfigSpec), config))
{
var ext = (ExtendedActorSystem) system;
ext.Provider.GetType().ShouldBe(actorProviderType);
system.Terminate().Wait(TimeSpan.FromSeconds(3)); // force the system to cleanup and shutdown
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, Dispose on an ActorSystem should probably call this and block until it's had a chance to run part of its shutdown process.

Copy link
Member Author

@Aaronontheweb Aaronontheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More fixes to some of the cluster specs

@@ -20,6 +20,12 @@ namespace Akka.Cluster.Tests
{
public class ClusterSpec : AkkaSpec
{

/*
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added portability note explaining some of the semantic differences with the specs.

@@ -257,7 +261,7 @@ public void A_cluster_must_cancel_LeaveAsync_task_if_CancellationToken_fired_bef

// Subsequent LeaveAsync() tasks expected to complete immediately (not cancelled)
var task3 = _cluster.LeaveAsync();
task3.Should(t => t.IsCompleted && !t.IsCanceled, "Task should be completed, but not cancelled.");
AwaitCondition(() => task3.IsCompleted && !task3.IsCanceled, null, "Task should be completed, but not cancelled.");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed to be await-ed

@@ -214,7 +218,7 @@ public void A_cluster_must_return_completed_LeaveAsync_task_if_member_already_re
});

// LeaveAsync() task expected to complete immediately
_cluster.LeaveAsync().IsCompleted.Should().BeTrue();
AwaitCondition(() => _cluster.LeaveAsync().IsCompleted);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed to be await-ed

@@ -237,7 +241,7 @@ public void A_cluster_must_cancel_LeaveAsync_task_if_CancellationToken_fired_bef

// Cancelling the first task
cts.Cancel();
task1.Should(t => t.IsCanceled, "Task should be cancelled.");
AwaitCondition(() => task1.IsCanceled, null, "Task should be cancelled");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed to be await-ed

#else
[Fact(Skip = "Fails flakily on .NET 4.5")]
#endif
//#if CORECLR
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be able to see this spec pass again

@Aaronontheweb
Copy link
Member Author

Going to work on DowningProviderSpec.Downing_provider_should_stop_the_cluster_if_the_downing_provider_throws_exception_in_props next

@Aaronontheweb Aaronontheweb mentioned this pull request Jun 6, 2017
9 tasks
@Aaronontheweb Aaronontheweb changed the title [WIP] Cleaning up racy specs Cleaning up racy specs part1 Jun 6, 2017
@Aaronontheweb Aaronontheweb added this to the 1.3.0 milestone Jun 6, 2017
@Aaronontheweb
Copy link
Member Author

@akkadotnet/core ready for review.

@@ -59,7 +59,7 @@ public class DowningProviderSpec : AkkaSpec
loglevel = WARNING
actor.provider = ""Akka.Cluster.ClusterActorRefProvider, Akka.Cluster""
remote {
netty.tcp {
dot-netty.tcp {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

@@ -127,8 +127,8 @@ public void Tail_chopping_group_router_must_return_response_from_second_actor_af
[Fact]
public void Tail_chopping_group_router_must_throw_exception_if_no_result_will_arrive_within_the_given_time()
{
var actor1 = Sys.ActorOf(Props.Create(() => new TailChopTestActor(500.Milliseconds())), "Actor3");
var actor2 = Sys.ActorOf(Props.Create(() => new TailChopTestActor(500.Milliseconds())), "Actor4");
var actor1 = Sys.ActorOf(Props.Create(() => new TailChopTestActor(1500.Milliseconds())), "Actor3");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should document all changes of timeout in the tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexvaluyskiy that's what commit histories are for - in this case the issue was just that the delay 500ms and the timeout 300ms were sufficiently close together that a busy system could receive the unwanted "ack" message before the delay. Nothing was technically wrong with the spec or the code it was testing otherwise.

@Aaronontheweb
Copy link
Member Author

Mentioned this already on #2728, but I'm splitting off some of the other racy specs into their own issue. Figured it'd be better to get some fixes in immediately rather than trying to do it all at once.

@alexvaluyskiy alexvaluyskiy merged commit 61e40ef into akkadotnet:v1.3 Jun 6, 2017
@Aaronontheweb Aaronontheweb deleted the fix-racy-specs branch June 6, 2017 14:57
Aaronontheweb added a commit to Aaronontheweb/akka.net that referenced this pull request Jun 28, 2017
* cleaning up racy spec for ActorRefProvider resolution

* fixed LeaveAsync cancellation task specs

* fixed downingproviderspec config

* fixed TailChoppingSpec - increase the sleep time to make timeouts less likely on busy systems

* added another necessary awaitAssert to ClusterSpec.ClusterSpec.A_cluster_must_cancel_LeaveAsync_task_if_CancellationToken_fired_before_node_
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants