Skip to content

Conversation

Aaronontheweb
Copy link
Member

Closes #2288

@@ -517,12 +517,13 @@ namespace Akka.Actor
public static Akka.Actor.DeployableDecider From(Akka.Actor.Directive defaultDirective, System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<System.Type, Akka.Actor.Directive>> pairs) { }
public static Akka.Actor.LocalOnlyDecider From(System.Func<System.Exception, Akka.Actor.Directive> localOnlyDecider) { }
}
public class DedicatedThreadScheduler : Akka.Actor.SchedulerBase, Akka.Actor.IDateTimeOffsetNowTimeProvider, Akka.Actor.ITimeProvider
public class DedicatedThreadScheduler : Akka.Actor.SchedulerBase, Akka.Actor.IDateTimeOffsetNowTimeProvider, Akka.Actor.ITimeProvider, System.IDisposable
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Public API change: DedicatedThreadScheduler now implements IDisposable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we change the version to 1.2.0 if we have API breaking changes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, this is an internal component which can be extended but we don't know of any implementations out in the wild. Plus the old API is frankly kind of dangerous (as I explained in my comment below.)

What I've decided to do was change the SchedulerBase constructor that supports the new Activator.CreateInstance arguments , but I've left the old DedicatedThreadScheduler constructor intact in case anyone was instantiating one of those for some reason. I figured that was a reasonable compromise.

@Aaronontheweb
Copy link
Member Author

Aaronontheweb commented Sep 6, 2016

TODOS:

  • need to guarantee that repeating scheduled tasks can't deadlock the termination process.
  • need to update dispatchers to use the updated SchedulerException.
  • need to guarantee that an exception while performing scheduled work doesn't disable the IScheduler

@Aaronontheweb Aaronontheweb changed the title close #2288 - verified that all IScheduler implementations which support IDisposable will be disposed before ActorSystem.WhenTerminated completes. [WIP] close #2288 - verified that all IScheduler implementations which support IDisposable will be disposed before ActorSystem.WhenTerminated completes. Sep 6, 2016
@Aaronontheweb
Copy link
Member Author

Benchmark Results

Akka.Tests.Performance.Actor.Scheduler.DefaultSchedulerPerformanceTests+SchedulerThroughputStressTest

Tests to see how many concurrent jobs we can schedule and what the effects are on throughput
9/7/2016 12:43:51 AM

System Info

NBench=NBench, Version=0.3.0.0, Culture=neutral, PublicKeyToken=null
OS=Microsoft Windows NT 6.2.9200.0
ProcessorCount=2
CLR=4.0.30319.42000,IsMono=False,MaxGcGeneration=2
WorkerThreads=32767, IOThreads=2

NBench Settings

RunMode=Iterations, TestMode=Measurement
NumberOfIterations=1, MaximumRunTime=00:00:01
Concurrent=True
Tracing=True

Data


Totals

Metric Units Max Average Min StdDev
TotalBytesAllocated bytes 277,380,432.00 277,380,432.00 277,380,432.00 0.00
TotalCollections [Gen0] collections 432.00 432.00 432.00 0.00
TotalCollections [Gen1] collections 194.00 194.00 194.00 0.00
TotalCollections [Gen2] collections 56.00 56.00 56.00 0.00
[Counter] ScheduledJobs operations 889,600.00 889,600.00 889,600.00 0.00
[Counter] ScheduleInvokes operations 41,414,684.00 41,414,684.00 41,414,684.00 0.00

Per-second Totals

Metric Units / s Max / s Average / s Min / s StdDev / s
TotalBytesAllocated bytes 13,207,660.29 13,207,660.29 13,207,660.29 0.00
TotalCollections [Gen0] collections 20.57 20.57 20.57 0.00
TotalCollections [Gen1] collections 9.24 9.24 9.24 0.00
TotalCollections [Gen2] collections 2.67 2.67 2.67 0.00
[Counter] ScheduledJobs operations 42,358.92 42,358.92 42,358.92 0.00
[Counter] ScheduleInvokes operations 1,971,988.70 1,971,988.70 1,971,988.70 0.00

Raw Data

TotalBytesAllocated

Run # bytes bytes / s ns / bytes
1 277,380,432.00 13,207,660.29 75.71

TotalCollections [Gen0]

Run # collections collections / s ns / collections
1 432.00 20.57 48,614,540.28

TotalCollections [Gen1]

Run # collections collections / s ns / collections
1 194.00 9.24 108,255,058.76

TotalCollections [Gen2]

Run # collections collections / s ns / collections
1 56.00 2.67 375,026,453.57

[Counter] ScheduledJobs

Run # operations operations / s ns / operations
1 889,600.00 42,358.92 23,607.78

[Counter] ScheduleInvokes

Run # operations operations / s ns / operations
1 41,414,684.00 1,971,988.70 507.10

@Aaronontheweb
Copy link
Member Author

Biggest surprise to me on this benchmark isn't the speed (roughly 2 million msg /s) but the garbage collection. Feels like this code is blowing through a lot more gen 2 GC than it should.... And an incredible amount of memory allocation (277mb.)

The number of schedule attempts per second is pretty much fixed by the way I wrote the benchmark; not terribly important here to measure maximum speed IMHO. Rather to see if the changes we make result in an improvement or not.

@Aaronontheweb
Copy link
Member Author

Implemented first cut of HashedWheelTimerScheduler - waiting for benchmark data. A little worried that it's sleep detector isn't sensitive enough, but it's tough to tell when debugging. Probably going to need to go over it with a profiler. Passed all of the scheduler specs save one though (since this is a behavior I have not yet implemented)

@Aaronontheweb
Copy link
Member Author

HashedWheelTimerScheduler Benchmark 1

Akka.Tests.Performance.Actor.Scheduler.DefaultSchedulerPerformanceTests+SchedulerThroughputStressTest

Tests to see how many concurrent jobs we can schedule and what the effects are on throughput
9/7/2016 9:58:44 PM

System Info

NBench=NBench, Version=0.3.0.0, Culture=neutral, PublicKeyToken=null
OS=Microsoft Windows NT 6.2.9200.0
ProcessorCount=2
CLR=4.0.30319.42000,IsMono=False,MaxGcGeneration=2
WorkerThreads=32767, IOThreads=2

NBench Settings

RunMode=Iterations, TestMode=Measurement
NumberOfIterations=1, MaximumRunTime=00:00:01
Concurrent=True
Tracing=True

Data


Totals

Metric Units Max Average Min StdDev
TotalBytesAllocated bytes 100,706,168.00 100,706,168.00 100,706,168.00 0.00
TotalCollections [Gen0] collections 961.00 961.00 961.00 0.00
TotalCollections [Gen1] collections 202.00 202.00 202.00 0.00
TotalCollections [Gen2] collections 7.00 7.00 7.00 0.00
[Counter] ScheduledJobs operations 856,400.00 856,400.00 856,400.00 0.00
[Counter] ScheduleInvokes operations 38,603,899.00 38,603,899.00 38,603,899.00 0.00

Per-second Totals

Metric Units / s Max / s Average / s Min / s StdDev / s
TotalBytesAllocated bytes 5,035,062.97 5,035,062.97 5,035,062.97 0.00
TotalCollections [Gen0] collections 48.05 48.05 48.05 0.00
TotalCollections [Gen1] collections 10.10 10.10 10.10 0.00
TotalCollections [Gen2] collections 0.35 0.35 0.35 0.00
[Counter] ScheduledJobs operations 42,817.91 42,817.91 42,817.91 0.00
[Counter] ScheduleInvokes operations 1,930,100.87 1,930,100.87 1,930,100.87 0.00

Raw Data

TotalBytesAllocated

Run # bytes bytes / s ns / bytes
1 100,706,168.00 5,035,062.97 198.61

TotalCollections [Gen0]

Run # collections collections / s ns / collections
1 961.00 48.05 20,812,668.99

TotalCollections [Gen1]

Run # collections collections / s ns / collections
1 202.00 10.10 99,014,727.23

TotalCollections [Gen2]

Run # collections collections / s ns / collections
1 7.00 0.35 2,857,282,128.57

[Counter] ScheduledJobs

Run # operations operations / s ns / operations
1 856,400.00 42,817.91 23,354.71

[Counter] ScheduleInvokes

Run # operations operations / s ns / operations
1 38,603,899.00 1,930,100.87 518.11

@Aaronontheweb
Copy link
Member Author

Big drop in allocations (277mb --> 100 mb) and GEN2 GC.
(56 --> 7).

Slight increase in scheduling throughput, which I think is the result of decreased contention around the ConcurrentQueue this design uses. 42,358.92 schedules per second --> 42,817.91.

Bit of a drop in actual scheduled calls per second, which could be the result of the fact that we're now scheduling tasks to be executed on the ThreadPool instead of doing it directly on the scheduler thread (adds a delay to executing the scheduled operation.) Let me play around with that. 1,971,988.70 --> 1,930,100.87.

@Aaronontheweb
Copy link
Member Author

Looking at the MNTR logs, seems like there's still some issues with my implementation:

[21:45:13]  [Step 2/2] 
[21:45:13]  [Step 2/2] Unhandled Exception: System.InvalidOperationException: SchedulerRegistration.Deadline [200113356] > Timer.Deadline [146742809]
[21:45:13]  [Step 2/2]    at Akka.Actor.HashedWheelTimerScheduler.Bucket.Execute(Int64 deadline) in D:\BuildAgent\work\d26c9d7f36545acd\src\core\Akka\Actor\Scheduler\HashedWheelTimerScheduler.cs:line 574
[21:45:13]  [Step 2/2]    at Akka.Actor.HashedWheelTimerScheduler.Run() in D:\BuildAgent\work\d26c9d7f36545acd\src\core\Akka\Actor\Scheduler\HashedWheelTimerScheduler.cs:line 165
[21:45:13]  [Step 2/2]    at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
[21:45:13]  [Step 2/2]    at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
[21:45:13]  [Step 2/2]    at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
[21:45:13]  [Step 2/2]    at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
[21:45:13]  [Step 2/2]    at System.Threading.ThreadHelper.ThreadStart()
[21:45:13]  [Step 2/2] 
[21:45:13]  [Step 2/2] Unhandled Exception: System.InvalidOperationException: SchedulerRegistration.Deadline [200112555] > Timer.Deadline [146803128]
[21:45:13]  [Step 2/2]    at Akka.Actor.HashedWheelTimerScheduler.Bucket.Execute(Int64 deadline) in D:\BuildAgent\work\d26c9d7f36545acd\src\core\Akka\Actor\Scheduler\HashedWheelTimerScheduler.cs:line 574
[21:45:13]  [Step 2/2]    at Akka.Actor.HashedWheelTimerScheduler.Run() in D:\BuildAgent\work\d26c9d7f36545acd\src\core\Akka\Actor\Scheduler\HashedWheelTimerScheduler.cs:line 165
[21:45:13]  [Step 2/2]    at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
[21:45:13]  [Step 2/2]    at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
[21:45:13]  [Step 2/2]    at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
[21:45:13]  [Step 2/2]    at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
[21:45:13]  [Step 2/2]    at System.Threading.ThreadHelper.ThreadStart()

@Aaronontheweb
Copy link
Member Author

HashedWheelTimerScheduler Benchmark 2

Akka.Tests.Performance.Actor.Scheduler.DefaultSchedulerPerformanceTests+SchedulerThroughputStressTest

Tests to see how many concurrent jobs we can schedule and what the effects are on throughput
9/8/2016 12:25:32 AM

System Info

NBench=NBench, Version=0.3.0.0, Culture=neutral, PublicKeyToken=null
OS=Microsoft Windows NT 6.2.9200.0
ProcessorCount=2
CLR=4.0.30319.42000,IsMono=False,MaxGcGeneration=2
WorkerThreads=32767, IOThreads=2

NBench Settings

RunMode=Iterations, TestMode=Measurement
NumberOfIterations=1, MaximumRunTime=00:00:01
Concurrent=True
Tracing=True

Data


Totals

Metric Units Max Average Min StdDev
TotalBytesAllocated bytes 111,339,936.00 111,339,936.00 111,339,936.00 0.00
TotalCollections [Gen0] collections 20.00 20.00 20.00 0.00
TotalCollections [Gen1] collections 7.00 7.00 7.00 0.00
TotalCollections [Gen2] collections 1.00 1.00 1.00 0.00
[Counter] ScheduledJobs operations 964,800.00 964,800.00 964,800.00 0.00
[Counter] ScheduleInvokes operations 79,906,737.00 79,906,737.00 79,906,737.00 0.00

Per-second Totals

Metric Units / s Max / s Average / s Min / s StdDev / s
TotalBytesAllocated bytes 5,566,713.06 5,566,713.06 5,566,713.06 0.00
TotalCollections [Gen0] collections 1.00 1.00 1.00 0.00
TotalCollections [Gen1] collections 0.35 0.35 0.35 0.00
TotalCollections [Gen2] collections 0.05 0.05 0.05 0.00
[Counter] ScheduledJobs operations 48,237.54 48,237.54 48,237.54 0.00
[Counter] ScheduleInvokes operations 3,995,133.22 3,995,133.22 3,995,133.22 0.00

Raw Data

TotalBytesAllocated

Run # bytes bytes / s ns / bytes
1 111,339,936.00 5,566,713.06 179.64

TotalCollections [Gen0]

Run # collections collections / s ns / collections
1 20.00 1.00 1,000,050,970.00

TotalCollections [Gen1]

Run # collections collections / s ns / collections
1 7.00 0.35 2,857,288,485.71

TotalCollections [Gen2]

Run # collections collections / s ns / collections
1 1.00 0.05 20,001,019,400.00

[Counter] ScheduledJobs

Run # operations operations / s ns / operations
1 964,800.00 48,237.54 20,730.74

[Counter] ScheduleInvokes

Run # operations operations / s ns / operations
1 79,906,737.00 3,995,133.22 250.30

@Aaronontheweb
Copy link
Member Author

I've run the benchmark 3-4 times now and this is the lowest recorded result; wanted to sanity check the figures. Biggest change I made? All scheduled work happens on the scheduler thread instead of being executed as a new Task on the ThreadPool.

As long as the scheduled operations don't run for very long (most don't) this design achieves significantly higher throughput with a fraction of the allocations and garbage collections. Memory consumption is about the same: it's determined largely by the number of scheduled jobs, and many more jobs are queued in this design.

Throughput

This spec always runs for 20 seconds. This implementation has fundamentally the same contention overhead as the previous HashedWheelTimerScheduler, but this implementation is able to improve upon 42,817.91 schedules added per second all the way up to 48,237.54 per second. At 12-13% increase. Moving away from the ThreadPool wouldn't affect that directly, but decreasing the amount of garbage collection attempts would.

In in this benchmark we also improve upon total scheduled actions executed throughput from the previous HashedWheelTimerScheduler implementation from 1,930,100.87 to 3,995,133.22 per second. More than double the performance.

I think this is because:

  1. We're executing the work synchronously rather than scheduling it onto the ThreadPool, which makes the timing of when the scheduled work is performed much more accurate.
  2. The previous increase in the rate of scheduling is going to have a compound effect on scheduled operations executed, given that all of these events are repeating events at 20ms intervals. So there's just much more work being scheduled.
  3. The sleep algorithm is dynamic in the HashedWheelTimer - if we had a lot of work to execute in one bucket (position on the wheel) we may not sleep the worker thread at all and just proceed directly to the next bucket immediately. Sleep only occurs if there's at least 1 millisecond of time between one bucket finishing and another starting. This means that as the scheduler's load increases, the amount of time it spends sleeping decreases or becomes more spread out over longer intervals of activity.

I'm going to remove the Task code that I commented out in this benchmark as I think this is just more efficient and accurate over delegating execution of scheduled work to the ThreadPool.

Memory

Memory allocation didn't noticeably change at all - consumption increased slightly due to the increase in work scheduled. I'd have to run a profiler on both versions to understand why, but my back-of-the-envelope math works out to the fact that the Tasks we were allocating earlier were using a static delegate with an argument, therefore aside from the Task object itself we weren't allocating much memory in terms of total bytes although we were allocating significantly more objects.

GC

The biggest difference between this design and the other two (the first one being the original DedicatedThreadScheduler) is that this one has virtually no garbage collection overhead compared to the others.

Each item here is for all three implementations in order:

  • GEN2: 56 --> 7 --> 1 garbage collections over 20 seconds
  • GEN1: 194 -->202 --> 7 garbage collections over 20 seconds
  • GEN0: 432 --> 961 --> 20 garbage collections over 20 seconds

The original DedicatedThreadScheduler is pretty heavy on GC across all generations; the first draft of the HashedWheelTimerScheduler is much more efficient on GEN2, but extremely heavy on the earlier GC generations. Eliminating the use of the Task in the second draft of the HashedWheelTimerScheduler resulted in drastic drops in GC pressure everywhere. In addition to making the timer more precise it also allocates far fewer objects - a recurring task takes exactly three allocations for the duration of its lifetime (SchedulerRegistration, IRunnable, ICancelable) and those objects will be re-used indefinitely as long as the ICancelable is not invoked. There will be some GC pressure for "one shot" tasks, but we make that pretty easy for the garbage collector by stripping all of the references to it out of our linked lists.

Conclusion

The HashedWheelTimer looks like the way to go. I already made it the default and the entire test suite is running on it, including the MNTR. It might merit some additional profiling to ensure that it's not being wasteful with its sleep / wake routine but other than that I think it's a demonstrable improvement. Going to go try it out in conjunction with the Mono specs.

@Aaronontheweb Aaronontheweb force-pushed the fix-2288-scheduler-impl-rebase branch 2 times, most recently from 88aa0a7 to a7870f4 Compare September 8, 2016 03:31
@Aaronontheweb Aaronontheweb added this to the 1.1.2 milestone Sep 8, 2016
@Aaronontheweb Aaronontheweb changed the title [WIP] close #2288 - verified that all IScheduler implementations which support IDisposable will be disposed before ActorSystem.WhenTerminated completes. HashedWheelTimerScheduler and IScheduler disposal guarantees Sep 8, 2016
@Aaronontheweb
Copy link
Member Author

@akkadotnet/developers this is now ready to be reviewed once CI has a chance to run.

@Aaronontheweb Aaronontheweb force-pushed the fix-2288-scheduler-impl-rebase branch from a7870f4 to 604b667 Compare September 8, 2016 15:42
public class DefaultSchedulerPerformanceTests
{
private ActorSystem _actorSystem;
public const string ScheduledOps = "ScheduleInvokes";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need these public consts? Could we make them as private?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just a benchmark. Doesn't really matter.

@alexvaluyskiy
Copy link
Contributor

LGTM, may I merge it?

@Aaronontheweb
Copy link
Member Author

@alexvaluyskiy I'm going to implement some of your remarks and do a bit of cleanup first.

@@ -0,0 +1,670 @@
using System;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copyright header ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh! yep, will add that

…s which support `IDisposable` will be disposed before `ActorSystem.WhenTerminated` completes.

close akkadotnet#1593 - significantly improved upon the `DedicatedThreadScheduler` performance

Signed-off-by: Aaron Stannard <aaron@petabridge.com>
@Aaronontheweb
Copy link
Member Author

Addressed all remarks, approved the API, etc... this should be ready to go now.

@Horusiath Horusiath merged commit d86eb6e into akkadotnet:dev Sep 13, 2016
@Aaronontheweb Aaronontheweb deleted the fix-2288-scheduler-impl-rebase branch September 13, 2016 23:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants