Conversation

SergeiPavlov
Contributor

It is well-known that the standard Guid.NewGuid() is slow, and even slower on Linux (dotnet/runtime#13628).

If GUIDs are not published outside of the process, it is enough to use a global 64-bit counter as part of the GUID generator.
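A minimal sketch of the idea — shown here in Python for illustration, with a hypothetical `fast_guid()` standing in for the proposed generator — combines a random per-process base with an atomically incremented 64-bit counter:

```python
import itertools
import os
import uuid

# Sketch of a counter-based GUID generator (hypothetical fast_guid,
# not the actual PR code): a random 8-byte per-process base plus a
# 64-bit counter. Uniqueness is only guaranteed within this process.
_base = os.urandom(8)          # random part, fixed for the process lifetime
_counter = itertools.count(1)  # monotonic counter (atomic under CPython's GIL)

def fast_guid() -> uuid.UUID:
    n = next(_counter) & 0xFFFFFFFFFFFFFFFF  # wrap to 64 bits
    return uuid.UUID(bytes=n.to_bytes(8, "big") + _base)
```

In C#, the counter increment would use Interlocked.Increment, as in the snippet later in this thread.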

@alistairjevans
Member

Thanks, interesting suggestion; can you please run our benchmarks and post before and after results here?


codecov bot commented Mar 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.54%. Comparing base (b843130) to head (fed8a76).
Report is 6 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1414      +/-   ##
===========================================
+ Coverage    78.48%   78.54%   +0.06%     
===========================================
  Files          200      201       +1     
  Lines         5703     5720      +17     
  Branches      1168     1168              
===========================================
+ Hits          4476     4493      +17     
  Misses         714      714              
  Partials       513      513              


@botinko
Contributor

botinko commented Mar 16, 2024

Benchmark:

| Method  | Mean      | Error     | StdDev    |
|-------- |----------:|----------:|----------:|
| Default | 34.354 ns | 0.2249 ns | 0.2104 ns |
| Fast    |  8.337 ns | 0.0624 ns | 0.0553 ns |

    [Benchmark]
    public Guid Default() => Guid.NewGuid();

    [Benchmark]
    public Guid Fast() => FastGuid.NewGuid();

Tested on Windows, .NET 8.
Results may differ on other OSes; the dotnet runtime issue mentioned above reports ~900 ns on Linux. On Linux, Guid.NewGuid() uses /dev/urandom as its source of randomness. /dev/urandom is a cryptographically secure pseudorandom number generator (CSPRNG), seeded with entropy from environmental noise collected from device drivers and other sources. It is known to have limited throughput, especially considering that in Kubernetes many containers share one physical machine, and possibly the same /dev/urandom stream.
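The throughput gap can be probed outside .NET as well; a rough micro-benchmark sketch in Python comparing the CSPRNG-backed uuid4() with a plain counter (numbers vary widely by machine, OS, and interpreter overhead — treat it as illustrative, not authoritative):

```python
import itertools
import timeit
import uuid

_base = bytes(8)
_counter = itertools.count()

def counter_guid() -> uuid.UUID:
    # Counter-based GUID: pure in-process arithmetic, no entropy pool.
    return uuid.UUID(bytes=next(_counter).to_bytes(8, "big") + _base)

n = 100_000
t_random = timeit.timeit(uuid.uuid4, number=n)     # backed by the OS CSPRNG
t_counter = timeit.timeit(counter_guid, number=n)  # no randomness at all
print(f"uuid4: {t_random:.3f}s, counter: {t_counter:.3f}s for {n:,} GUIDs")
```

On a virtualized host, where reads from the entropy source are slower, the gap between the two columns typically widens.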

@alistairjevans
Member

OK, can you run the comparison on the existing Autofac benchmarks as well — the ones in our "bench" folder (https://github.com/autofac/Autofac/tree/develop/bench/Autofac.Benchmarks)?

I want to understand the overall impact of the change on Autofac, based on how many GUIDs we generate.

@SergeiPavlov
Contributor Author

SergeiPavlov commented Mar 18, 2024

Which specific benchmarks depend on .NewGuid()?

I ran the entire set before and after this change and don't see a performance improvement.
It looks even worse, and I cannot explain why.

But the benchmarks mostly test the Resolve*() methods, not registrations, where a GUID may be generated.

Environment: x64, Linux
Example:
Before:

| Method             | repetitions | Mean       | Error    | StdDev   | Median     | Ratio | RatioSD | Gen0   | Allocated | Alloc Ratio |
|------------------- |------------:|-----------:|---------:|---------:|-----------:|------:|--------:|-------:|----------:|------------:|
| Baseline           | ?           | 728.9 ns   | 0.46 ns  | 0.43 ns  | 728.7 ns   | 1.00  | 0.00    | 0.0324 | 2.67 KB   | 1.00        |
| ResolveEnumerableT | 1           | 2,665.4 ns | 8.11 ns  | 7.58 ns  | 2,664.6 ns | ?     | ?       | 0.0648 | 5.51 KB   | ?           |
| ResolveT           | 1           | 1,415.2 ns | 4.17 ns  | 3.69 ns  | 1,416.0 ns | ?     | ?       | 0.0477 | 3.95 KB   | ?           |
| ResolveEnumerableT | 2           | 4,404.5 ns | 12.41 ns | 10.36 ns | 4,405.7 ns | ?     | ?       | 0.0992 | 8.34 KB   | ?           |
| ResolveT           | 2           | 2,085.5 ns | 2.84 ns  | 2.66 ns  | 2,085.3 ns | ?     | ?       | 0.0610 | 5.22 KB   | ?           |
| ResolveEnumerableT | 3           | 6,237.9 ns | 20.51 ns | 16.01 ns | 6,241.0 ns | ?     | ?       | 0.1297 | 11.18 KB  | ?           |
| ResolveT           | 3           | 2,700.5 ns | 52.95 ns | 70.69 ns | 2,659.7 ns | ?     | ?       | 0.0763 | 6.49 KB   | ?           |

After:

| Method             | repetitions | Mean       | Error    | StdDev   | Ratio | RatioSD | Gen0   | Allocated | Alloc Ratio |
|------------------- |------------:|-----------:|---------:|---------:|------:|--------:|-------:|----------:|------------:|
| Baseline           | ?           | 775.2 ns   | 7.70 ns  | 7.20 ns  | 1.00  | 0.00    | 0.0324 | 2.67 KB   | 1.00        |
| ResolveEnumerableT | 1           | 2,872.2 ns | 21.73 ns | 20.32 ns | ?     | ?       | 0.0648 | 5.51 KB   | ?           |
| ResolveT           | 1           | 1,502.1 ns | 13.33 ns | 11.82 ns | ?     | ?       | 0.0477 | 3.95 KB   | ?           |
| ResolveEnumerableT | 2           | 4,889.6 ns | 86.27 ns | 76.47 ns | ?     | ?       | 0.0992 | 8.34 KB   | ?           |
| ResolveT           | 2           | 2,163.8 ns | 20.72 ns | 18.37 ns | ?     | ?       | 0.0610 | 5.22 KB   | ?           |
| ResolveEnumerableT | 3           | 6,592.1 ns | 26.22 ns | 24.53 ns | ?     | ?       | 0.1297 | 11.18 KB  | ?           |
| ResolveT           | 3           | 2,781.0 ns | 10.43 ns | 8.71 ns  | ?     | ?       | 0.0763 | 6.49 KB   | ?           |

@alistairjevans
Member

Interesting that it's coming out worse... have you benchmarked dictionary lookups when Guids from FastGuid are used as a key? I'm wondering (without evidence) if the GetHashCode distribution of the GUIDs is not spread out in the same way as with Guid.NewGuid()?

@SergeiPavlov
Contributor Author

I have tried mixing the bits of the Guid:

    public static Guid NewGuid()
    {
        var v = Interlocked.Increment(ref variablePart);
        var h = (int)v * 521;               // one-to-one function to improve Guid's hash distribution
        return new(h, (short)(v >> 32), (short)(v >> 48), BasePart);
    }

But it does not show a visible improvement.
Guid.GetHashCode() XORs all 32-bit parts of the Guid, so the distribution should be uniform.
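That claim can be sanity-checked; a Python sketch that XOR-folds the four 32-bit words of a GUID (an approximation of Guid.GetHashCode — the exact .NET implementation varies by runtime version) applied to sequential counter GUIDs:

```python
import uuid

def xor_fold_hash(g: uuid.UUID) -> int:
    # XOR the four 32-bit words of the GUID, approximating the
    # XOR-based Guid.GetHashCode described above.
    b = g.bytes
    h = 0
    for i in range(0, 16, 4):
        h ^= int.from_bytes(b[i:i + 4], "little")
    return h

# Sequential counter GUIDs with a fixed base part: each distinct
# counter value yields a distinct folded hash, so consecutive values
# do not collide under the XOR fold.
base = bytes(8)
hashes = [xor_fold_hash(uuid.UUID(bytes=n.to_bytes(8, "big") + base))
          for n in range(1024)]
print(len(set(hashes)))  # → 1024, one distinct hash per counter value
```

Distinct hashes alone do not guarantee the same bucket spread as random GUIDs, though, which is why the dictionary-level benchmark above still matters.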

* FastGuid improved

25% faster implementation of FastGuid with MemoryMarshal

* fix usings

---------

Co-authored-by: Sergei Pavlov <spavlov@servicetitan.com>
@SergeiPavlov
Contributor Author

Thanks to @snaumenko-st for further improvement of this PR: 88272de

Benchmark results look no worse than before.

@tillig
Member

tillig commented Apr 5, 2024

Trying to figure out the state of this so I can get a 0.1.0 release out for #1415. I may just release that one without this, regardless.

However, looking at the chain here, it sounds like:

  • The first run of benchmarks showed that things actually got worse.
  • Subsequent runs (not shown) indicate things aren't worse, but also aren't any better.
  • The primary use case here is to improve registration time, which happens effectively once per application lifecycle.

It's not interesting to add an optimization that doesn't do anything, so I'm not sure where that leaves us.

If it's something where we can show some really marked improvement like "it used to take 10 seconds to start this app and now it takes half a second" or whatever, I'm all for it. But I'm not super interested in optimizing the "we told Autofac to scan and register every type in the entire app and all assemblies" use case - folks shouldn't be doing that, and the pain may be a forcing factor to tell them they're doing something wrong, especially if we're still supporting AnyConcreteTypeNotAlreadyRegisteredSource.

@SergeiPavlov
Contributor Author

  • The primary use case here is to improve registration time, which happens effectively once per application lifecycle.

In our case most registrations happen at app start, but a few additional ones happen on every web request.
That is why we are interested in this improvement.

@tillig
Member

tillig commented Apr 5, 2024

I think I still need to see some sort of improvement, which means this PR possibly needs to include adding some benchmarks that show the affected path and can show that there's improvement. It'd just be nice to see that it's not, like, "2ns improvement" or something basically meaningless. I saw the benchmark of raw GUID creation, but it'd be good to benchmark it in the context of the whole execution path.

@SergeiPavlov
Contributor Author

Added a RegistrationBenchmark:

Before:

| Method   | Mean     | Error    | StdDev   | Gen0   | Gen1   | Allocated |
|--------- |---------:|---------:|---------:|-------:|-------:|----------:|
| Register | 53.76 us | 0.656 us | 0.548 us | 0.3052 | 0.2441 | 26.77 KB  |

After:

| Method   | Mean     | Error    | StdDev   | Gen0   | Gen1   | Allocated |
|--------- |---------:|---------:|---------:|-------:|-------:|----------:|
| Register | 49.55 us | 0.648 us | 0.575 us | 0.3052 | 0.2441 | 27.82 KB  |

@tillig
Member

tillig commented Apr 5, 2024

Let me verify I'm reading this right:

After adding the fast GUID optimization, across six registrations you'll save about four microseconds (0.7us per registration) but allocate 1KB more memory to get that.

Does that sound right?

@SergeiPavlov
Contributor Author

Yes.

A second run of this benchmark confirms it.
Probably the new GUIDs have a less random distribution, and the Dictionary allocates more buckets for them.

@tillig
Member

tillig commented Apr 5, 2024

Hmmm. Maybe @alistairjevans feels differently, but... I... I have to admit, I kind of lean against this given the trade-off between the microseconds vs. allocations doesn't add up.

@alistairjevans
Member

Yeah, I have to agree, the extra code complexity doesn't seem worth the tiny time gains, and we also don't understand the dictionary bucket impacts enough to know that any gain would be consistent anyway.

@SergeiPavlov
Contributor Author

We see the following hotspot on NewGuid() during profiling:

[profiler screenshot]

So it looks not insignificant.

@alistairjevans
Member

Was that profiling session captured on Windows or Linux?

@SergeiPavlov
Contributor Author

Linux (Debian, x64), .NET 8.

@alistairjevans
Member

Ah; is that also where you've been running benchmarks?

@SergeiPavlov
Contributor Author

SergeiPavlov commented Apr 7, 2024

Ah; is that also where you've been running benchmarks?

Not exactly.
I was running the benchmarks on Ubuntu x64 on a real PC, but the app profiling was done in a Docker container in the Azure cloud.

Maybe the .NewGuid() implementation differs in that environment.
It is based on /dev/urandom, and the VM hypervisor may have a slow implementation of this virtual device.

@tillig
Member

tillig commented Apr 9, 2024

Is it possible to run the benchmarks in that environment? Again, it'd just be nice to verify that this is going to fix the issue before it gets accepted.

@SergeiPavlov
Contributor Author

SergeiPavlov commented Apr 9, 2024

Is it possible to run the benchmarks in that environment?

I did (the isolated Guid benchmark only). The results on Azure are the same as on bare Linux:
FastGuid is ~10x faster than Guid.NewGuid().

I agree, that is not enough for it to be merged into Autofac.

@tillig
Member

tillig commented Apr 10, 2024

Sounds good. We'll close this one out for now, but if you can show that it will be a marked improvement in some environments maybe we can revive it and target those environments.

@tillig tillig closed this Apr 10, 2024