Skip to content

Conversation

NinoFloris
Copy link
Member

@NinoFloris NinoFloris commented Mar 16, 2024

Fixes #4944

After this PR, sync exporting ends up being 10% faster on my machine (M1 Max) than using sync NpgsqlDataReader + SequentialAccess. Testing was done based on the project that was linked to here #4944 (comment).

PR mostly consists of splitting apart sync and async paths. Having sync column reads run at the highest perf is as important here as it is in NpgsqlDataReader with GetFieldValue.

Copy link
Member

@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you post the absolute benchmark results as well? Just to know what the 10% translate into in the larger scheme of things etc.

@@ -127,7 +127,48 @@ async Task ReadHeader(bool async)
/// The number of columns in the row. -1 if there are no further rows.
/// Note: This will currently be the same value for all rows, but this may change in the future.
/// </returns>
public int StartRow() => StartRow(false).GetAwaiter().GetResult();
public int StartRow()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a per-row thing, so a bit less convincing than per-column in terms of needing to split sync/async for micro-optimization, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially, as you can see in the results, this is anything but a micro-optimization. It's really just quite expensive to run an async method for a small bit of work. Additionally it thwarts all inlining from the JIT due to the try finally present in AsyncMethodBuilder.Start (which is i.a. responsible for getting and resetting the sync ctx, thread static ops that are also not free)

I agree the frequency is lower but IMO it's still important, we try hard in the reader to stay sync as well.

@NinoFloris
Copy link
Member Author


BenchmarkDotNet v0.13.12, macOS Sonoma 14.2.1 (23C71) [Darwin 23.2.0]
Apple M1 Max, 1 CPU, 10 logical and 10 physical cores
.NET SDK 9.0.100-preview.1.24101.2
  [Host]     : .NET 8.0.0 (8.0.23.53103), Arm64 RyuJIT AdvSIMD
  DefaultJob : .NET 8.0.0 (8.0.23.53103), Arm64 RyuJIT AdvSIMD


With PR:

Method Mean Error StdDev Ratio RatioSD
BufferedReader 559.1 ms 9.89 ms 11.38 ms 1.27 0.02
SequentialReader 493.1 ms 9.53 ms 8.91 ms 1.12 0.02
Export 442.1 ms 2.45 ms 2.29 ms 1.00 0.00

Without:

Method Mean Error StdDev Ratio RatioSD
BufferedReader 552.6 ms 8.41 ms 7.02 ms 0.91 0.02
SequentialReader 496.3 ms 9.78 ms 10.47 ms 0.82 0.02
Export 605.1 ms 6.14 ms 5.44 ms 1.00 0.00

@NinoFloris
Copy link
Member Author

NinoFloris commented Mar 17, 2024

Just for testing I locally rebased this PR onto #5631 (which is based on #5480 and #5476) to get the following results:

Method Mean Error StdDev Ratio RatioSD
BufferedReader 453.6 ms 8.33 ms 7.79 ms 1.06 0.02
SequentialReader 458.5 ms 7.95 ms 7.04 ms 1.07 0.02
Export 428.4 ms 2.47 ms 2.31 ms 1.00 0.00

Without this PR things get really ridiculous for the exporter:

Method Mean Error StdDev Ratio RatioSD
BufferedReader 459.3 ms 8.89 ms 10.24 ms 0.76 0.02
SequentialReader 459.8 ms 9.16 ms 11.59 ms 0.76 0.02
Export 602.5 ms 8.61 ms 8.05 ms 1.00 0.00

@roji
Copy link
Member

roji commented Mar 17, 2024

Out of curiosity, how many rows/columns are being exported here? Maybe post the benchmark source?

(not pushing back, just interested in more details)

@NinoFloris
Copy link
Member Author

NinoFloris commented Mar 17, 2024

Benchmark is a bit hacked but it's basically #4944 (comment)

So that's 1 million rows being read

The actual calls into Npgsql are mostly here https://github.com/SoftStoneDevelop/NpgsqlBinaryExporter/blob/main/Src/NpgsqlBenchmark/Extensions/Extension.cs

@NinoFloris
Copy link
Member Author

NinoFloris commented Mar 17, 2024

Also see:

https://github.com/SoftStoneDevelop/NpgsqlBinaryExporter/blob/main/Src/NpgsqlBenchmark/Extensions/Extension.cs#L124-L130

This pattern is why SkipIfNull/SkipNull would be useful, it would also show perf wins in datasets containing a lot of nulls (here only the identification_id is null 50% of the time).

Screenshot 2024-03-17 at 18 08 36

@NinoFloris
Copy link
Member Author

Will revert the split on StartRow for now. It constitutes about 5% of the total perf improvement while it increases maintainability costs. Can always reevaluate at a later moment.

@NinoFloris NinoFloris force-pushed the improve-binary-exporter-perf branch from b461159 to ac6b38d Compare March 17, 2024 19:50
@NinoFloris NinoFloris merged commit 4b7bf6b into main Mar 17, 2024
@NinoFloris NinoFloris deleted the improve-binary-exporter-perf branch March 17, 2024 21:16
@NinoFloris NinoFloris added this to the 8.0.3 milestone Mar 28, 2024
NinoFloris added a commit that referenced this pull request Mar 28, 2024
Fixes #4944

(cherry picked from commit 4b7bf6b)
@NinoFloris
Copy link
Member Author

Backported to 8.0.3 via 66969a5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve NpgsqlBinaryExporter performance
2 participants