Skip to content

Conversation

l0rinc
Copy link
Contributor

@l0rinc l0rinc commented Jul 12, 2024

This change is part of [IBD] - Tracking PR for speeding up Initial Block Download

Summary

The in-memory representation of the UTXO set uses (salted) SipHash for avoiding key collision attacks.

Hashing a uint256 key is done so often that a specialized optimization was extracted to SipHashUint256Extra. The constant salting operations were already extracted in the general case, this PR adjust the main specialization similarly.

Details

Previously, only k0 and k1 were stored, causing the constant xor operations to be recomputed in every call to SipHashUint256Extra.
This commit adds a dedicated Uint256ExtraSipHasher class that caches the initial state (v0-v3) and to perform the SipHash computation on a uint256 (with an extra parameter), hiding the constant computation details from higher-level code and improving efficiency (as suggested by Sipa in #30442 (comment)).

Measurements

Benchmarks
diff --git a/src/bench/crypto_hash.cpp b/src/bench/crypto_hash.cpp
--- a/src/bench/crypto_hash.cpp	(revision 9b1a7c3e8dd78d97fbf47c2d056d043b05969176)
+++ b/src/bench/crypto_hash.cpp	(revision e1b4f056b3097e7e34b0eda31f57826d81c9d810)
@@ -2,7 +2,6 @@
 // Distributed under the MIT software license, see the accompanying
 // file COPYING or http://www.opensource.org/licenses/mit-license.php.
 
-
 #include <bench/bench.h>
 #include <crypto/muhash.h>
 #include <crypto/ripemd160.h>
@@ -12,9 +11,11 @@
 #include <crypto/sha512.h>
 #include <crypto/siphash.h>
 #include <random.h>
-#include <span.h>
 #include <tinyformat.h>
 #include <uint256.h>
+#include <primitives/transaction.h>
+#include <util/hasher.h>
+#include <unordered_set>
 
 #include <cstdint>
 #include <vector>
@@ -205,6 +206,98 @@
     });
 }
 
+static void SaltedOutpointHasherBench_hash(benchmark::Bench& bench)
+{
+    FastRandomContext rng{/*fDeterministic=*/true};
+    constexpr size_t size{1000};
+
+    std::vector<COutPoint> outpoints(size);
+    for (auto& outpoint : outpoints) {
+        outpoint = {Txid::FromUint256(rng.rand256()), rng.rand32()};
+    }
+
+    const SaltedOutpointHasher hasher;
+    bench.batch(size).run([&] {
+        size_t result{0};
+        for (const auto& outpoint : outpoints) {
+            result ^= hasher(outpoint);
+        }
+        ankerl::nanobench::doNotOptimizeAway(result);
+    });
+}
+
+static void SaltedOutpointHasherBench_match(benchmark::Bench& bench)
+{
+    FastRandomContext rng{/*fDeterministic=*/true};
+    constexpr size_t size{1000};
+
+    std::unordered_set<COutPoint, SaltedOutpointHasher> values;
+    std::vector<COutPoint> value_vector;
+    values.reserve(size);
+    value_vector.reserve(size);
+
+    for (size_t i{0}; i < size; ++i) {
+        COutPoint outpoint{Txid::FromUint256(rng.rand256()), rng.rand32()};
+        values.emplace(outpoint);
+        value_vector.push_back(outpoint);
+        assert(values.contains(outpoint));
+    }
+
+    bench.batch(size).run([&] {
+        bool result{true};
+        for (const auto& outpoint : value_vector) {
+            result ^= values.contains(outpoint);
+        }
+        ankerl::nanobench::doNotOptimizeAway(result);
+    });
+}
+
+static void SaltedOutpointHasherBench_mismatch(benchmark::Bench& bench)
+{
+    FastRandomContext rng{/*fDeterministic=*/true};
+    constexpr size_t size{1000};
+
+    std::unordered_set<COutPoint, SaltedOutpointHasher> values;
+    std::vector<COutPoint> missing_value_vector;
+    values.reserve(size);
+    missing_value_vector.reserve(size);
+
+    for (size_t i{0}; i < size; ++i) {
+        values.emplace(Txid::FromUint256(rng.rand256()), rng.rand32());
+        COutPoint missing_outpoint{Txid::FromUint256(rng.rand256()), rng.rand32()};
+        missing_value_vector.push_back(missing_outpoint);
+        assert(!values.contains(missing_outpoint));
+    }
+
+    bench.batch(size).run([&] {
+        bool result{false};
+        for (const auto& outpoint : missing_value_vector) {
+            result ^= values.contains(outpoint);
+        }
+        ankerl::nanobench::doNotOptimizeAway(result);
+    });
+}
+
+static void SaltedOutpointHasherBench_create_set(benchmark::Bench& bench)
+{
+    FastRandomContext rng{/*fDeterministic=*/true};
+    constexpr size_t size{1000};
+
+    std::vector<COutPoint> outpoints(size);
+    for (auto& outpoint : outpoints) {
+        outpoint = {Txid::FromUint256(rng.rand256()), rng.rand32()};
+    }
+
+    bench.batch(size).run([&] {
+        std::unordered_set<COutPoint, SaltedOutpointHasher> set;
+        set.reserve(size);
+        for (const auto& outpoint : outpoints) {
+            set.emplace(outpoint);
+        }
+        ankerl::nanobench::doNotOptimizeAway(set.size());
+    });
+}
+
 static void MuHash(benchmark::Bench& bench)
 {
     MuHash3072 acc;
@@ -276,6 +369,10 @@
 BENCHMARK(SHA256_32b_AVX2, benchmark::PriorityLevel::HIGH);
 BENCHMARK(SHA256_32b_SHANI, benchmark::PriorityLevel::HIGH);
 BENCHMARK(SipHash_32b, benchmark::PriorityLevel::HIGH);
+BENCHMARK(SaltedOutpointHasherBench_hash, benchmark::PriorityLevel::HIGH);
+BENCHMARK(SaltedOutpointHasherBench_match, benchmark::PriorityLevel::HIGH);
+BENCHMARK(SaltedOutpointHasherBench_mismatch, benchmark::PriorityLevel::HIGH);
+BENCHMARK(SaltedOutpointHasherBench_create_set, benchmark::PriorityLevel::HIGH);
 BENCHMARK(SHA256D64_1024_STANDARD, benchmark::PriorityLevel::HIGH);
 BENCHMARK(SHA256D64_1024_SSE4, benchmark::PriorityLevel::HIGH);
 BENCHMARK(SHA256D64_1024_AVX2, benchmark::PriorityLevel::HIGH);

cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='SaltedOutpointHasherBench' -min-time=10000

Before:

ns/op op/s err% total benchmark
58.60 17,065,922.04 0.3% 11.02 SaltedOutpointHasherBench_create_set
11.97 83,576,684.83 0.1% 11.01 SaltedOutpointHasherBench_hash
14.50 68,985,850.12 0.3% 10.96 SaltedOutpointHasherBench_match
13.90 71,942,033.47 0.4% 11.03 SaltedOutpointHasherBench_mismatch

After:

ns/op op/s err% total benchmark
57.27 17,462,299.19 0.1% 11.02 SaltedOutpointHasherBench_create_set
11.24 88,997,888.48 0.3% 11.04 SaltedOutpointHasherBench_hash
13.91 71,902,014.20 0.2% 11.01 SaltedOutpointHasherBench_match
13.29 75,230,390.31 0.1% 11.00 SaltedOutpointHasherBench_mismatch

compared to master:

create_set - 17,462,299.19 / 17,065,922.04 - 2.3% faster
hash       - 88,997,888.48 / 83,576,684.83 - 6.4% faster
match      - 71,902,014.20 / 68,985,850.12 - 4.2% faster
mismatch   - 75,230,390.31 / 71,942,033.47 - 4.5% faster

C++ compiler .......................... GNU 13.3.0

Before:

ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
136.76 7,312,133.16 0.0% 1,086.67 491.12 2.213 119.54 1.1% 11.01 SaltedOutpointHasherBench_create_set
23.82 41,978,882.62 0.0% 252.01 85.57 2.945 4.00 0.0% 11.00 SaltedOutpointHasherBench_hash
60.42 16,549,695.42 0.1% 460.51 217.04 2.122 21.00 1.4% 10.99 SaltedOutpointHasherBench_match
78.66 12,713,595.35 0.1% 555.59 282.52 1.967 20.19 2.2% 10.74 SaltedOutpointHasherBench_mismatch

After:

ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
135.38 7,386,349.49 0.0% 1,078.19 486.16 2.218 119.56 1.1% 11.00 SaltedOutpointHasherBench_create_set
23.67 42,254,558.08 0.0% 247.01 85.01 2.906 4.00 0.0% 11.00 SaltedOutpointHasherBench_hash
58.95 16,962,220.14 0.1% 446.55 211.74 2.109 20.86 1.4% 11.01 SaltedOutpointHasherBench_match
76.98 12,991,047.69 0.1% 548.93 276.50 1.985 20.25 2.3% 10.72 SaltedOutpointHasherBench_mismatch
compared to master:
create_set -  7,386,349.49 / 7,312,133.16  - 1.0% faster
hash       - 42,254,558.08 / 41,978,882.62 - 0.6% faster
match      - 16,962,220.14 / 16,549,695.42 - 2.4% faster
mismatch   - 12,991,047.69 / 12,713,595.35 - 2.1% faster

@DrahtBot
Copy link
Contributor

DrahtBot commented Jul 12, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30442.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK jonatack
Approach ACK Raimo33
Stale ACK laanwj, ismaelsadeeq

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

@l0rinc l0rinc force-pushed the paplorinc/siphash branch from ef4a436 to 3b7b82b Compare July 12, 2024 16:39
@l0rinc l0rinc changed the title Precalculate SipHash constant XOR with k0 and k1 in SaltedOutpointHasher optimization: Precalculate SipHash constant XOR with k0 and k1 in SaltedOutpointHasher Jul 12, 2024
@l0rinc l0rinc marked this pull request as ready for review July 13, 2024 09:01
@l0rinc
Copy link
Contributor Author

l0rinc commented Jul 14, 2024

@andrewtoth, this is another tiny addition to the coincache speedup, your review would be welcome.

@andrewtoth
Copy link
Contributor

I did not see any improvement in the benchmark with this change. Running ./src/bench/bench_bitcoin -filter=.*Out[pP]oint.*

commit 3b7b82b

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|---------------:|--------:|----------:|:----------
|                1.31 |      766,241,272.84 |    0.2% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `COutPoint_equality_match`
|                0.64 |    1,561,309,649.35 |    0.0% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `COutPoint_equality_mismatch`
|              107.62 |        9,292,298.54 |    1.7% |            0.00 |            0.00 |           0.00 |    0.0% |      0.10 | `SaltedOutpointHasherBenchmark_create_set`
|               23.36 |       42,799,791.14 |    0.0% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `SaltedOutpointHasherBenchmark_hash`
|               48.78 |       20,502,210.98 |    0.1% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `SaltedOutpointHasherBenchmark_match`
|               69.31 |       14,426,956.77 |    0.3% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `SaltedOutpointHasherBenchmark_mismatch`

commit 7ac4e3a

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|---------------:|--------:|----------:|:----------
|                1.30 |      766,440,414.89 |    0.0% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `COutPoint_equality_match`
|                1.02 |      982,018,502.07 |    0.2% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `COutPoint_equality_mismatch`
|              108.92 |        9,180,628.87 |    0.9% |            0.00 |            0.00 |           0.00 |    0.0% |      0.10 | `SaltedOutpointHasherBenchmark_create_set`
|               23.69 |       42,213,365.83 |    0.1% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `SaltedOutpointHasherBenchmark_hash`
|               47.90 |       20,876,369.10 |    0.1% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `SaltedOutpointHasherBenchmark_match`
|               70.10 |       14,265,655.03 |    0.1% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `SaltedOutpointHasherBenchmark_mismatch`

@l0rinc
Copy link
Contributor Author

l0rinc commented Jul 14, 2024

That's disappointing, thanks for checking, something's still off with my compiler, it seems.
After I finish benching #28280 (comment), I'll try this again on the dedicates server instead.
I'm also a bit surprised that the ratios are completely different compared to my measurements, e.g. SaltedOutpointHasherBenchmark_match and _mismatch are 40% off, while in my case they're basically equal. Are compilers this different?
Could you please send me the exact command and compiler versions you've used?
Thanks!

@l0rinc l0rinc marked this pull request as draft July 15, 2024 15:17
@andrewtoth
Copy link
Contributor

andrewtoth commented Jul 15, 2024

Could you please send me the exact command and compiler versions you've used?

gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)

I ran
./configure --enable-bench
make
./src/bench/bench_bitcoin -filter=.*Out[pP]oint.*

@l0rinc
Copy link
Contributor Author

l0rinc commented Jul 16, 2024

Now that the other benchmark finished after a few days, I ran this on a gcc (Debian 12.2.0-14) 12.2.0.

Outdated results

After the first run I got a warning of:

Warning, results might be unstable:

  • CPU frequency scaling enabled: CPU 0 between 800.0 and 4,200.0 MHz
  • CPU governor is 'powersave' but should be 'performance'
  • Turbo is enabled, CPU frequency will fluctuate

which I've fixed and also ran pyperf system tune.
Additionally I set a --min-time=10000, otherwise the results weren't consistent between runs. I ran each change twice to make sure they're stable.

before:

./configure --enable-bench
git checkout 7ac4e3aaf345a06fdde463100c45c4295d730820 && git reset --hard
make -j$(nproc) && ./src/bench/bench_bitcoin -filter='.*Out[pP]oint.*' --min-time=10000
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
135.01 7,406,978.51 0.1% 1,060.42 485.49 2.184 120.72 1.1% 11.05 SaltedOutpointHasherBenchmark_create_set
23.89 41,860,205.94 0.0% 252.01 85.94 2.932 4.00 0.0% 11.00 SaltedOutpointHasherBenchmark_hash
59.81 16,719,701.99 0.1% 451.14 215.16 2.097 20.83 1.5% 11.00 SaltedOutpointHasherBenchmark_match
79.21 12,624,726.37 0.0% 566.84 284.97 1.989 20.50 2.3% 10.74 SaltedOutpointHasherBenchmark_mismatch
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
133.71 7,479,023.41 0.0% 1,060.38 480.97 2.205 120.72 1.1% 11.03 SaltedOutpointHasherBenchmark_create_set
23.87 41,894,947.89 0.0% 252.01 85.88 2.935 4.00 0.0% 11.00 SaltedOutpointHasherBenchmark_hash
60.05 16,652,312.90 0.1% 452.20 216.05 2.093 20.86 1.3% 11.00 SaltedOutpointHasherBenchmark_match
79.90 12,515,491.22 0.1% 566.85 287.46 1.972 20.50 2.3% 10.74

after:

git checkout 3b7b82b4b0c39a38538aae2cba10bec3907c5cbf && git reset --hard
make -j$(nproc) && ./src/bench/bench_bitcoin -filter='.*Out[pP]oint.*' --min-time=10000
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
132.60 7,541,596.87 0.0% 1,045.84 476.97 2.193 120.72 1.1% 11.03 SaltedOutpointHasherBenchmark_create_set
23.43 42,674,820.33 0.0% 246.01 84.29 2.918 4.00 0.0% 11.00 SaltedOutpointHasherBenchmark_hash
58.90 16,977,444.58 0.0% 444.45 211.91 2.097 20.92 1.3% 11.00 SaltedOutpointHasherBenchmark_match
78.29 12,773,764.99 0.0% 544.01 281.65 1.932 20.29 2.6% 10.73 SaltedOutpointHasherBenchmark_mismatch
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
132.75 7,533,102.65 0.0% 1,045.87 477.53 2.190 120.72 1.1% 11.02 SaltedOutpointHasherBenchmark_create_set
23.44 42,660,977.02 0.0% 246.01 84.33 2.917 4.00 0.0% 11.00 SaltedOutpointHasherBenchmark_hash
59.05 16,934,798.45 0.0% 444.96 212.44 2.094 20.93 1.3% 11.01 SaltedOutpointHasherBenchmark_match
76.32 13,103,542.96 0.1% 531.55 274.56 1.936 19.99 2.4% 10.71 SaltedOutpointHasherBenchmark_mismatch

Resulting in:

SaltedOutpointHasherBenchmark_create_set: Before Avg = 7443000.96, After Avg = 7537349.76, Increase = 1.27%

SaltedOutpointHasherBenchmark_hash: Before Avg = 41877576.91, After Avg = 42667898.67, Increase = 1.89%

SaltedOutpointHasherBenchmark_match: Before Avg = 16686007.45, After Avg = 16956121.52, Increase = 1.62%

SaltedOutpointHasherBenchmark_mismatch: Before Avg = 12570108.79, After Avg = 12938653.98, Increase = 2.93%

@andrewtoth, what do you think, can you try reproducing these results?

@l0rinc l0rinc marked this pull request as ready for review July 18, 2024 11:46
@andrewtoth
Copy link
Contributor

Based on your benchmark results, I don't think this can be called an optimization. It seems to be worse for one benchmark, better in another, and roughly the same for the rest. The description also notes that this will not be noticeable by users. In light of that, does the motivation for this PR still hold?

@l0rinc l0rinc closed this Jul 25, 2024
@l0rinc
Copy link
Contributor Author

l0rinc commented Jul 26, 2024

I've removed the COutPoint_equality changes and benchmarks (updated the comments to avoid confusion), the SaltedOutpointHasher speed gains remain.
I'll let you decide if it's worth doing the change or not.

@l0rinc l0rinc reopened this Jul 26, 2024
@l0rinc l0rinc force-pushed the paplorinc/siphash branch 2 times, most recently from 0dc224b to 8d6c6bd Compare July 26, 2024 18:14
@DrahtBot DrahtBot mentioned this pull request Aug 30, 2024
@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed.
Debug: https://github.com/bitcoin/bitcoin/runs/27977216903

Hints

Make sure to run all tests locally, according to the documentation.

The failure may happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

@l0rinc l0rinc force-pushed the paplorinc/siphash branch from 8d6c6bd to 44098fe Compare August 31, 2024 14:52
@l0rinc l0rinc changed the title optimization: Precalculate SipHash constant XOR with k0 and k1 in SaltedOutpointHasher optimization: precalculate SipHash constant XOR with k0 and k1 in SaltedOutpointHasher Aug 31, 2024
@l0rinc l0rinc force-pushed the paplorinc/siphash branch from 44098fe to 183052f Compare October 15, 2024 15:27
@achow101 achow101 requested a review from josibake October 15, 2024 15:35
@l0rinc l0rinc force-pushed the paplorinc/siphash branch from 183052f to bc959f5 Compare October 16, 2024 13:14
@l0rinc
Copy link
Contributor Author

l0rinc commented Oct 16, 2024

@laanwj This is the optimization that relies on #30349, would really appreciate you input on it.

@laanwj laanwj self-requested a review October 17, 2024 08:00
@laanwj
Copy link
Member

laanwj commented Jan 20, 2025

Sorry for taking so long.

FWIW, Code review ACK bc959f5. Moving the magic numbers to constants seems like a good idea to me. It's good to add the benchmarks. And although the speed change is neglilible (as would be expected from factoring out ~four xor instructions), it doesn't complicate the code much either.

@l0rinc l0rinc reopened this Jan 20, 2025
Copy link
Member

@ismaelsadeeq ismaelsadeeq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve thoroughly reviewed 6ce13a3 and lightly reviewed bc959f5.

I’ve noticed a slight increase in performance.

Moving the magic numbers to constants seems like a good idea to me. It's good to add the benchmarks.

Same!

ACK bc959f5

@DrahtBot DrahtBot requested a review from ismaelsadeeq January 23, 2025 21:44
@sipa
Copy link
Member

sipa commented Feb 1, 2025

I'm not convinced about the approach here: it's leaking a SipHash implementation detail (the way the initial v0..v3 values are computed) into a higher layer (util/hasher.h).

I would suggest to have a Uint256ExtraSipHasher class instead of the SipHashUint256Extra function, with a constructor that takes k0, k1 like the function current before, and computes the v0..v3 values, and then an uint64_t Uint256ExtraSipHasher::operator()(const uint256& val, uint32_t extra) const noexcept which contains the meat of the computation.

@l0rinc l0rinc force-pushed the paplorinc/siphash branch 2 times, most recently from 2885e77 to d848da9 Compare February 1, 2025 19:30
@l0rinc
Copy link
Contributor Author

l0rinc commented Feb 1, 2025

Thanks a lot for the review @sipa. The SipHash leaking into the hasher also bothered me a lot (it's also why I closed it a few weeks ago before @laanwj revived it), but I like the Uint256ExtraSipHasher wrapper suggestion (basically the same as CSipHasher now, except for the extra tmp and count fields: both constructors cache the constant parts now.

I'm not sure why the suggested name was Uint256ExtraSipHasher and not SipHasherUint256Extra (to be more similar to SipHashUint256), but I don't really mind either way.

I pushed a separate rebase (without conflicts), to simplify review, and the requested changes (and a few more related cleanups) in https://github.com/bitcoin/bitcoin/compare/2885e7785417b57a3f3cad420196837965572258..d848da97f2e4fd0997507b2d1c2a1f4a2325193d

I'd like to clean up SipHash after this change (get rid of templates, reduce duplication) in a separate PR - if this gets merged.

@l0rinc l0rinc changed the title optimization: precalculate SipHash constant XOR with k0 and k1 in SaltedOutpointHasher [IBD] Precalculate some SipHash constant calculations Mar 12, 2025
@l0rinc l0rinc changed the title [IBD] Precalculate some SipHash constant calculations [IBD] precalculate some SipHash constant calculations Mar 12, 2025
@l0rinc l0rinc changed the title [IBD] precalculate some SipHash constant calculations [IBD] precalculate SipHash constant salt calculations Mar 12, 2025
@l0rinc l0rinc force-pushed the paplorinc/siphash branch from d848da9 to 31e1a49 Compare March 12, 2025 19:56
@l0rinc l0rinc force-pushed the paplorinc/siphash branch from 31e1a49 to 987be1b Compare April 17, 2025 09:42
@l0rinc
Copy link
Contributor Author

l0rinc commented May 16, 2025

@laanwj, @sipa, I've addressed the concerns you've raised, I'd appreciate a re-review here

l0rinc and others added 3 commits August 13, 2025 15:37
…ash constant state

Previously, only k0 and k1 were stored, causing the constant xor operations to be recomputed in every call to `SipHashUint256Extra`.
This commit adds a dedicated `Uint256ExtraSipHasher` class that caches the initial state (v0-v3) and to perform the `SipHash` computation on a `uint256` (with an extra parameter), hiding the constant computation details from higher-level code and improving efficiency.
This basically brings the precalculations in the `CSipHasher` constructor to the `uint256` specialized SipHash implementation.

> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/src/bench/bench_bitcoin -filter='SaltedOutpointHasherBench.*' -min-time=10000

> C++ compiler .......................... AppleClang 16.0.0.16000026

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               57.27 |       17,462,299.19 |    0.1% |     11.02 | `SaltedOutpointHasherBench_create_set`
|               11.24 |       88,997,888.48 |    0.3% |     11.04 | `SaltedOutpointHasherBench_hash`
|               13.91 |       71,902,014.20 |    0.2% |     11.01 | `SaltedOutpointHasherBench_match`
|               13.29 |       75,230,390.31 |    0.1% |     11.00 | `SaltedOutpointHasherBench_mismatch`

compared to master:
create_set - 17,462,299.19/17,065,922.04 - 2.3% faster
hash       - 88,997,888.48/83,576,684.83 - 6.4% faster
match      - 71,902,014.20/68,985,850.12 - 4.2% faster
mismatch   - 75,230,390.31/71,942,033.47 - 4.5% faster

> C++ compiler .......................... GNU 13.3.0

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|              135.38 |        7,386,349.49 |    0.0% |        1,078.19 |          486.16 |  2.218 |         119.56 |    1.1% |     11.00 | `SaltedOutpointHasherBench_create_set`
|               23.67 |       42,254,558.08 |    0.0% |          247.01 |           85.01 |  2.906 |           4.00 |    0.0% |     11.00 | `SaltedOutpointHasherBench_hash`
|               58.95 |       16,962,220.14 |    0.1% |          446.55 |          211.74 |  2.109 |          20.86 |    1.4% |     11.01 | `SaltedOutpointHasherBench_match`
|               76.98 |       12,991,047.69 |    0.1% |          548.93 |          276.50 |  1.985 |          20.25 |    2.3% |     10.72 | `SaltedOutpointHasherBench_mismatch`

compared to master:
create_set -  7,386,349.49/7,312,133.16  - 1% faster
hash       - 42,254,558.08/41,978,882.62 - 0.6% faster
match      - 16,962,220.14/16,549,695.42 - 2.4% faster
mismatch   - 12,991,047.69/12,713,595.35 - 2% faster

Co-authored-by: sipa <pieter@wuille.net>
@l0rinc
Copy link
Contributor Author

l0rinc commented Aug 13, 2025

Rebased to fix a conflict with #33116.

Also dropped the benchmarking commit to make the change less scary - but kept the patch in the PRs description for anyone interested.

Change is ready for review again.

@l0rinc l0rinc force-pushed the paplorinc/siphash branch from 987be1b to 656da51 Compare August 13, 2025 22:46
Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK to be pinged by the bot, plan to review in a couple of weeks.

@l0rinc l0rinc changed the title [IBD] precalculate SipHash constant salt calculations [IBD] precalculate SipHash constant salt XORs Aug 13, 2025
@Raimo33
Copy link

Raimo33 commented Aug 30, 2025

Concept ACK
Approach ACK
ACK 585fa7e

I haven't tested the code but I have reviewed it and it looks reasonable; i think the approach is sound and the two class design better addresses the matter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants