bench: replace benchmark block with more representative one (413567 → 784588) #32457

l0rinc · 2025-05-09T09:30:08Z

Draft, until I investigate if we can generate a similar block instead of adding a real one to the repo

Summary

This PR replaces our benchmark's reference block with one that's more modern and representative of current usage patterns.

Context

The current benchmark block was mined in 2016 and added in PR #9049. Since it predates many modern script types, our benchmarks don't accurately reflect current network conditions.

Suggestion

We're replacing it with block 784588 from 2023, which provides a better balance - it's recent enough to include modern script types while still containing legacy scripts typically encountered during IBD.

The PR consists of two commits: the first documents the current block's script type distribution; the second replaces it with the new block and updates the assertions accordingly.

This commit documents the current benchmark-base block's properties, to highlight the differences with the replacement block in the next commit.

https://mempool.space/block/413567 was mined in 2016 and added as the benchmark base in bitcoin#9049. It lacks modern script types, which makes the benchmarks unrepresentative of current usage. In this commit we're replacing it with https://mempool.space/block/784588 from 2023. This block was selected because it's old enough to include legacy script types encountered during IBD, while also containing modern script types in proportions that better reflect current block composition.

DrahtBot · 2025-05-09T09:30:12Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32457.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#32554 (RFC: bench: replace embedded raw block with configurable block generator by l0rinc)
#32532 (script: short-circuit GetLegacySigOpCount for known scripts by l0rinc)
#31682 ([IBD] specialize CheckBlock's input & coinbase checks by l0rinc)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

laanwj · 2025-05-09T12:49:57Z

Agree with the rationale of this PR, but having 1MB+ binary files in the repo is really meh.

l0rinc · 2025-05-09T12:57:35Z

Agree - do you have a better idea?

maflcko

Is there a benchmark that needs this? If yes, going for synthetic, but representative (and easily adjustable) data may be a better choice for that benchmark.

maflcko · 2025-05-09T09:40:19Z

src/bench/strencodings.cpp

 #include <span.h>
 #include <util/strencodings.h>

 #include <vector>

 static void HexStrBench(benchmark::Bench& bench)
 {
-    auto const& data = benchmark::data::block413567;
+    auto const& data = benchmark::data::block_784588;


maflcko · 2025-05-09

Instead of a block, this could just be random bytes from a fast random context?

l0rinc · 2025-05-09

Yes, this one definitely, but in the other cases I'm worried about introducing a strong bias.
It's not like we're changing these very often, but I'll investigate anyway; let's see how close we can get without adding 1.5 MB to the repo.

laanwj · 2025-05-09

It's not just the amount of data; we're still scared from the xz backdoor incident 😄

l0rinc · 2025-05-09

Understandable, but that's why I added the hashes here, to make it self-validating.

laanwj · 2025-05-09

Hahaha, agreed, it would be extremely far-fetched to put data in a specific block just to add it to the repository two years later.

maflcko · 2025-05-13

> Yes, this one definitely, but in the other cases I'm worried about introducing a strong bias.

Again, it would be good to list the benchmark that needs this. Also, serialization itself shouldn't care whether the data is synthetic (random) or exactly matches a real past block. If you worry about a bias, it should actually be easier to provide synthetic data than to try to find a fitting past block. In any case, there will always be a bias, even if the data is fully synthetic, as the real chain progresses and we probably don't want to update this for every release. For the benchmarks where it doesn't matter, I'd say to just leave them as-is. For the benchmarks where it matters, it would be good to explain why and then find a solution for each benchmark.
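For readers following along, the suggestion earlier in this thread (feeding the hex-encoding benchmark random bytes instead of a raw block) might look roughly like the sketch below. It is not code from this PR: `MakeBenchBytes` and `ToHex` are hypothetical names, and `std::mt19937_64` with a fixed seed is only a standard-library stand-in for a fast random context, chosen so the snippet builds on its own.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: fill a buffer with deterministic pseudo-random bytes.
// A fixed seed yields the same bytes on every run, so timings stay comparable.
// std::mt19937_64 is an assumption standing in for a fast random context.
std::vector<uint8_t> MakeBenchBytes(size_t size, uint64_t seed)
{
    std::mt19937_64 rng{seed};
    std::vector<uint8_t> out(size);
    for (auto& b : out) b = static_cast<uint8_t>(rng() & 0xFF);
    return out;
}

// Hypothetical stand-in for the hex encoder being benchmarked.
std::string ToHex(const std::vector<uint8_t>& data)
{
    static constexpr char digits[] = "0123456789abcdef";
    std::string hex;
    hex.reserve(data.size() * 2);
    for (uint8_t b : data) {
        hex.push_back(digits[b >> 4]);   // high nibble
        hex.push_back(digits[b & 0x0F]); // low nibble
    }
    return hex;
}
```

Because hex encoding does not depend on what the bytes mean, a buffer of roughly the same size as the old block should exercise the same code path without embedding any block data.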

laanwj · 2025-05-09T14:19:49Z

> Is there a benchmark that needs this? If yes, going for synthetic, but representative (and easily adjustable) data may be a better choice for that benchmark.

Yes, as there is a lot of random data in a block whose exact value isn't important to benchmarking (only that it's always the same), it seems possible to deterministically construct a similar block from code.
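As a rough illustration of deterministically constructing block-like data from code, the sketch below draws a reproducible mix of script kinds from adjustable weights. Everything in it is an assumption made for illustration: the `ScriptKind` categories, the `KindWeight` struct and `MakeScriptMix` are hypothetical and are not taken from this PR or from #32554, and a real generator would also have to build valid transactions around the chosen scripts.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical script categories; the names are illustrative only.
enum class ScriptKind { P2PKH, P2SH, P2WPKH, P2WSH, P2TR };

struct KindWeight {
    ScriptKind kind;
    uint32_t weight; // relative share of outputs, adjustable per benchmark
};

// Draw `count` script kinds from a weighted mix, reproducibly for a given seed.
// Raw engine output is used instead of std::discrete_distribution, whose
// results may differ between standard library implementations.
std::vector<ScriptKind> MakeScriptMix(const std::vector<KindWeight>& mix, size_t count, uint64_t seed)
{
    uint64_t total{0};
    for (const auto& kw : mix) total += kw.weight;

    std::vector<ScriptKind> out;
    if (total == 0) return out; // nothing to draw from
    out.reserve(count);

    std::mt19937_64 rng{seed};
    for (size_t i = 0; i < count; ++i) {
        uint64_t pick = rng() % total; // slight modulo bias, acceptable for a sketch
        for (const auto& kw : mix) {
            if (pick < kw.weight) {
                out.push_back(kw.kind);
                break;
            }
            pick -= kw.weight;
        }
    }
    return out;
}
```

Changing the weights would shift the legacy-to-modern ratio without touching any embedded data, which is the "easily adjustable" property asked for above.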

l0rinc · 2025-05-18T21:40:31Z

Added a random block generator in #32554 - let me know if it makes sense so I can close this one.

l0rinc · 2025-05-21T13:43:55Z

Closing in favor of #32554

l0rinc added 2 commits May 9, 2025 11:14

bench: document the measured block's properties

3878444

This commit documents the current benchmark-base block's properties, to highlight the differences with the replacement block in the next commit.

DrahtBot added the Tests label May 9, 2025

maflcko reviewed May 9, 2025

This was referenced May 9, 2025

[IBD] Tracking PR for speeding up Initial Block Download #32043

Draft

[IBD] specialize block serialization #31868

Draft

[IBD] specialize CheckBlock's input & coinbase checks #31682

Open

This was referenced May 16, 2025

Short-circuit GetLegacySigOpCount for known scripts #32532

Closed

bench: replace embedded raw block with configurable block generator #32554

Open

l0rinc closed this May 21, 2025
