Skip to content

Conversation

martinus
Copy link
Contributor

@martinus martinus commented Aug 6, 2023

TLDR:

  • Adds a TRACE_RAII macro to easily trace runtime of a code block.
  • Switch to CBufferedFile in BlockManager::ReadBlockFromDisk is slightly faster:
  • 9% faster unserialization => 1.2% faster -reindex-chainstate

While profiling -reindex-changestate I saw lots of fread() calls in in BlockManager::ReadBlockFromDisk. This replaces the use of CAutoFile with CBufferedFile with a small buffer, leading to much fewer calls to
fread(), which gives a little speedup.

I measured runtime of the synchronization with the TRACE_RAII macro. I ran this command which took about 30 minutes on my PC, with and without CBufferedFile:

sync && sudo /sbin/sysctl vm.drop_caches=3 && ~/git/github.com/martinus/bitcoin/src/bitcoind -dbcache=20000 -reindex-chainstate -printtoconsole=0 -stopatheight=500000

The measured time spent in unserializing blocks are:

  • 308.227s CAutoFile
  • 277.827s CBufferedFile

It is a bit hard to measure the total effect on -reindex-chainstate due to random fluctuations in the benchmark. For somewhat reliable results I've run the benchmark 10 times, using hyperfine:

hyperfine \                                                                                                                                                                                                                                                                                         --parameter-list commit 0f3a6a74a2889817df0f52e19c37de19c664daac,ce26cb6025c47bb4b3e51a579689635da7ca1f6b \
--setup 'git checkout {commit} && make -j$(nproc)' \
--prepare 'sync; sudo /sbin/sysctl vm.drop_caches=3' \
'./bitcoind -dbcache=20000 -reindex-chainstate -printtoconsole=0 -stopatheight=500000'

The results are:

Benchmark 1: ./bitcoind -dbcache=20000 -reindex-chainstate -printtoconsole=0 -stopatheight=500000 (CAutoFile)
  Time (mean ± σ):     1847.622 s ± 13.448 s    [User: 1702.768 s, System: 54.059 s]
  Range (min … max):   1835.319 s … 1877.065 s    10 runs

Benchmark 2: ./bitcoind -dbcache=20000 -reindex-chainstate -printtoconsole=0 -stopatheight=500000 (CBufferedFile)
  Time (mean ± σ):     1824.660 s ± 14.394 s    [User: 1679.417 s, System: 53.965 s]
  Range (min … max):   1798.764 s … 1848.940 s    10 runs

So the 9% improvement in unserializing seems to translate to roughly 1.2% total runtime improvement in -reindex-chainstate on my machine.

@DrahtBot
Copy link
Contributor

DrahtBot commented Aug 6, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/28226.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #28195 (blockstorage: Drop legacy -txindex check by MarcoFalke)
  • #28052 (blockstorage: XOR blocksdir *.dat files by MarcoFalke)
  • #25541 (tracing: macOS USDT support by kouloumos)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

So that CAutoFile is replaceable with CBufferedFile
That way we can easily get reliable statistics about the serialization
code, without interference.
…File

While profiling `-reindex-changestate` I saw lots of fread() calls in
in ReadBlockFromDisk which can be slow. This replaces the use of CAutoFile
with CBufferedFile with a small buffer, leading to much fewer calls to
fread().

With the TRACE_RAII in place it is easier to reliably measure performance
of the critical part, just measuring `-reindex-chainstate` runtime can be
quite noisy.

Nevertheless, I ran this command which took about 30 minutes on my PC:

```
sync && sudo /sbin/sysctl vm.drop_caches=3 && ~/git/github.com/martinus/bitcoin/src/bitcoind -dbcache=20000 -reindex-chainstate -printtoconsole=0 -stopatheight=500000
```

The measured time spent in unserializing blocks are:
* 308.227s mergebase
* 277.827s this commit

This 9% improvement seems to translate into roughly 1.2% total runtime improvement,
when running the benchmark 10 times and showing the average runtime:

```
Benchmark 1: ./bitcoind -dbcache=20000 -reindex-chainstate -printtoconsole=0 -stopatheight=500000 (mergebase)
  Time (mean ± σ):     1847.622 s ± 13.448 s    [User: 1702.768 s, System: 54.059 s]
  Range (min … max):   1835.319 s … 1877.065 s    10 runs

Benchmark 2: ./bitcoind -dbcache=20000 -reindex-chainstate -printtoconsole=0 -stopatheight=500000 (branch)
  Time (mean ± σ):     1824.660 s ± 14.394 s    [User: 1679.417 s, System: 53.965 s]
  Range (min … max):   1798.764 s … 1848.940 s    10 runs
```

I have produced these results with `hyperfine`, with this command:

```
hyperfine \                                                                                                                                                                                                                                                                                          400 Mbps
--parameter-list commit 0f3a6a7,ce26cb6025c47bb4b3e51a579689635da7ca1f6b \
--setup 'git checkout {commit} && make -j$(nproc)' \
--prepare 'sync; sudo /sbin/sysctl vm.drop_caches=3' \
'./bitcoind -dbcache=20000 -reindex-chainstate -printtoconsole=0 -stopatheight=500000'
```
@martinus martinus force-pushed the 2023-08-more-CBufferedFile branch from 47c05d8 to e926795 Compare August 6, 2023 05:40
@maflcko
Copy link
Member

maflcko commented Aug 6, 2023

Shouldn't std::fread be already cached? I could imagine the result to be different on different hardware? Also, -reindex-chainstate seems rare enough to not spend time to optimize its runtime? Going further, if the only goal of CBufferedFile is to speed up -reindex by 1%, we could consider removing it. Since the implementation is put in the header file of streams.h, removing it could reduce peak compile memory usage by .5% and compilation speed by .3% on every fresh compilation IIRC. Obviously this is subjective, but to me it seems a better result.

@martinus
Copy link
Contributor Author

martinus commented Aug 6, 2023

I think most of the slowdown is due to the locking required by each fread() call. I did a quick benchmark by switching back to CAutoFile and and using fread_unlocked instead of std::fread (and call flockfile in the ctor and funlockfile before fclose), and the performance is practically the same as with CBufferedFile.

I don't mind dropping the change to CBufferedFile, but I think the TRACE_RAII macro is nice (maybe with a better name, I couldn't think of anything else...). It simplifies getting accurate measurements from parts of the code.

@maflcko
Copy link
Member

maflcko commented Aug 7, 2023

It may be better to modify CAutoFile in that case. The file handle in the auto-file should[1] always be unique and pinned to a single thread, so I using fread_unlocked seems better, as it improves performance everywhere. (Though, it looks like std::fread_unlocked does not exist in C++, so I wonder if it exists on Windows and macOS?)

[1] I may submit a related pull to better enforce this at compile-time.

@martinus martinus marked this pull request as draft August 7, 2023 18:07
@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 5, 2023

🐙 This pull request conflicts with the target branch and needs rebase.

@DrahtBot
Copy link
Contributor

DrahtBot commented Dec 4, 2023

There hasn't been much activity lately and the patch still needs rebase. What is the status here?

  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

@maflcko
Copy link
Member

maflcko commented Dec 4, 2023

I tried to remove BufferedFile, but it seemed non-trivial. I think it is needed to recover from datadir corruption caused by previous releases of Bitcoin Core. I wonder how relevant this still is, and whether it is simpler to require those affected to re-do an IBD. Another small benefit would be that the fuzz timeout in load_external_block_file would likely be fixed, see #28812 (comment). (Attempting to read a block at every byte position of the file in a loop may be CPU and memory intensive)

@maflcko maflcko mentioned this pull request Jan 5, 2024
@DrahtBot
Copy link
Contributor

DrahtBot commented Mar 3, 2024

⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?

  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

@DrahtBot
Copy link
Contributor

DrahtBot commented Mar 3, 2024

🤔 There hasn't been much activity lately and the CI seems to be failing.

If no one reviewed the current pull request by commit hash, a rebase can be considered. While the CI failure may be a false positive, the CI hasn't been running for some time, so there may be a real issue hiding as well. A rebase triggers the latest CI and makes sure that no silent merge conflicts have snuck in.

@DrahtBot
Copy link
Contributor

⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?

  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

2 similar comments
@DrahtBot
Copy link
Contributor

⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?

  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

@DrahtBot
Copy link
Contributor

⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?

  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

@l0rinc
Copy link
Contributor

l0rinc commented Nov 25, 2024

@martinus, are you still working on this? If not, I'd like to take over, I've also noticed the same during profiling...

@martinus
Copy link
Contributor Author

@l0rinc I'm not working on it any more, feel free to take over!

@fanquake fanquake closed this Nov 25, 2024
l0rinc added a commit to l0rinc/bitcoin that referenced this pull request Nov 27, 2024
This commit revives bitcoin#28226:
> While profiling `-reindex-changestate` I saw lots of fread() calls in in ReadBlockFromDisk which can be slow.
> This replaces the use of `AutoFile` with `BufferedFile` with a small buffer, leading to much fewer calls to fread()."

Running `ReadBlockFromDiskTest` with varying buffer sizes show that `16 KiB` is more effective than the original 1 KiB - achieving a roughly 25% performance improvement compared to master.

Co-authored-by: Martin Leitner-Ankerl <martin.ankerl@gmail.com>
l0rinc added a commit to l0rinc/bitcoin that referenced this pull request Nov 27, 2024
This commit revives bitcoin#28226:
> While profiling `-reindex-changestate` I saw lots of fread() calls in in ReadBlockFromDisk which can be slow.
> This replaces the use of `AutoFile` with `BufferedFile` with a small buffer, leading to much fewer calls to fread()."

Running `ReadBlockFromDiskTest` with varying buffer sizes show that `16 KiB` is more effective than the original 1 KiB - achieving a roughly 25% performance improvement compared to master.

Co-authored-by: Martin Leitner-Ankerl <martin.ankerl@gmail.com>
@maflcko
Copy link
Member

maflcko commented Nov 27, 2024

It may be better to modify CAutoFile in that case. The file handle in the auto-file should[1] always be unique and pinned to a single thread, so I using fread_unlocked seems better, as it improves performance everywhere. (Though, it looks like std::fread_unlocked does not exist in C++, so I wonder if it exists on Windows and macOS?)

[1] I may submit a related pull to better enforce this at compile-time.

For reference, AutoFile::Get was removed in commit a240e15, so an fread_unlocked patch to AutoFile may be easier to review now, if someone implements it.

l0rinc added a commit to l0rinc/bitcoin that referenced this pull request Nov 27, 2024
This commit revives bitcoin#28226:
> While profiling `-reindex-changestate` I saw lots of fread() calls in in ReadBlockFromDisk which can be slow.
> This replaces the use of `AutoFile` with `BufferedFile` with a small buffer, leading to much fewer calls to fread()."

Running `ReadBlockFromDiskTest` with varying buffer sizes show that `16 KiB` is more effective than the original 1 KiB - achieving a roughly 25% performance improvement compared to master.

Co-authored-by: Martin Leitner-Ankerl <martin.ankerl@gmail.com>
l0rinc added a commit to l0rinc/bitcoin that referenced this pull request Nov 28, 2024
This commit revives bitcoin#28226:
> While profiling `-reindex-changestate` I saw lots of fread() calls in in ReadBlockFromDisk which can be slow.
> This replaces the use of `AutoFile` with `BufferedFile` with a small buffer, leading to much fewer calls to fread()."

Running `ReadBlockFromDiskTest` with varying buffer sizes show that `16 KiB` is more effective than the original 1 KiB - achieving a roughly 25% performance improvement compared to master.

Co-authored-by: Martin Leitner-Ankerl <martin.ankerl@gmail.com>
l0rinc added a commit to l0rinc/bitcoin that referenced this pull request Nov 28, 2024
This commit revives bitcoin#28226:
> While profiling `-reindex-changestate` I saw lots of fread() calls in in ReadBlockFromDisk which can be slow.
> This replaces the use of `AutoFile` with `BufferedFile` with a small buffer, leading to much fewer calls to fread()."

Running `ReadBlockFromDiskTest` with varying buffer sizes show that `16 KiB` is more effective than the original 1 KiB - achieving a roughly 25% performance improvement compared to master.

Co-authored-by: Martin Leitner-Ankerl <martin.ankerl@gmail.com>
l0rinc added a commit to l0rinc/bitcoin that referenced this pull request Nov 28, 2024
This commit revives bitcoin#28226:
> While profiling `-reindex-changestate` I saw lots of fread() calls in in ReadBlockFromDisk which can be slow.
> This replaces the use of `AutoFile` with `BufferedFile` with a small buffer, leading to much fewer calls to fread()."

Running `ReadBlockFromDiskTest` with varying buffer sizes show that `16 KiB` is more effective than the original 1 KiB - achieving a roughly 25% performance improvement compared to master.

Co-authored-by: Martin Leitner-Ankerl <martin.ankerl@gmail.com>
l0rinc added a commit to l0rinc/bitcoin that referenced this pull request Nov 28, 2024
This commit revives bitcoin#28226:
> While profiling `-reindex-changestate` I saw lots of fread() calls in in ReadBlockFromDisk which can be slow.
> This replaces the use of `AutoFile` with `BufferedFile` with a small buffer, leading to much fewer calls to fread()."

Running `ReadBlockFromDiskTest` with varying buffer sizes show that `16 KiB` is more effective than the original 1 KiB - achieving a roughly 25% performance improvement compared to master.

Co-authored-by: Martin Leitner-Ankerl <martin.ankerl@gmail.com>

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|        6,107,470.52 |              163.73 |    0.0% |   65,474,032.49 |   21,525,453.92 |  3.042 |   3,986,348.13 |    0.3% |     10.56 | `ReadBlockFromDiskTest`
|          960,629.11 |            1,040.98 |    0.0% |    9,011,495.94 |    3,234,205.43 |  2.786 |   1,002,472.91 |    0.0% |     11.00 | `ReadRawBlockFromDiskTest`
l0rinc added a commit to l0rinc/bitcoin that referenced this pull request Nov 28, 2024
This commit revives bitcoin#28226:
> While profiling `-reindex-changestate` I saw lots of fread() calls in in ReadBlockFromDisk which can be slow.
> This replaces the use of `AutoFile` with `BufferedFile` with a small buffer, leading to much fewer calls to fread()."

Running `ReadBlockFromDiskTest` with varying buffer sizes show that `16 KiB` is more effective than the original 1 KiB - achieving a roughly 25% performance improvement compared to master.

Co-authored-by: Martin Leitner-Ankerl <martin.ankerl@gmail.com>

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|        6,107,470.52 |              163.73 |    0.0% |   65,474,032.49 |   21,525,453.92 |  3.042 |   3,986,348.13 |    0.3% |     10.56 | `ReadBlockFromDiskTest`
l0rinc added a commit to l0rinc/bitcoin that referenced this pull request Nov 28, 2024
This commit revives bitcoin#28226:
> While profiling `-reindex-changestate` I saw lots of fread() calls in in ReadBlockFromDisk which can be slow.
> This replaces the use of `AutoFile` with `BufferedFile` with a small buffer, leading to much fewer calls to fread()."

Running `ReadBlockFromDiskTest` with varying buffer sizes show that `16 KiB` is more effective than the original 1 KiB - achieving a roughly 25% performance improvement compared to master.

Co-authored-by: Martin Leitner-Ankerl <martin.ankerl@gmail.com>

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|        6,107,470.52 |              163.73 |    0.0% |   65,474,032.49 |   21,525,453.92 |  3.042 |   3,986,348.13 |    0.3% |     10.56 | `ReadBlockFromDiskTest`
l0rinc added a commit to l0rinc/bitcoin that referenced this pull request Nov 30, 2024
This commit revives bitcoin#28226:
> While profiling `-reindex-changestate` I saw lots of fread() calls in in ReadBlockFromDisk which can be slow.
> This replaces the use of `AutoFile` with `BufferedFile` with a small buffer, leading to much fewer calls to fread()."

Running `ReadBlockFromDiskTest` with varying buffer sizes show that `16 KiB` is more effective than the original 1 KiB - achieving a roughly 25% performance improvement compared to master.

Co-authored-by: Martin Leitner-Ankerl <martin.ankerl@gmail.com>

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|        6,107,470.52 |              163.73 |    0.0% |   65,474,032.49 |   21,525,453.92 |  3.042 |   3,986,348.13 |    0.3% |     10.56 | `ReadBlockFromDiskTest`
achow101 added a commit that referenced this pull request Apr 16, 2025
…ization

8d801e3 optimization: bulk serialization writes in `WriteBlockUndo` and `WriteBlock` (Lőrinc)
520965e optimization: bulk serialization reads in `UndoRead`, `ReadBlock` (Lőrinc)
056cb3c refactor: clear up blockstorage/streams in preparation for optimization (Lőrinc)
67fcc64 log: unify error messages for (read/write)[undo]block (Lőrinc)
a4de160 scripted-diff: shorten BLOCK_SERIALIZATION_HEADER_SIZE constant (Lőrinc)
6640dd5 Narrow scope of undofile write to avoid possible resource management issue (Lőrinc)
3197155 refactor: collect block read operations into try block (Lőrinc)
c77e310 refactor: rename leftover WriteBlockBench (Lőrinc)

Pull request description:

  This change is part of [[IBD] - Tracking PR for speeding up Initial Block Download](#32043)

  ### Summary
  We can serialize the blocks and undos to any `Stream` which implements the appropriate read/write methods.
  `AutoFile` is one of these, writing the results "directly" to disk (through the OS file cache). Batching these in memory first and reading/writing these to disk is measurably faster (likely because of fewer native fread calls or less locking, as [observed](#28226 (comment)) by Martinus in a similar change).

  ### Unlocking new optimization opportunities

  Buffered writes will also enable batched obfuscation calculations (implemented in #31144) - especially since currently we need to copy the write input's std::span to do the obfuscation on it, and batching enables doing the operations on the internal buffer directly.

  ### Measurements (micro benchmarks, full IBDs and reindexes)

  Microbenchmarks for `[Read|Write]BlockBench` show a ~**30%**/**168%** speedup with `macOS/Clang`, and ~**19%**/**24%** with `Linux/GCC` (the follow-up XOR batching improves these further):

  <details>
  <summary>macOS Sequoia - Clang 19.1.7</summary>

  > Before:

  |               ns/op |                op/s |    err% |     total | benchmark
  |--------------------:|--------------------:|--------:|----------:|:----------
  |        2,271,441.67 |              440.25 |    0.1% |     11.00 | `ReadBlockBench`
  |        5,149,564.31 |              194.19 |    0.8% |     10.95 | `WriteBlockBench`

  > After:

  |               ns/op |                op/s |    err% |     total | benchmark
  |--------------------:|--------------------:|--------:|----------:|:----------
  |        1,738,683.04 |              575.15 |    0.2% |     11.04 | `ReadBlockBench`
  |        3,052,658.88 |              327.58 |    1.0% |     10.91 | `WriteBlockBench`

  </details>

  <details>
  <summary>Ubuntu 24 - GNU 13.3.0</summary>

  > Before:

  |               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
  |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
  |        6,895,987.11 |              145.01 |    0.0% |   71,055,269.86 |   23,977,374.37 |  2.963 |   5,074,828.78 |    0.4% |     22.00 | `ReadBlockBench`
  |        5,152,973.58 |              194.06 |    2.2% |   19,350,886.41 |    8,784,539.75 |  2.203 |   3,079,335.21 |    0.4% |     23.18 | `WriteBlockBench`

  > After:

  |               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
  |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
  |        5,771,882.71 |              173.25 |    0.0% |   65,741,889.82 |   20,453,232.33 |  3.214 |   3,971,321.75 |    0.3% |     22.01 | `ReadBlockBench`
  |        4,145,681.13 |              241.21 |    4.0% |   15,337,596.85 |    5,732,186.47 |  2.676 |   2,239,662.64 |    0.1% |     23.94 | `WriteBlockBench`

  </details>

  2 full IBD runs against master (compiled with GCC where the gains seem more modest) for **888888** blocks (seeded from real nodes) indicates a ~**7%** total speedup.

  <details>
  <summary>Details</summary>

  ```bash
  COMMITS="d2b72b13699cf460ffbcb1028bcf5f3b07d3b73a 652b4e3"; \
  STOP_HEIGHT=888888; DBCACHE=1000; \
  C_COMPILER=gcc; CXX_COMPILER=g++; \
  BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
  (for c in $COMMITS; do git fetch origin $c -q && git log -1 --pretty=format:'%h %s' $c || exit 1; done) && \
  hyperfine \
    --sort 'command' \
    --runs 2 \
    --export-json "$BASE_DIR/ibd-${COMMITS// /-}-$STOP_HEIGHT-$DBCACHE-$C_COMPILER.json" \
    --parameter-list COMMIT ${COMMITS// /,} \
    --prepare "killall bitcoind; rm -rf $DATA_DIR/*; git checkout {COMMIT}; git clean -fxd; git reset --hard; \
      cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF -DCMAKE_C_COMPILER=$C_COMPILER -DCMAKE_CXX_COMPILER=$CXX_COMPILER && \
      cmake --build build -j$(nproc) --target bitcoind && \
      ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=1 -printtoconsole=0; sleep 100" \
    --cleanup "cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
    "COMPILER=$C_COMPILER COMMIT=${COMMIT:0:10} ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP_HEIGHT -dbcache=$DBCACHE -blocksonly -printtoconsole=0"
  d2b72b1 refactor: rename leftover WriteBlockBench
  652b4e3 optimization: Bulk serialization writes in `WriteBlockUndo` and `WriteBlock`
  Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = d2b72b1)
    Time (mean ± σ):     41528.104 s ± 354.003 s    [User: 44324.407 s, System: 3074.829 s]
    Range (min … max):   41277.786 s … 41778.421 s    2 runs

  Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 652b4e3)
    Time (mean ± σ):     38771.457 s ± 441.941 s    [User: 41930.651 s, System: 3222.664 s]
    Range (min … max):   38458.957 s … 39083.957 s    2 runs

  Relative speed comparison
          1.07 ±  0.02  COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = d2b72b1)
          1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 652b4e3)
  ```

  </details>

ACKs for top commit:
  maflcko:
    re-ACK 8d801e3 🐦
  achow101:
    ACK 8d801e3
  ryanofsky:
    Code review ACK 8d801e3. Most notable change is switching from BufferedReader to ReadRawBlock for block reads, which makes sense, and there are also various cleanups in blockstorage and test code.
  hodlinator:
    re-ACK 8d801e3

Tree-SHA512: 24e1dee653b927b760c0ba3c69d1aba15fa5d9c4536ad11cfc2d70196ae16b9228ecc3056eef70923364257d72dc929882e73e69c6c426e28139d31299d08adc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants