Skip to content

Conversation

mruddy
Copy link
Contributor

@mruddy mruddy commented Apr 14, 2022

Fixes #21379.

The blocks/blk?????.dat files are mutated and become increasingly malformed, or corrupt, as a result of running the re-indexing process.
The mutations occur after the re-indexing process has finished, as new blocks are appended, but are a result of a re-indexing process miscalculation that lingers in the block manager's m_blockfile_info nSize data until node restart.
These additions to the blk files are non-fatal, but also not desirable.
That is, this is a form of data corruption that the reading code is lenient enough to process (it skips the extra bytes), but it adds some scary looking log messages as it encounters them.

The summary of the problem is that the re-index process double counts the size of the serialization header (magic message start bytes [4 bytes] + length [4 bytes] = 8 bytes) while calculating the blk data file size (both values already account for the serialization header's size, hence why it is over accounted).

This bug manifests itself in a few different ways, after re-indexing, when a new block from a peer is processed:

  1. If the new block will not fit into the last blk file processed while re-indexing, while remaining under the 128MiB limit, then the blk file is flushed to disk and truncated to a size that is 8 greater than it should be. The truncation adds zero bytes (see FlatFileSeq::Flush and TruncateFile).
  2. If the last blk file processed while re-indexing has logical space for the new block under the 128 MiB limit:
    1. If the blk file was not already large enough to hold the new block, then the zeros are, in effect, added by fseek when the file is opened for writing. Eight zero bytes are added to the end of the last blk file just before the new block is written. This happens because the write offset is 8 too great due to the miscalculation. The result is 8 zero bytes between the end of the last block and the beginning of the next block's magic + length + block.
    2. If the blk file was already large enough to hold the new block, then the current existing file contents remain in the 8 byte gap between the end of the last block and the beginning of the next block's magic + length + block. Commonly, when this occcurs, it is due to the blk file containing blocks that are not connected to the block tree during reindex and are thus left behind by the reindex process and later overwritten when new blocks are added. The orphaned blocks can be valid blocks, but due to the nature of concurrent block download, the parent may not have been retrieved and written by the time the node was previously shutdown.

@mruddy
Copy link
Contributor Author

mruddy commented Apr 15, 2022

An easy way to reproduce this issue is to:

  1. setup a new temp node data directory containing only the blocks directory that contains only one 516 byte long file, blk00000.dat

    1. the blk00000.dat file contains
      1. the 8 byte serialization header for the genesis block
      2. the genesis block (blockhash 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f)
      3. the 8 byte serialization header for block at height 2
      4. the block at height 2 (blockhash 000000006a625f06636b8bb6ac7b960a8d03705d1ace08b1a19da3fdcc99ddbd)

    in other words:

    $ cd /tmp/btc && find .
    .
    ./blocks
    ./blocks/blk00000.dat
    $ sha256sum blocks/blk00000.dat
    713be71e35a2bc25481cc66b465da1e439df1c2570d065fa0aecc11eaecbea94  blocks/blk00000.dat
    $ xxd -g 4 blocks/blk00000.dat
    00000000: f9beb4d9 1d010000 01000000 00000000  ................
    00000010: 00000000 00000000 00000000 00000000  ................
    00000020: 00000000 00000000 00000000 3ba3edfd  ............;...
    00000030: 7a7b12b2 7ac72c3e 67768f61 7fc81bc3  z{..z.,>gv.a....
    00000040: 888a5132 3a9fb8aa 4b1e5e4a 29ab5f49  ..Q2:...K.^J)._I
    00000050: ffff001d 1dac2b7c 01010000 00010000  ......+|........
    00000060: 00000000 00000000 00000000 00000000  ................
    00000070: 00000000 00000000 00000000 0000ffff  ................
    00000080: ffff4d04 ffff001d 01044554 68652054  ..M.......EThe T
    00000090: 696d6573 2030332f 4a616e2f 32303039  imes 03/Jan/2009
    000000a0: 20436861 6e63656c 6c6f7220 6f6e2062   Chancellor on b
    000000b0: 72696e6b 206f6620 7365636f 6e642062  rink of second b
    000000c0: 61696c6f 75742066 6f722062 616e6b73  ailout for banks
    000000d0: ffffffff 0100f205 2a010000 00434104  ........*....CA.
    000000e0: 678afdb0 fe554827 1967f1a6 7130b710  g....UH'.g..q0..
    000000f0: 5cd6a828 e03909a6 7962e0ea 1f61deb6  \..(.9..yb...a..
    00000100: 49f6bc3f 4cef38c4 f35504e5 1ec112de  I..?L.8..U......
    00000110: 5c384df7 ba0b8d57 8a4c702b 6bf11d5f  \8M....W.Lp+k.._
    00000120: ac000000 00f9beb4 d9d70000 00010000  ................
    00000130: 004860eb 18bf1b16 20e37e94 90fc8a42  .H`..... .~....B
    00000140: 7514416f d75159ab 86688e9a 83000000  u.Ao.QY..h......
    00000150: 00d5fdcc 541e25de 1c7a5add edf24858  ....T.%..zZ...HX
    00000160: b8bb665c 9f36ef74 4ee42c31 6022c90f  ..f\.6.tN.,1`"..
    00000170: 9bb0bc66 49ffff00 1d08d2bd 61010100  ...fI.......a...
    00000180: 00000100 00000000 00000000 00000000  ................
    00000190: 00000000 00000000 00000000 00000000  ................
    000001a0: 000000ff ffffff07 04ffff00 1d010bff  ................
    000001b0: ffffff01 00f2052a 01000000 43410472  .......*....CA.r
    000001c0: 11a824f5 5b505228 e4c3d519 4c1fcfaa  ..$.[PR(....L...
    000001d0: 15a456ab df37f9b9 d97a4040 afc073de  ..V..7...z@@..s.
    000001e0: e6c89064 984f0338 5237d921 67c13e23  ...d.O.8R7.!g.>#
    000001f0: 6446b417 ab79a0fc ae412ae3 316b77ac  dF...y...A*.1kw.
    00000200: 00000000                             ....
    
  2. start the node with the command line similar to: bitcoin-qt -debug -datadir=/tmp/btc -reindex

  3. wait for header synchronization to finish and you see this in the debug log:
    UpdatedBlockTip: new block hash=00000000839a8e6886ab5951d76f411475428afc90947ee320161bbf18eb6048

  4. shutdown the node

We setup like this in order to hit the case where an "Out of order block" is found during reindexing and it is left disconnected since it's ancestor block at height 1 (blockhash 00000000839a8e6886ab5951d76f411475428afc90947ee320161bbf18eb6048) is not present in the blk00000.dat file for the reindexing process to attach.

Now, if we examine the blk00000.dat file, we can see that the corruption has occured at offset 0x125 through 0x12c. The serialization header and data for block at height 1 follow at offset 0x12d:

$ xxd -g 4 -l 524 blocks/blk00000.dat
00000000: f9beb4d9 1d010000 01000000 00000000  ................
00000010: 00000000 00000000 00000000 00000000  ................
00000020: 00000000 00000000 00000000 3ba3edfd  ............;...
00000030: 7a7b12b2 7ac72c3e 67768f61 7fc81bc3  z{..z.,>gv.a....
00000040: 888a5132 3a9fb8aa 4b1e5e4a 29ab5f49  ..Q2:...K.^J)._I
00000050: ffff001d 1dac2b7c 01010000 00010000  ......+|........
00000060: 00000000 00000000 00000000 00000000  ................
00000070: 00000000 00000000 00000000 0000ffff  ................
00000080: ffff4d04 ffff001d 01044554 68652054  ..M.......EThe T
00000090: 696d6573 2030332f 4a616e2f 32303039  imes 03/Jan/2009
000000a0: 20436861 6e63656c 6c6f7220 6f6e2062   Chancellor on b
000000b0: 72696e6b 206f6620 7365636f 6e642062  rink of second b
000000c0: 61696c6f 75742066 6f722062 616e6b73  ailout for banks
000000d0: ffffffff 0100f205 2a010000 00434104  ........*....CA.
000000e0: 678afdb0 fe554827 1967f1a6 7130b710  g....UH'.g..q0..
000000f0: 5cd6a828 e03909a6 7962e0ea 1f61deb6  \..(.9..yb...a..
00000100: 49f6bc3f 4cef38c4 f35504e5 1ec112de  I..?L.8..U......
00000110: 5c384df7 ba0b8d57 8a4c702b 6bf11d5f  \8M....W.Lp+k.._
00000120: ac000000 00f9beb4 d9d70000 00f9beb4  ................      <----- offset 0x125 through 0x12c demonstrate the incorrect offset calculated by the reindexing process. those bytes are from the original data that was already in the file and were erroneously not overwritten due to the bad offset.
00000130: d9d70000 00010000 006fe28c 0ab6f1b3  .........o......
00000140: 72c1a6a2 46ae63f7 4f931e83 65e15a08  r...F.c.O...e.Z.
00000150: 9c68d619 00000000 00982051 fd1e4ba7  .h........ Q..K.
00000160: 44bbbe68 0e1fee14 677ba1a3 c3540bf7  D..h....g{...T..
00000170: b1cdb606 e857233e 0e61bc66 49ffff00  .....W#>.a.fI...
00000180: 1d01e362 99010100 00000100 00000000  ...b............
00000190: 00000000 00000000 00000000 00000000  ................
000001a0: 00000000 00000000 000000ff ffffff07  ................
000001b0: 04ffff00 1d0104ff ffffff01 00f2052a  ...............*
000001c0: 01000000 43410496 b538e853 519c726a  ....CA...8.SQ.rj
000001d0: 2c91e61e c11600ae 1390813a 627c66fb  ,..........:b|f.
000001e0: 8be7947b e63c52da 75893795 15d4e0a6  ...{.<R.u.7.....
000001f0: 04f81417 81e62294 721166bf 621e73a8  ......".r.f.b.s.
00000200: 2cbf2342 c858eeac 00000000           ,.#B.X......
  1. If we now restart the node with the same command line as before, bitcoin-qt -debug -datadir=/tmp/btc -reindex we will soon see a message in the debug.log like:
    LoadExternalBlockFile: Deserialize or I/O error - ReadCompactSize(): size too large: iostream error

@mruddy
Copy link
Contributor Author

mruddy commented Apr 15, 2022

Attached here is the blk00000.dat file with sha256sum of
713be71e35a2bc25481cc66b465da1e439df1c2570d065fa0aecc11eaecbea94 used by my easy reproduction comment above #24858 (comment).

Simply remove the ".txt" file extension that I had to add in order to upload it here.

blk00000.dat.txt

Copy link

@vincenzopalazzo vincenzopalazzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK, sounds good to me, and it is also well documented.

I will test it in locally with your use case described

@mruddy
Copy link
Contributor Author

mruddy commented Apr 18, 2022

noting for historical purposes that this updates a fix made by #5864 @theuni

@jamesob
Copy link
Contributor

jamesob commented Apr 18, 2022

Concept ACK, will review this week.

Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

An alternative, extremely simple way to see that -reindex adds 8 bytes is is to just observe debug.log during startup: Here for regtest, with a datadir that has only the genesis block:
Master, before Reindex:
LoadBlockIndexDB: last block file info: CBlockFileInfo(blocks=1, size=293, heights=0...0, time=2011-02-02...2011-02-02)

Master, after Reindex:
LoadBlockIndexDB: last block file info: CBlockFileInfo(blocks=1, size=301, heights=0...0, time=2011-02-02...2011-02-02)

The size goes up by 8 bytes from 293 to 301, where as it stays at 293 with this branch.

@w0xlt
Copy link
Contributor

w0xlt commented Apr 19, 2022

Concept ACK, will review soon.

Copy link
Contributor

@jamesob jamesob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 66b8d3c (jamesob/ackr/24858.1.mruddy.reindex_log_test_incorre)

Tested locally. The source of the bug (reading through SaveBlockToDisk -> FindBlockPos) is still somewhat unclear to me... I guess I'm trying to determine where those 8 header bytes are accounted for in the case of an existing block during reindex. In other words, exactly what line is responsible for the addition of those 8 bytes to the FlatFilePos* dbp, which is passed into SaveBlockToDisk.

Near as I can figure, the answer to that question is somewhere in the LoadExternalBlockFile -> AcceptBlock call chain.

[after writing the above and rereading the code]

My best guess is that those 8 bytes are included in LoadExternalBlockFile (nRewind = blkdat.GetPos()) after an existing block is read (which then determines the value of dbp), whereas during the addition of a new block, no existing data is read and therefore those 8 header bytes are unaccounted for.

Anyway, I've tested this patch pretty extensively:

  • multiple subsequent -reindex calls with -stopatheight=105000
  • using this branch with an existing mainnet datadir, waiting for new tips to come in, switching back to master and ensuring init happens properly
  • revert change in validation.cpp, ensure included unittests fail

So I have good confidence that this is safe and correct.

Thanks for working on this; I think this is a pretty important issue to iron out even if it doesn't immediately cause true data corruption. It's the kind of thing that could easily cascade into a bigger problem.

Show signature data

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

ACK 66b8d3c23d999651c79ac03e7558d4248a200ba1 ([`jamesob/ackr/24858.1.mruddy.reindex_log_test_incorre`](https://github.com/jamesob/bitcoin/tree/ackr/24858.1.mruddy.reindex_log_test_incorre))

Tested locally. The source of the bug (reading through SaveBlockToDisk -> FindBlockPos) is still somewhat unclear to me... I guess I'm trying to determine where those 8 header bytes are accounted for in the case of an existing block during reindex. In other words, exactly what line is responsible for the addition of those 8 bytes to the `FlatFilePos* dbp`, which is passed into SaveBlockToDisk.

Near as I can figure, the answer to that question is somewhere in the `LoadExternalBlockFile -> AcceptBlock` call chain.

[after writing the above and rereading the code]

My best guess is that those 8 bytes are included in LoadExternalBlockFile (`nRewind = blkdat.GetPos()`) after an existing block is read (which then determines the value of `dbp`), whereas during the addition of a new block, no existing data is read and therefore those 8 header bytes are unaccounted for.

Anyway, I've tested this patch pretty extensively:
- - multiple subsequent -reindex calls with -stopatheight=105000
- - using this branch with an existing mainnet datadir, waiting for new tips to come in, switching back to master and ensuring init happens properly
- - revert change in validation.cpp, ensure included unittests fail

So I have good confidence that this is safe and correct.

Thanks for working on this; I think this is a pretty important issue to iron out even if it doesn't immediately cause true data corruption. It's the kind of thing that could easily cascade into a bigger problem.

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEGNRVI1NPYuZCSIrGepNdrbLETwUFAmJgK0AACgkQepNdrbLE
TwXXJA/8Cq4RgrVcKBTB7zYjgqM5d2aQo7R++nMyF+5ZW+fPHY9kn9uR3YXJgVHb
u68TsFFAY/CS3ziNLmBsjsOS4yOvDzdz5m9f6Eso78qsLyf7qix3gOlCkGZGuGy3
Sr8FHYrgV4AuqdOQMlUnLLyZAf9ItDpF/OOcMm61IVM5MgMfXlnntod9bPil+AAI
G6COVrL9P4/RRgs1oxRuVeP+RXRwwyNzJbavjThd8pnpd5JAFpcs8z9FGDmLlQaV
Z4tVJvwoFohYhK5AoqnX13A+lIGRZoHqEihd2P++3wrzA1DiLQHXv2+3umRX6sSl
ApjhFBf68rH006BTe0OOoyakYMHPPJCleD+1W730Tk5/NxiuBO6ES4B7Yw4lEI4f
ZBXtVDha+3vRnUFXSUzAWIetfgoSOny3EC7rst9FOtd+I0W4lzbKGj4+Zh1PpB45
rx9ygnPV3h7shkAbla/zJL3L7luAlBgQkdYoHCEOTeGtuvtQ16yHpDDFgbRHlzh/
1BeFr7/l03YM8i3rayB4UYUfuvEc2LuY4QOPYjyrgS0FZNY1z+lec+BWirul8Kz0
NsY25rHLiQYW70hh+5HcidFKs9LoiYfZ22eWuWpZTle07CClrV2xNGB36SO4AiuR
QComx/Z7uTFYdWD7tkwawgJzv/LHtxthEYpvacRJvro1Tj1rCQQ=
=R1e1
-----END PGP SIGNATURE-----

Show platform data

Tested on Linux-5.10.0-11-amd64-x86_64-with-glibc2.31

Configured with ./configure LDFLAGS=-L/home/james/src/bitcoin/db4/lib/ CPPFLAGS=-I/home/james/src/bitcoin/db4/include/ CXXFLAGS=-fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare -Wsign-compare -Werror=thread-safety-analysis --enable-wallet --enable-debug --with-daemon --enable-natpmp-default

Compiled with /usr/bin/ccache /usr/local/bin/clang++ -std=c++17 -mavx -mavx2 -mpclmul -fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare -Wsign-compare -Werror=thread-safety-analysis -O0 -g3 -ftrapv -fdebug-prefix-map=$(abs_top_srcdir)=.  -Wstack-protector -fstack-protector-all -fcf-protection=full -fstack-clash-protection -msse4.1 -msse4.2 -msse4 -msha  i

Compiler version: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63

@mruddy
Copy link
Contributor Author

mruddy commented Apr 20, 2022

My best guess is that those 8 bytes are included in LoadExternalBlockFile (nRewind = blkdat.GetPos()) after an existing block is read (which then determines the value of dbp), whereas during the addition of a new block, no existing data is read and therefore those 8 header bytes are unaccounted for.

@jamesob Thanks for the review! You're very close and you have the right idea. The exact line where they come from is in CChainState::LoadExternalBlockFile at the line dbp->nPos = nBlockPos; https://github.com/bitcoin/bitcoin/blob/master/src/validation.cpp#L4288
That is why blindly adding another 8 on the call to FindBlockPos in BlockManager::SaveBlockToDisk is wrong for already known blocks.

@maflcko maflcko changed the title reindex, log, test: incorrect blk file size calculation during reindex results in undesirable blk file malformedness incorrect blk file size calculation during reindex results in undesirable blk file malformedness Apr 21, 2022
@droid192
Copy link

droid192 commented May 1, 2022

please add error recovery to rescan such that from any ioerror the following blk.dat files can still be imported while/after the missing data is downloaded form a peer. Currently all considered blk.dat files after yield 0 imported blocks (btw one could stop after first ioerror right away!)
also: does it reverse already padded blk.dats that got touched by reindex ?
It reduces stress from network and oneself!

@mruddy
Copy link
Contributor Author

mruddy commented May 2, 2022

Currently all considered blk.dat files after yield 0 imported blocks (btw one could stop after first ioerror right away!)

That can be the case, but it is not the case a lot of the time. For example, this issue deals with the cases where reads fail, and get skipped, in-between blocks. The case where no subsequent blocks get imported is where an actual block read fails and that makes every subsequent block appear to be out of order. That's a different thing.

does it reverse already padded blk.dats

No, this change does not attempt to alter the block files if an error is found (and skipped). This change attempts to stop setting up data that is used by later processing which winds up corrupting the block files.

I'd like to keep this change focused on fixing the original issue and not trying to re-design the import processes.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 425fc88, but I think the current fix is more confusing and fragile than it needs to be. I would suggest something like ryanofsky@07c05a1 as a more straightforward fix that just passes new FindBlockPos calls the same position that the original FindBlockPos returned, instead of trying to work around the position being different by tweaking the size.

Overall though this is a nice improvement adding a useful test and dealing with a scary looking bug.

@mruddy
Copy link
Contributor Author

mruddy commented May 5, 2022

@ryanofsky Thanks for the review. I've put updates related to your comments into 011f6ec. I'll squash the commits after positive code review of these.

I still like what I have in SaveBlockToDisk regarding what is passed into FindBlockPos for the second parameter, nAddSize. When the position and size is known, we pass that in. When only size is known, we pass in block size + header size because we know that's what we'll have to add to the file. And, by doing it this way, we don't have to re-adjust the known position on the way out of SaveBlockToDisk. We will never pass the same position into FindBlockPos for a given block because we either don't know the position, or we pass in where we found the block. The only question is whether we should pass in the positon that FindBlockPos previously returned. I don't see the advantage to doing that because it's kind of like trying to pretend that we don't know what we do know.

@jonatack
Copy link
Member

jonatack commented May 5, 2022

Concept ACK.

(The linter needs appeasing when you repush; consider presenting this pull in its intended final form to ease review.)

The subject line of commit hash 011f6ec54fffb7600ae300851cc9816423f3f4dd is followed by a non-empty line. Subject lines should always be followed by a blank line.

This fixes a blk file size calculation made during reindex that results in increased blk file malformity.
The fix is to avoid double counting the size of the serialization header during reindex.
This adds a unit test to reproduce the bug before the fix and to ensure that it does not recur.
These changes include a log message change also so as to not be as alarming. This is a common and recoverable
data corruption. These messages can now be filtered by the debug log reindex category.
@mruddy
Copy link
Contributor Author

mruddy commented May 7, 2022

ok, i squashed the commits and rebased onto the latest master. bcb0cac is the latest
after working through the feedback. i tested and it still looks good.

@mruddy
Copy link
Contributor Author

mruddy commented May 12, 2022

just to summarize, i think bcb0cac is the final form of this patch. it's a small, targeted, and uncomplicated bug fix. i think that if we can get some final reviews that this would be ready to merge. thanks!

@mruddy
Copy link
Contributor Author

mruddy commented Jun 7, 2022

Is there anything that I can do to get this merged? It's ready to merge as far as I can tell.

@mruddy mruddy changed the title incorrect blk file size calculation during reindex results in undesirable blk file malformedness incorrect blk file size calculation during reindex results in recoverable blk file corruption Jun 9, 2022
Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK bcb0cac. This is a disturbing bug with an easy fix which seems well-worth merging.

I still like what I have in SaveBlockToDisk regarding what is passed into FindBlockPos for the second parameter, nAddSize. When the position and size is known, we pass that in. When only size is known, we pass in block size + header size because we know that's what we'll have to add to the file.

To me, this sounds more complicated that just passing the same size.

And, by doing it this way, we don't have to re-adjust the known position on the way out of SaveBlockToDisk.

This is inaccurate because WriteBlockToDisk is adjusting the position here if dbp is null. Instead of SaveBlockToDisk sometimes adjusting the position, and sometimes not adjusting it, I think it is clearer if SaveBlockToDisk always adjusts the position as in ryanofsky@07c05a1. Just making the adjustment in SaveBlockToDisk explicit at all is improvement. I'm pretty sure I stared at this code for an hour before I noticed the hidden adjustment in WriteBlockToDisk that was critical to making it work.

We will never pass the same position into FindBlockPos for a given block because we either don't know the position, or we pass in where we found the block. The only question is whether we should pass in the positon that FindBlockPos previously returned. I don't see the advantage to doing that because it's kind of like trying to pretend that we don't know what we do know.

I don't know why you are mentioning the position-not-known case because that is the working case, not the broken case, and no change we are discussing affects it. Also no code is "trying to pretend" anything in that case that I can see.

The advantage to passing the same position to FindBlockPos that it previously returned is that code is easier to explain and less likely to break if a variable has one meaning within a function, instead of two different meanings. This bug is directly caused by two different position values being used for the same block in FindBlockPos. You are working around two different position values being for being used for the same block by introducing two different size values for the same block. I think if we just used just one position and one size for a block in FindBlockPos, code would be easier to understand and less fragile.

Comment on lines +799 to +802
// when known, blockPos.nPos points at the offset of the block data in the blk file. that already accounts for
// the serialization header present in the file (the 4 magic message start bytes + the 4 length bytes = 8 bytes = BLOCK_SERIALIZATION_HEADER_SIZE).
// we add BLOCK_SERIALIZATION_HEADER_SIZE only for new blocks since they will have the serialization header added when written to disk.
nBlockSize += static_cast<unsigned int>(BLOCK_SERIALIZATION_HEADER_SIZE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "reindex, log, test: fixes #21379" (bcb0cac)

This comment doesn't address the confusing thing here because it doesn't say why nBlockSize is set to different sizes for the same block depending on position_known. Would suggest a comment that says why the code is using a different size in each case. Could use:

// When position_known is false, the CBlock has not been written to the
// disk yet, so the nBlockSize value passed to FindBlockPos needs to be
// set so FindBlockPos allocates enough bytes for the CBlock data *plus*
// the size of the header WriteBlockToDisk will write before the CBlock
// data.
//
// When position_known is true, WriteBlockToDisk will have already been
// called, and it will have adjusted blockPos.nPos to point to the CBlock
// data after the header, instead of to the header, so the nBlockSize
// value passed to FindBlockPos should only include the CBlock size,
// *not* the header size.

I still prefer the approach of passing a consistent size and position to FindBlockPos like ryanofsky@07c05a1 instead of dealing with an inconsistent position value by introducing an inconsistent size value. But both approaches have the same ultimate effect, so the more important thing is to document the code clearly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, thanks for the feedback. While I'm not opposed to adding these comments if they help get this merged, I'd rather let the commit stay as it is so that the reviews can accumulate and we can get this stuff merged.

Copy link
Contributor

@mzumsande mzumsande Jul 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think this comment would be helpful, except for the point below.

// When position_known is true, WriteBlockToDisk will have already been
// called, and it will have adjusted blockPos.nPos to point to the CBlock
// data after the header

Isn't this incorrect? My understanding is that when position_known is true, blockPos.nPos was set by the LoadExternalBlockFile code which has read past the header in the existing block file and moved the position accordingly - but not WriteBlockToDisk, which wasn't called for this block at all during this invocation of bitcoind.

Copy link
Contributor

@ryanofsky ryanofsky Jul 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #24858 (comment)

// When position_known is true, WriteBlockToDisk will have already been
// called, and it will have adjusted blockPos.nPos to point to the CBlock
// data after the header

Isn't this incorrect?

I think it is correct 😎. The line that changes blockPos.nPos to point after the header instead of before the header is here:

pos.nPos = (unsigned int)fileOutPos;

This is the critical code calculating the block position which is saved in the CBlockIndex struct, and written to BlockIndexDb and used when ReadBlockFromDisk is called to read back block data.

My understanding is that when position_known is true, blockPos.nPos was set by the LoadExternalBlockFile code which has read past the header in the existing block file and moved the position accordingly

I don't think LoadExternalBlockFile is relevant to normal operation, only used when importing or reindexing is requested.

but not WriteBlockToDisk, which wasn't called for this block at all during this invocation of bitcoind.

It's true that WriteBlockToDisk may or may not have been called during this invocation of bitcoind, but it must have been called previously at some point, and the position it returns is written to BlockIndexDb and used to read blocks in future invocations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, but the suggested comment is not in ReadBlockFromDisk but in SaveBlockToDisk, and it is about the case when "position_known is true", or dbp != nullptr. I think this is only possible when this function is called via AcceptBlock() from LoadExternalBlockFile which has set the blockPos - the normal operation calls to SaveBlockToDisk (via AcceptBlock() and ProcessNewBlock()) have dbp==nullptr, so position_known is false and the other half of the comment applies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, if you feel it is more useful for the comment to go into those things, feel free to make a suggestion along those lines. My suggestion was rejected anyway so maybe you will have better luck.

Also, I wrote this comment to describe a piece of code that I think is unnecessarily confusing. I would personally prefer not to have this piece of code, and not to have this comment explaining it, and instead use a solution like ryanofsky@07c05a1 which does not require understanding external places where SaveBlockToDisk is called to understand, and only requires understanding internally how it always returns the same position.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong preference - it seems that both approaches only affect the blockstorage-internal logic, and since neither solution changes the public interface SaveBlockToDisk, this sholdn't change what external places will need to know: In particular, that if they pass dbp!=nullptr, dbp needs to point right after the header and before the block data (which should be documented in SaveBlockToDisk imo).

However, I think that the internal function FindBlockPos() would be clearer if it was split up depending on whether the block is known, with so much code being conditional on it: If dbp!=nullptr , the function doesn't "find" anything new but only updates the block info statistics (and sometimes does a Flush which I'm not convinced is necessary). I had an initial go at this in https://github.com/mzumsande/bitcoin/commits/202207_refactor_findblockpos , which is outside the scope of this PR, but might make sense as a follow-up.

Copy link
Contributor

@LarryRuane LarryRuane Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @ryanofsky's comment (above) about consistency. I'd also note that adding 8 to nBlockSize is confusing because it's no longer a block size!

@mruddy
Copy link
Contributor Author

mruddy commented Jun 14, 2022

@ryanofsky Thanks for the code review ACK!

I appreciate your comments and I spent quite a while responding to them.
In the end, I decided not to post my responses in order to avoid confusion and focus on this being ready to merge.
I hear you though. I think we accomplish the same result just in a pretty minorly different way that comes down to personal preference.

Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK bcb0cac (reviewed code and did some testing, I agree that it fixes the bug).

nits below are non-blocking and could be left to follow-ups.

In general, I got the impression that the SaveBlockToDisk / FindBlockPos code could benefit from doc improvements and possibly refactors to improve the quality of code, even though it's obviously a critical area.

using node::BLOCK_SERIALIZATION_HEADER_SIZE;

// use BasicTestingSetup here for the data directory configuration, setup, and cleanup
BOOST_FIXTURE_TEST_SUITE(blockmanager_tests, BasicTestingSetup)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: With the convention "<source_filename>_tests.cpp", the name of the file/test suite should be blockstorage_tests.

@@ -4378,7 +4378,18 @@ void CChainState::LoadExternalBlockFile(FILE* fileIn, FlatFilePos* dbp)
}
}
} catch (const std::exception& e) {
LogPrintf("%s: Deserialize or I/O error - %s\n", __func__, e.what());
// historical bugs added extra data to the block files that does not deserialize cleanly.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would drop the last two sentences (seems a bit verbose and out of place to explain what reindex is not and what studious node operators may do), and I find long comments like this one easier to read when full sentences start with capitals.

Comment on lines +799 to +802
// when known, blockPos.nPos points at the offset of the block data in the blk file. that already accounts for
// the serialization header present in the file (the 4 magic message start bytes + the 4 length bytes = 8 bytes = BLOCK_SERIALIZATION_HEADER_SIZE).
// we add BLOCK_SERIALIZATION_HEADER_SIZE only for new blocks since they will have the serialization header added when written to disk.
nBlockSize += static_cast<unsigned int>(BLOCK_SERIALIZATION_HEADER_SIZE);
Copy link
Contributor

@mzumsande mzumsande Jul 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think this comment would be helpful, except for the point below.

// When position_known is true, WriteBlockToDisk will have already been
// called, and it will have adjusted blockPos.nPos to point to the CBlock
// data after the header

Isn't this incorrect? My understanding is that when position_known is true, blockPos.nPos was set by the LoadExternalBlockFile code which has read past the header in the existing block file and moved the position accordingly - but not WriteBlockToDisk, which wasn't called for this block at all during this invocation of bitcoind.

@@ -171,6 +174,7 @@ class BlockManager
bool WriteUndoDataForBlock(const CBlockUndo& blockundo, BlockValidationState& state, CBlockIndex* pindex, const CChainParams& chainparams)
EXCLUSIVE_LOCKS_REQUIRED(::cs_main);

/** Store block on disk. If dbp is not nullptr, then it provides the known position of the block within a block file on disk. */
Copy link
Contributor

@mzumsande mzumsande Jul 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Just from reading this comment it's not clear to me why this function is even called when the block is already on disk - it apparently doesn't need to be stored in that case.

Copy link
Contributor

@LarryRuane LarryRuane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to propose my own alternative approach, which I think is easier to understand: LarryRuane@e8b90d4. This version hides the 8 bytes adjustment within FindBlockPos, which is where it belongs. In my view, SaveBlockToDisk should not need to be aware of this detail.

FlatFilePos BlockManager::SaveBlockToDisk(const CBlock& block, int nHeight, CChain& active_chain, const CChainParams& chainparams, const FlatFilePos* dbp)
{
unsigned int nBlockSize = ::GetSerializeSize(block, CLIENT_VERSION);
FlatFilePos blockPos;
if (dbp != nullptr) {
const auto position_known {dbp != nullptr};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, if you retouch (bcb0cac)

Suggested change
const auto position_known {dbp != nullptr};
const bool position_known{dbp != nullptr};

Comment on lines +799 to +802
// when known, blockPos.nPos points at the offset of the block data in the blk file. that already accounts for
// the serialization header present in the file (the 4 magic message start bytes + the 4 length bytes = 8 bytes = BLOCK_SERIALIZATION_HEADER_SIZE).
// we add BLOCK_SERIALIZATION_HEADER_SIZE only for new blocks since they will have the serialization header added when written to disk.
nBlockSize += static_cast<unsigned int>(BLOCK_SERIALIZATION_HEADER_SIZE);
Copy link
Contributor

@LarryRuane LarryRuane Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @ryanofsky's comment (above) about consistency. I'd also note that adding 8 to nBlockSize is confusing because it's no longer a block size!

// the block is found at offset 8 because there is an 8 byte serialization header
// consisting of 4 magic bytes + 4 length bytes before each block in a well-formed blk file.
FlatFilePos pos{0, BLOCK_SERIALIZATION_HEADER_SIZE};
BOOST_CHECK_EQUAL(blockman.SaveBlockToDisk(params->GenesisBlock(), 0, chain, *params, &pos).nPos, BLOCK_SERIALIZATION_HEADER_SIZE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is slightly unrealistic because it's simulating a normal (new) block add followed by a reindex. A reindex, if it occurs, always happens immediately after the node starts up, before any blocks have been added from peers. As a result of this, this (the second) test call to SaveBlockToDisk causes m_blockfile_info.nBlocks to increment to 2 here, which is strange, since there's only one block.

This doesn't cause any problems now, but some future change could cause this test to break in mysterious ways.

A way to fix this would be to allocate a new BlockManager (and probably CChain) for this call, to simulate restarting the node. (The third call to SaveBlockToDisk, below, can use the same ones as this one does, because following a reindex, the real node does go ahead and start adding new blocks.)

In case it's not clear, I updated that commit (containing other suggestions) to show what I mean.

Copy link
Contributor

@hernanmarino hernanmarino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested ACK. Reproduced the issue on master and tested succesfully the fix proposed.

Tested on
Linux 5.15.0-41-generic x86_64 x86_64 x86_64 GNU/Linux
Compiled with
gcc 11.2.0

Copy link
Member

@pablomartin4btc pablomartin4btc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested ACK.

gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)

Linux 5.11.0-43-generic #47~20.04.2-Ubuntu SMP x86_64 x86_64 x86_64 GNU/Linux

Copy link
Contributor

@w0xlt w0xlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK bcb0cac

Verified the error on master branch as described in #24858 (comment) and #24858 (review). This PR fixes it.

@jamesob
Copy link
Contributor

jamesob commented Aug 4, 2022

I think this would be a good bug to fix, so it would be nice to see some progress here. @mruddy do you think you'll be able to address some of the outstanding feedback? I think it might be good to at least incorporate some of the clarifying commentary that @ryanofsky proposed.

This PR is a step in the right direction; the fix could possibly clearer, along the lines of what @ryanofsky or @LarryRuane have suggested. But in any case, one of these approaches should be committed to, reviewed, and merged (with credit to @mruddy in any case for proposing the conceptual solution).

Edit: to be clear, merging this PR as-is would be a big improvement over the current state of things. I think we're bikeshedding a little at this point, and a follow-up could be done if necessary.

@DrahtBot
Copy link
Contributor

DrahtBot commented Aug 5, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

@mruddy
Copy link
Contributor Author

mruddy commented Aug 10, 2022

If I have to change something, then I'd probably go back in the direction that @LarryRuane suggested with e8b90d4. I'd have to review and test those changes closer, but at a glance, they seem plausible. They are close to my original fix in that they keep the change within FindBlockPos. I changed to the current approach trying to be responsive to feedback in April. I'd probably avoid adding comments in any potential future commit as those seem to cause a relatively large amount of friction. The current commit has gathered review, testing, and works, so I hate to start the process all over again. But, if this fix isn't going to get merged for some reason, then let me know and I'll do it.

@LarryRuane
Copy link
Contributor

tested code-review ACK bcb0cac
I strongly agree with @jamesob, this should get merged asap; any possible improvements can be done in a follow-up PR. My apologies for stirring the pot too much!

@jamesob
Copy link
Contributor

jamesob commented Oct 12, 2022

I think this is ready for merge.

@glozow glozow merged commit cc12b89 into bitcoin:master Oct 12, 2022
@mruddy mruddy deleted the issue_21379 branch October 14, 2022 12:56
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Oct 23, 2022
…ex results in recoverable blk file corruption

bcb0cac reindex, log, test: fixes bitcoin#21379 (mruddy)

Pull request description:

  Fixes bitcoin#21379.

  The blocks/blk?????.dat files are mutated and become increasingly malformed, or corrupt, as a result of running the re-indexing process.
  The mutations occur after the re-indexing process has finished, as new blocks are appended, but are a result of a re-indexing process miscalculation that lingers in the block manager's `m_blockfile_info` `nSize` data until node restart.
  These additions to the blk files are non-fatal, but also not desirable.
  That is, this is a form of data corruption that the reading code is lenient enough to process (it skips the extra bytes), but it adds some scary looking log messages as it encounters them.

  The summary of the problem is that the re-index process double counts the size of the serialization header (magic message start bytes [4 bytes] + length [4 bytes] = 8 bytes) while calculating the blk data file size (both values already account for the serialization header's size, hence why it is over accounted).

  This bug manifests itself in a few different ways, after re-indexing, when a new block from a peer is processed:
  1. If the new block will not fit into the last blk file processed while re-indexing, while remaining under the 128MiB limit, then the blk file is flushed to disk and truncated to a size that is 8 greater than it should be. The truncation adds zero bytes (see `FlatFileSeq::Flush` and `TruncateFile`).
  1. If the last blk file processed while re-indexing has logical space for the new block under the 128 MiB limit:
      1. If the blk file was not already large enough to hold the new block, then the zeros are, in effect, added by `fseek` when the file is opened for writing. Eight zero bytes are added to the end of the last blk file just before the new block is written. This happens because the write offset is 8 too great due to the miscalculation. The result is 8 zero bytes between the end of the last block and the beginning of the next block's magic + length + block.
      1. If the blk file was already large enough to hold the new block, then the current existing file contents remain in the 8 byte gap between the end of the last block and the beginning of the next block's magic + length + block. Commonly, when this occcurs, it is due to the blk file containing blocks that are not connected to the block tree during reindex and are thus left behind by the reindex process and later overwritten when new blocks are added. The orphaned blocks can be valid blocks, but due to the nature of concurrent block download, the parent may not have been retrieved and written by the time the node was previously shutdown.

ACKs for top commit:
  LarryRuane:
    tested code-review ACK bcb0cac
  ryanofsky:
    Code review ACK bcb0cac. This is a disturbing bug with an easy fix which seems well-worth merging.
  mzumsande:
    ACK bcb0cac (reviewed code and did some testing, I agree that it fixes the bug).
  w0xlt:
    tACK bitcoin@bcb0cac

Tree-SHA512: acc97927ea712916506772550451136b0f1e5404e92df24cc05e405bb09eb6fe7c3011af3dd34a7723c3db17fda657ae85fa314387e43833791e9169c0febe51
@bitcoin bitcoin locked and limited conversation to collaborators Oct 14, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reindex subtly corrupts blknnnnn.dat files and causes subsequent deserialization errors