incorrect blk file size calculation during reindex results in recoverable blk file corruption #24858

mruddy · 2022-04-14T23:40:39Z

The blocks/blk?????.dat files are mutated and become increasingly malformed, or corrupt, as a result of running the re-indexing process.
The mutations occur after the re-indexing process has finished, as new blocks are appended, but are a result of a re-indexing process miscalculation that lingers in the block manager's m_blockfile_info nSize data until node restart.
These additions to the blk files are non-fatal, but also not desirable.
That is, this is a form of data corruption that the reading code is lenient enough to process (it skips the extra bytes), but it adds some scary looking log messages as it encounters them.

The summary of the problem is that the re-index process double counts the size of the serialization header (magic message start bytes [4 bytes] + length [4 bytes] = 8 bytes) while calculating the blk data file size (both values already account for the serialization header's size, hence why it is over accounted).

This bug manifests itself in a few different ways, after re-indexing, when a new block from a peer is processed:

If the new block will not fit into the last blk file processed while re-indexing, while remaining under the 128MiB limit, then the blk file is flushed to disk and truncated to a size that is 8 greater than it should be. The truncation adds zero bytes (see FlatFileSeq::Flush and TruncateFile).
If the last blk file processed while re-indexing has logical space for the new block under the 128 MiB limit:
1. If the blk file was not already large enough to hold the new block, then the zeros are, in effect, added by fseek when the file is opened for writing. Eight zero bytes are added to the end of the last blk file just before the new block is written. This happens because the write offset is 8 too great due to the miscalculation. The result is 8 zero bytes between the end of the last block and the beginning of the next block's magic + length + block.
2. If the blk file was already large enough to hold the new block, then the current existing file contents remain in the 8 byte gap between the end of the last block and the beginning of the next block's magic + length + block. Commonly, when this occcurs, it is due to the blk file containing blocks that are not connected to the block tree during reindex and are thus left behind by the reindex process and later overwritten when new blocks are added. The orphaned blocks can be valid blocks, but due to the nature of concurrent block download, the parent may not have been retrieved and written by the time the node was previously shutdown.

mruddy · 2022-04-15T02:16:09Z

An easy way to reproduce this issue is to:

setup a new temp node data directory containing only the blocks directory that contains only one 516 byte long file, blk00000.dat

the blk00000.dat file contains
1. the 8 byte serialization header for the genesis block
2. the genesis block (blockhash 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f)
3. the 8 byte serialization header for block at height 2
4. the block at height 2 (blockhash 000000006a625f06636b8bb6ac7b960a8d03705d1ace08b1a19da3fdcc99ddbd)

in other words:

$ cd /tmp/btc && find .
.
./blocks
./blocks/blk00000.dat
$ sha256sum blocks/blk00000.dat
713be71e35a2bc25481cc66b465da1e439df1c2570d065fa0aecc11eaecbea94  blocks/blk00000.dat
$ xxd -g 4 blocks/blk00000.dat
00000000: f9beb4d9 1d010000 01000000 00000000  ................
00000010: 00000000 00000000 00000000 00000000  ................
00000020: 00000000 00000000 00000000 3ba3edfd  ............;...
00000030: 7a7b12b2 7ac72c3e 67768f61 7fc81bc3  z{..z.,>gv.a....
00000040: 888a5132 3a9fb8aa 4b1e5e4a 29ab5f49  ..Q2:...K.^J)._I
00000050: ffff001d 1dac2b7c 01010000 00010000  ......+|........
00000060: 00000000 00000000 00000000 00000000  ................
00000070: 00000000 00000000 00000000 0000ffff  ................
00000080: ffff4d04 ffff001d 01044554 68652054  ..M.......EThe T
00000090: 696d6573 2030332f 4a616e2f 32303039  imes 03/Jan/2009
000000a0: 20436861 6e63656c 6c6f7220 6f6e2062   Chancellor on b
000000b0: 72696e6b 206f6620 7365636f 6e642062  rink of second b
000000c0: 61696c6f 75742066 6f722062 616e6b73  ailout for banks
000000d0: ffffffff 0100f205 2a010000 00434104  ........*....CA.
000000e0: 678afdb0 fe554827 1967f1a6 7130b710  g....UH'.g..q0..
000000f0: 5cd6a828 e03909a6 7962e0ea 1f61deb6  \..(.9..yb...a..
00000100: 49f6bc3f 4cef38c4 f35504e5 1ec112de  I..?L.8..U......
00000110: 5c384df7 ba0b8d57 8a4c702b 6bf11d5f  \8M....W.Lp+k.._
00000120: ac000000 00f9beb4 d9d70000 00010000  ................
00000130: 004860eb 18bf1b16 20e37e94 90fc8a42  .H`..... .~....B
00000140: 7514416f d75159ab 86688e9a 83000000  u.Ao.QY..h......
00000150: 00d5fdcc 541e25de 1c7a5add edf24858  ....T.%..zZ...HX
00000160: b8bb665c 9f36ef74 4ee42c31 6022c90f  ..f\.6.tN.,1`"..
00000170: 9bb0bc66 49ffff00 1d08d2bd 61010100  ...fI.......a...
00000180: 00000100 00000000 00000000 00000000  ................
00000190: 00000000 00000000 00000000 00000000  ................
000001a0: 000000ff ffffff07 04ffff00 1d010bff  ................
000001b0: ffffff01 00f2052a 01000000 43410472  .......*....CA.r
000001c0: 11a824f5 5b505228 e4c3d519 4c1fcfaa  ..$.[PR(....L...
000001d0: 15a456ab df37f9b9 d97a4040 afc073de  ..V..7...z@@..s.
000001e0: e6c89064 984f0338 5237d921 67c13e23  ...d.O.8R7.!g.>#
000001f0: 6446b417 ab79a0fc ae412ae3 316b77ac  dF...y...A*.1kw.
00000200: 00000000                             ....

start the node with the command line similar to: bitcoin-qt -debug -datadir=/tmp/btc -reindex
wait for header synchronization to finish and you see this in the debug log:
UpdatedBlockTip: new block hash=00000000839a8e6886ab5951d76f411475428afc90947ee320161bbf18eb6048
shutdown the node

We setup like this in order to hit the case where an "Out of order block" is found during reindexing and it is left disconnected since it's ancestor block at height 1 (blockhash 00000000839a8e6886ab5951d76f411475428afc90947ee320161bbf18eb6048) is not present in the blk00000.dat file for the reindexing process to attach.

Now, if we examine the blk00000.dat file, we can see that the corruption has occured at offset 0x125 through 0x12c. The serialization header and data for block at height 1 follow at offset 0x12d:

$ xxd -g 4 -l 524 blocks/blk00000.dat
00000000: f9beb4d9 1d010000 01000000 00000000  ................
00000010: 00000000 00000000 00000000 00000000  ................
00000020: 00000000 00000000 00000000 3ba3edfd  ............;...
00000030: 7a7b12b2 7ac72c3e 67768f61 7fc81bc3  z{..z.,>gv.a....
00000040: 888a5132 3a9fb8aa 4b1e5e4a 29ab5f49  ..Q2:...K.^J)._I
00000050: ffff001d 1dac2b7c 01010000 00010000  ......+|........
00000060: 00000000 00000000 00000000 00000000  ................
00000070: 00000000 00000000 00000000 0000ffff  ................
00000080: ffff4d04 ffff001d 01044554 68652054  ..M.......EThe T
00000090: 696d6573 2030332f 4a616e2f 32303039  imes 03/Jan/2009
000000a0: 20436861 6e63656c 6c6f7220 6f6e2062   Chancellor on b
000000b0: 72696e6b 206f6620 7365636f 6e642062  rink of second b
000000c0: 61696c6f 75742066 6f722062 616e6b73  ailout for banks
000000d0: ffffffff 0100f205 2a010000 00434104  ........*....CA.
000000e0: 678afdb0 fe554827 1967f1a6 7130b710  g....UH'.g..q0..
000000f0: 5cd6a828 e03909a6 7962e0ea 1f61deb6  \..(.9..yb...a..
00000100: 49f6bc3f 4cef38c4 f35504e5 1ec112de  I..?L.8..U......
00000110: 5c384df7 ba0b8d57 8a4c702b 6bf11d5f  \8M....W.Lp+k.._
00000120: ac000000 00f9beb4 d9d70000 00f9beb4  ................      <----- offset 0x125 through 0x12c demonstrate the incorrect offset calculated by the reindexing process. those bytes are from the original data that was already in the file and were erroneously not overwritten due to the bad offset.
00000130: d9d70000 00010000 006fe28c 0ab6f1b3  .........o......
00000140: 72c1a6a2 46ae63f7 4f931e83 65e15a08  r...F.c.O...e.Z.
00000150: 9c68d619 00000000 00982051 fd1e4ba7  .h........ Q..K.
00000160: 44bbbe68 0e1fee14 677ba1a3 c3540bf7  D..h....g{...T..
00000170: b1cdb606 e857233e 0e61bc66 49ffff00  .....W#>.a.fI...
00000180: 1d01e362 99010100 00000100 00000000  ...b............
00000190: 00000000 00000000 00000000 00000000  ................
000001a0: 00000000 00000000 000000ff ffffff07  ................
000001b0: 04ffff00 1d0104ff ffffff01 00f2052a  ...............*
000001c0: 01000000 43410496 b538e853 519c726a  ....CA...8.SQ.rj
000001d0: 2c91e61e c11600ae 1390813a 627c66fb  ,..........:b|f.
000001e0: 8be7947b e63c52da 75893795 15d4e0a6  ...{.<R.u.7.....
000001f0: 04f81417 81e62294 721166bf 621e73a8  ......".r.f.b.s.
00000200: 2cbf2342 c858eeac 00000000           ,.#B.X......

If we now restart the node with the same command line as before, bitcoin-qt -debug -datadir=/tmp/btc -reindex we will soon see a message in the debug.log like:
LoadExternalBlockFile: Deserialize or I/O error - ReadCompactSize(): size too large: iostream error

mruddy · 2022-04-15T09:39:03Z

Attached here is the blk00000.dat file with sha256sum of
713be71e35a2bc25481cc66b465da1e439df1c2570d065fa0aecc11eaecbea94 used by my easy reproduction comment above #24858 (comment).

Simply remove the ".txt" file extension that I had to add in order to upload it here.

blk00000.dat.txt

vincenzopalazzo

Concept ACK, sounds good to me, and it is also well documented.

I will test it in locally with your use case described

mruddy · 2022-04-18T12:54:05Z

noting for historical purposes that this updates a fix made by #5864 @theuni

jamesob · 2022-04-18T15:26:47Z

Concept ACK, will review this week.

mzumsande

Concept ACK

An alternative, extremely simple way to see that -reindex adds 8 bytes is is to just observe debug.log during startup: Here for regtest, with a datadir that has only the genesis block:
Master, before Reindex:
LoadBlockIndexDB: last block file info: CBlockFileInfo(blocks=1, size=293, heights=0...0, time=2011-02-02...2011-02-02)

Master, after Reindex:
LoadBlockIndexDB: last block file info: CBlockFileInfo(blocks=1, size=301, heights=0...0, time=2011-02-02...2011-02-02)

The size goes up by 8 bytes from 293 to 301, where as it stays at 293 with this branch.

src/node/blockstorage.cpp

src/test/blockmanager_tests.cpp

src/validation.cpp

w0xlt · 2022-04-19T03:40:18Z

Concept ACK, will review soon.

jamesob

ACK 66b8d3c (jamesob/ackr/24858.1.mruddy.reindex_log_test_incorre)

Tested locally. The source of the bug (reading through SaveBlockToDisk -> FindBlockPos) is still somewhat unclear to me... I guess I'm trying to determine where those 8 header bytes are accounted for in the case of an existing block during reindex. In other words, exactly what line is responsible for the addition of those 8 bytes to the FlatFilePos* dbp, which is passed into SaveBlockToDisk.

Near as I can figure, the answer to that question is somewhere in the LoadExternalBlockFile -> AcceptBlock call chain.

[after writing the above and rereading the code]

My best guess is that those 8 bytes are included in LoadExternalBlockFile (nRewind = blkdat.GetPos()) after an existing block is read (which then determines the value of dbp), whereas during the addition of a new block, no existing data is read and therefore those 8 header bytes are unaccounted for.

Anyway, I've tested this patch pretty extensively:

multiple subsequent -reindex calls with -stopatheight=105000
using this branch with an existing mainnet datadir, waiting for new tips to come in, switching back to master and ensuring init happens properly
revert change in validation.cpp, ensure included unittests fail

So I have good confidence that this is safe and correct.

Thanks for working on this; I think this is a pretty important issue to iron out even if it doesn't immediately cause true data corruption. It's the kind of thing that could easily cascade into a bigger problem.

Show signature data

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

ACK 66b8d3c23d999651c79ac03e7558d4248a200ba1 ([`jamesob/ackr/24858.1.mruddy.reindex_log_test_incorre`](https://github.com/jamesob/bitcoin/tree/ackr/24858.1.mruddy.reindex_log_test_incorre))

Tested locally. The source of the bug (reading through SaveBlockToDisk -> FindBlockPos) is still somewhat unclear to me... I guess I'm trying to determine where those 8 header bytes are accounted for in the case of an existing block during reindex. In other words, exactly what line is responsible for the addition of those 8 bytes to the `FlatFilePos* dbp`, which is passed into SaveBlockToDisk.

Near as I can figure, the answer to that question is somewhere in the `LoadExternalBlockFile -> AcceptBlock` call chain.

[after writing the above and rereading the code]

My best guess is that those 8 bytes are included in LoadExternalBlockFile (`nRewind = blkdat.GetPos()`) after an existing block is read (which then determines the value of `dbp`), whereas during the addition of a new block, no existing data is read and therefore those 8 header bytes are unaccounted for.

Anyway, I've tested this patch pretty extensively:
- - multiple subsequent -reindex calls with -stopatheight=105000
- - using this branch with an existing mainnet datadir, waiting for new tips to come in, switching back to master and ensuring init happens properly
- - revert change in validation.cpp, ensure included unittests fail

So I have good confidence that this is safe and correct.

Thanks for working on this; I think this is a pretty important issue to iron out even if it doesn't immediately cause true data corruption. It's the kind of thing that could easily cascade into a bigger problem.

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEGNRVI1NPYuZCSIrGepNdrbLETwUFAmJgK0AACgkQepNdrbLE
TwXXJA/8Cq4RgrVcKBTB7zYjgqM5d2aQo7R++nMyF+5ZW+fPHY9kn9uR3YXJgVHb
u68TsFFAY/CS3ziNLmBsjsOS4yOvDzdz5m9f6Eso78qsLyf7qix3gOlCkGZGuGy3
Sr8FHYrgV4AuqdOQMlUnLLyZAf9ItDpF/OOcMm61IVM5MgMfXlnntod9bPil+AAI
G6COVrL9P4/RRgs1oxRuVeP+RXRwwyNzJbavjThd8pnpd5JAFpcs8z9FGDmLlQaV
Z4tVJvwoFohYhK5AoqnX13A+lIGRZoHqEihd2P++3wrzA1DiLQHXv2+3umRX6sSl
ApjhFBf68rH006BTe0OOoyakYMHPPJCleD+1W730Tk5/NxiuBO6ES4B7Yw4lEI4f
ZBXtVDha+3vRnUFXSUzAWIetfgoSOny3EC7rst9FOtd+I0W4lzbKGj4+Zh1PpB45
rx9ygnPV3h7shkAbla/zJL3L7luAlBgQkdYoHCEOTeGtuvtQ16yHpDDFgbRHlzh/
1BeFr7/l03YM8i3rayB4UYUfuvEc2LuY4QOPYjyrgS0FZNY1z+lec+BWirul8Kz0
NsY25rHLiQYW70hh+5HcidFKs9LoiYfZ22eWuWpZTle07CClrV2xNGB36SO4AiuR
QComx/Z7uTFYdWD7tkwawgJzv/LHtxthEYpvacRJvro1Tj1rCQQ=
=R1e1
-----END PGP SIGNATURE-----

Show platform data

Tested on Linux-5.10.0-11-amd64-x86_64-with-glibc2.31

Configured with ./configure LDFLAGS=-L/home/james/src/bitcoin/db4/lib/ CPPFLAGS=-I/home/james/src/bitcoin/db4/include/ CXXFLAGS=-fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare -Wsign-compare -Werror=thread-safety-analysis --enable-wallet --enable-debug --with-daemon --enable-natpmp-default

Compiled with /usr/bin/ccache /usr/local/bin/clang++ -std=c++17 -mavx -mavx2 -mpclmul -fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare -Wsign-compare -Werror=thread-safety-analysis -O0 -g3 -ftrapv -fdebug-prefix-map=$(abs_top_srcdir)=.  -Wstack-protector -fstack-protector-all -fcf-protection=full -fstack-clash-protection -msse4.1 -msse4.2 -msse4 -msha  i

Compiler version: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63

src/node/blockstorage.cpp

src/validation.cpp

src/node/blockstorage.cpp

mruddy · 2022-04-20T16:54:39Z

My best guess is that those 8 bytes are included in LoadExternalBlockFile (nRewind = blkdat.GetPos()) after an existing block is read (which then determines the value of dbp), whereas during the addition of a new block, no existing data is read and therefore those 8 header bytes are unaccounted for.

@jamesob Thanks for the review! You're very close and you have the right idea. The exact line where they come from is in CChainState::LoadExternalBlockFile at the line dbp->nPos = nBlockPos; https://github.com/bitcoin/bitcoin/blob/master/src/validation.cpp#L4288
That is why blindly adding another 8 on the call to FindBlockPos in BlockManager::SaveBlockToDisk is wrong for already known blocks.

droid192 · 2022-05-01T15:52:17Z

please add error recovery to rescan such that from any ioerror the following blk.dat files can still be imported while/after the missing data is downloaded form a peer. Currently all considered blk.dat files after yield 0 imported blocks (btw one could stop after first ioerror right away!)
also: does it reverse already padded blk.dats that got touched by reindex ?
It reduces stress from network and oneself!

mruddy · 2022-05-02T14:23:33Z

Currently all considered blk.dat files after yield 0 imported blocks (btw one could stop after first ioerror right away!)

That can be the case, but it is not the case a lot of the time. For example, this issue deals with the cases where reads fail, and get skipped, in-between blocks. The case where no subsequent blocks get imported is where an actual block read fails and that makes every subsequent block appear to be out of order. That's a different thing.

does it reverse already padded blk.dats

No, this change does not attempt to alter the block files if an error is found (and skipped). This change attempts to stop setting up data that is used by later processing which winds up corrupting the block files.

I'd like to keep this change focused on fixing the original issue and not trying to re-design the import processes.

ryanofsky

Code review ACK 425fc88, but I think the current fix is more confusing and fragile than it needs to be. I would suggest something like ryanofsky@07c05a1 as a more straightforward fix that just passes new FindBlockPos calls the same position that the original FindBlockPos returned, instead of trying to work around the position being different by tweaking the size.

Overall though this is a nice improvement adding a useful test and dealing with a scary looking bug.

src/node/blockstorage.cpp

src/test/blockmanager_tests.cpp

mruddy · 2022-05-05T08:03:39Z

@ryanofsky Thanks for the review. I've put updates related to your comments into 011f6ec. I'll squash the commits after positive code review of these.

I still like what I have in SaveBlockToDisk regarding what is passed into FindBlockPos for the second parameter, nAddSize. When the position and size is known, we pass that in. When only size is known, we pass in block size + header size because we know that's what we'll have to add to the file. And, by doing it this way, we don't have to re-adjust the known position on the way out of SaveBlockToDisk. We will never pass the same position into FindBlockPos for a given block because we either don't know the position, or we pass in where we found the block. The only question is whether we should pass in the positon that FindBlockPos previously returned. I don't see the advantage to doing that because it's kind of like trying to pretend that we don't know what we do know.

jonatack · 2022-05-05T08:08:04Z

Concept ACK.

(The linter needs appeasing when you repush; consider presenting this pull in its intended final form to ease review.)

The subject line of commit hash 011f6ec54fffb7600ae300851cc9816423f3f4dd is followed by a non-empty line. Subject lines should always be followed by a blank line.

This fixes a blk file size calculation made during reindex that results in increased blk file malformity. The fix is to avoid double counting the size of the serialization header during reindex. This adds a unit test to reproduce the bug before the fix and to ensure that it does not recur. These changes include a log message change also so as to not be as alarming. This is a common and recoverable data corruption. These messages can now be filtered by the debug log reindex category.

mruddy · 2022-05-07T13:07:19Z

ok, i squashed the commits and rebased onto the latest master. bcb0cac is the latest
after working through the feedback. i tested and it still looks good.

mruddy · 2022-05-12T09:52:24Z

just to summarize, i think bcb0cac is the final form of this patch. it's a small, targeted, and uncomplicated bug fix. i think that if we can get some final reviews that this would be ready to merge. thanks!

mruddy · 2022-06-07T10:39:28Z

Is there anything that I can do to get this merged? It's ready to merge as far as I can tell.

ryanofsky

Code review ACK bcb0cac. This is a disturbing bug with an easy fix which seems well-worth merging.

I still like what I have in SaveBlockToDisk regarding what is passed into FindBlockPos for the second parameter, nAddSize. When the position and size is known, we pass that in. When only size is known, we pass in block size + header size because we know that's what we'll have to add to the file.

To me, this sounds more complicated that just passing the same size.

And, by doing it this way, we don't have to re-adjust the known position on the way out of SaveBlockToDisk.

This is inaccurate because WriteBlockToDisk is adjusting the position here if dbp is null. Instead of SaveBlockToDisk sometimes adjusting the position, and sometimes not adjusting it, I think it is clearer if SaveBlockToDisk always adjusts the position as in ryanofsky@07c05a1. Just making the adjustment in SaveBlockToDisk explicit at all is improvement. I'm pretty sure I stared at this code for an hour before I noticed the hidden adjustment in WriteBlockToDisk that was critical to making it work.

We will never pass the same position into FindBlockPos for a given block because we either don't know the position, or we pass in where we found the block. The only question is whether we should pass in the positon that FindBlockPos previously returned. I don't see the advantage to doing that because it's kind of like trying to pretend that we don't know what we do know.

I don't know why you are mentioning the position-not-known case because that is the working case, not the broken case, and no change we are discussing affects it. Also no code is "trying to pretend" anything in that case that I can see.

The advantage to passing the same position to FindBlockPos that it previously returned is that code is easier to explain and less likely to break if a variable has one meaning within a function, instead of two different meanings. This bug is directly caused by two different position values being used for the same block in FindBlockPos. You are working around two different position values being for being used for the same block by introducing two different size values for the same block. I think if we just used just one position and one size for a block in FindBlockPos, code would be easier to understand and less fragile.

ryanofsky · 2022-06-13T14:13:47Z

src/node/blockstorage.cpp

+        // when known, blockPos.nPos points at the offset of the block data in the blk file. that already accounts for
+        // the serialization header present in the file (the 4 magic message start bytes + the 4 length bytes = 8 bytes = BLOCK_SERIALIZATION_HEADER_SIZE).
+        // we add BLOCK_SERIALIZATION_HEADER_SIZE only for new blocks since they will have the serialization header added when written to disk.
+        nBlockSize += static_cast<unsigned int>(BLOCK_SERIALIZATION_HEADER_SIZE);


In commit "reindex, log, test: fixes #21379" (bcb0cac)

This comment doesn't address the confusing thing here because it doesn't say why nBlockSize is set to different sizes for the same block depending on position_known. Would suggest a comment that says why the code is using a different size in each case. Could use:

// When position_known is false, the CBlock has not been written to the // disk yet, so the nBlockSize value passed to FindBlockPos needs to be // set so FindBlockPos allocates enough bytes for the CBlock data *plus* // the size of the header WriteBlockToDisk will write before the CBlock // data. // // When position_known is true, WriteBlockToDisk will have already been // called, and it will have adjusted blockPos.nPos to point to the CBlock // data after the header, instead of to the header, so the nBlockSize // value passed to FindBlockPos should only include the CBlock size, // *not* the header size.

I still prefer the approach of passing a consistent size and position to FindBlockPos like ryanofsky@07c05a1 instead of dealing with an inconsistent position value by introducing an inconsistent size value. But both approaches have the same ultimate effect, so the more important thing is to document the code clearly.

First, thanks for the feedback. While I'm not opposed to adding these comments if they help get this merged, I'd rather let the commit stay as it is so that the reviews can accumulate and we can get this stuff merged.

I also think this comment would be helpful, except for the point below.

// When position_known is true, WriteBlockToDisk will have already been // called, and it will have adjusted blockPos.nPos to point to the CBlock // data after the header

Isn't this incorrect? My understanding is that when position_known is true, blockPos.nPos was set by the LoadExternalBlockFile code which has read past the header in the existing block file and moved the position accordingly - but not WriteBlockToDisk, which wasn't called for this block at all during this invocation of bitcoind.

re: #24858 (comment)

// When position_known is true, WriteBlockToDisk will have already been // called, and it will have adjusted blockPos.nPos to point to the CBlock // data after the header

Isn't this incorrect?

I think it is correct 😎. The line that changes blockPos.nPos to point after the header instead of before the header is here:

bitcoin/src/node/blockstorage.cpp

Line 676 in 9fb2a2b

pos.nPos = (unsigned int)fileOutPos;

This is the critical code calculating the block position which is saved in the CBlockIndex struct, and written to BlockIndexDb and used when ReadBlockFromDisk is called to read back block data.

My understanding is that when position_known is true, blockPos.nPos was set by the LoadExternalBlockFile code which has read past the header in the existing block file and moved the position accordingly

I don't think LoadExternalBlockFile is relevant to normal operation, only used when importing or reindexing is requested.

but not WriteBlockToDisk, which wasn't called for this block at all during this invocation of bitcoind.

It's true that WriteBlockToDisk may or may not have been called during this invocation of bitcoind, but it must have been called previously at some point, and the position it returns is written to BlockIndexDb and used to read blocks in future invocations.

Hmm, but the suggested comment is not in ReadBlockFromDisk but in SaveBlockToDisk, and it is about the case when "position_known is true", or dbp != nullptr. I think this is only possible when this function is called via AcceptBlock() from LoadExternalBlockFile which has set the blockPos - the normal operation calls to SaveBlockToDisk (via AcceptBlock() and ProcessNewBlock()) have dbp==nullptr, so position_known is false and the other half of the comment applies.

Sure, if you feel it is more useful for the comment to go into those things, feel free to make a suggestion along those lines. My suggestion was rejected anyway so maybe you will have better luck.

Also, I wrote this comment to describe a piece of code that I think is unnecessarily confusing. I would personally prefer not to have this piece of code, and not to have this comment explaining it, and instead use a solution like ryanofsky@07c05a1 which does not require understanding external places where SaveBlockToDisk is called to understand, and only requires understanding internally how it always returns the same position.

I don't have a strong preference - it seems that both approaches only affect the blockstorage-internal logic, and since neither solution changes the public interface SaveBlockToDisk, this sholdn't change what external places will need to know: In particular, that if they pass dbp!=nullptr, dbp needs to point right after the header and before the block data (which should be documented in SaveBlockToDisk imo).

However, I think that the internal function FindBlockPos() would be clearer if it was split up depending on whether the block is known, with so much code being conditional on it: If dbp!=nullptr , the function doesn't "find" anything new but only updates the block info statistics (and sometimes does a Flush which I'm not convinced is necessary). I had an initial go at this in https://github.com/mzumsande/bitcoin/commits/202207_refactor_findblockpos , which is outside the scope of this PR, but might make sense as a follow-up.

I agree with @ryanofsky's comment (above) about consistency. I'd also note that adding 8 to nBlockSize is confusing because it's no longer a block size!

mruddy · 2022-06-14T10:31:33Z

@ryanofsky Thanks for the code review ACK!

I appreciate your comments and I spent quite a while responding to them.
In the end, I decided not to post my responses in order to avoid confusion and focus on this being ready to merge.
I hear you though. I think we accomplish the same result just in a pretty minorly different way that comes down to personal preference.

mzumsande

ACK bcb0cac (reviewed code and did some testing, I agree that it fixes the bug).

nits below are non-blocking and could be left to follow-ups.

In general, I got the impression that the SaveBlockToDisk / FindBlockPos code could benefit from doc improvements and possibly refactors to improve the quality of code, even though it's obviously a critical area.

mzumsande · 2022-07-05T00:00:55Z

src/test/blockmanager_tests.cpp

+using node::BLOCK_SERIALIZATION_HEADER_SIZE;
+
+// use BasicTestingSetup here for the data directory configuration, setup, and cleanup
+BOOST_FIXTURE_TEST_SUITE(blockmanager_tests, BasicTestingSetup)


nit: With the convention "<source_filename>_tests.cpp", the name of the file/test suite should be blockstorage_tests.

mzumsande · 2022-07-05T03:43:58Z

src/validation.cpp

@@ -4378,7 +4378,18 @@ void CChainState::LoadExternalBlockFile(FILE* fileIn, FlatFilePos* dbp)
                    }
                }
            } catch (const std::exception& e) {
-                LogPrintf("%s: Deserialize or I/O error - %s\n", __func__, e.what());
+                // historical bugs added extra data to the block files that does not deserialize cleanly.


nit: I would drop the last two sentences (seems a bit verbose and out of place to explain what reindex is not and what studious node operators may do), and I find long comments like this one easier to read when full sentences start with capitals.

mzumsande · 2022-07-05T04:13:15Z

src/node/blockstorage.cpp

+        // when known, blockPos.nPos points at the offset of the block data in the blk file. that already accounts for
+        // the serialization header present in the file (the 4 magic message start bytes + the 4 length bytes = 8 bytes = BLOCK_SERIALIZATION_HEADER_SIZE).
+        // we add BLOCK_SERIALIZATION_HEADER_SIZE only for new blocks since they will have the serialization header added when written to disk.
+        nBlockSize += static_cast<unsigned int>(BLOCK_SERIALIZATION_HEADER_SIZE);


I also think this comment would be helpful, except for the point below.

// When position_known is true, WriteBlockToDisk will have already been // called, and it will have adjusted blockPos.nPos to point to the CBlock // data after the header

Isn't this incorrect? My understanding is that when position_known is true, blockPos.nPos was set by the LoadExternalBlockFile code which has read past the header in the existing block file and moved the position accordingly - but not WriteBlockToDisk, which wasn't called for this block at all during this invocation of bitcoind.

mzumsande · 2022-07-05T04:30:32Z

src/node/blockstorage.h

@@ -171,6 +174,7 @@ class BlockManager
    bool WriteUndoDataForBlock(const CBlockUndo& blockundo, BlockValidationState& state, CBlockIndex* pindex, const CChainParams& chainparams)
        EXCLUSIVE_LOCKS_REQUIRED(::cs_main);

+    /** Store block on disk. If dbp is not nullptr, then it provides the known position of the block within a block file on disk. */


nit: Just from reading this comment it's not clear to me why this function is even called when the block is already on disk - it apparently doesn't need to be stored in that case.

LarryRuane

I want to propose my own alternative approach, which I think is easier to understand: LarryRuane@e8b90d4. This version hides the 8 bytes adjustment within FindBlockPos, which is where it belongs. In my view, SaveBlockToDisk should not need to be aware of this detail.

LarryRuane · 2022-07-23T21:31:42Z

src/node/blockstorage.cpp

 FlatFilePos BlockManager::SaveBlockToDisk(const CBlock& block, int nHeight, CChain& active_chain, const CChainParams& chainparams, const FlatFilePos* dbp)
 {
    unsigned int nBlockSize = ::GetSerializeSize(block, CLIENT_VERSION);
    FlatFilePos blockPos;
-    if (dbp != nullptr) {
+    const auto position_known {dbp != nullptr};


nit, if you retouch (bcb0cac)

Suggested change

const auto position_known {dbp != nullptr};

const bool position_known{dbp != nullptr};

LarryRuane · 2022-07-27T01:38:22Z

src/node/blockstorage.cpp

+        // when known, blockPos.nPos points at the offset of the block data in the blk file. that already accounts for
+        // the serialization header present in the file (the 4 magic message start bytes + the 4 length bytes = 8 bytes = BLOCK_SERIALIZATION_HEADER_SIZE).
+        // we add BLOCK_SERIALIZATION_HEADER_SIZE only for new blocks since they will have the serialization header added when written to disk.
+        nBlockSize += static_cast<unsigned int>(BLOCK_SERIALIZATION_HEADER_SIZE);


I agree with @ryanofsky's comment (above) about consistency. I'd also note that adding 8 to nBlockSize is confusing because it's no longer a block size!

LarryRuane · 2022-07-27T03:53:19Z

src/test/blockmanager_tests.cpp

+    // the block is found at offset 8 because there is an 8 byte serialization header
+    // consisting of 4 magic bytes + 4 length bytes before each block in a well-formed blk file.
+    FlatFilePos pos{0, BLOCK_SERIALIZATION_HEADER_SIZE};
+    BOOST_CHECK_EQUAL(blockman.SaveBlockToDisk(params->GenesisBlock(), 0, chain, *params, &pos).nPos, BLOCK_SERIALIZATION_HEADER_SIZE);


This test is slightly unrealistic because it's simulating a normal (new) block add followed by a reindex. A reindex, if it occurs, always happens immediately after the node starts up, before any blocks have been added from peers. As a result of this, this (the second) test call to SaveBlockToDisk causes m_blockfile_info.nBlocks to increment to 2 here, which is strange, since there's only one block.

This doesn't cause any problems now, but some future change could cause this test to break in mysterious ways.

A way to fix this would be to allocate a new BlockManager (and probably CChain) for this call, to simulate restarting the node. (The third call to SaveBlockToDisk, below, can use the same ones as this one does, because following a reindex, the real node does go ahead and start adding new blocks.)

In case it's not clear, I updated that commit (containing other suggestions) to show what I mean.

hernanmarino

Tested ACK. Reproduced the issue on master and tested succesfully the fix proposed.

Tested on
Linux 5.15.0-41-generic x86_64 x86_64 x86_64 GNU/Linux
Compiled with
gcc 11.2.0

pablomartin4btc

Tested ACK.

gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)

Linux 5.11.0-43-generic #47~20.04.2-Ubuntu SMP x86_64 x86_64 x86_64 GNU/Linux

w0xlt

tACK bcb0cac

Verified the error on master branch as described in #24858 (comment) and #24858 (review). This PR fixes it.

jamesob · 2022-08-04T19:57:41Z

I think this would be a good bug to fix, so it would be nice to see some progress here. @mruddy do you think you'll be able to address some of the outstanding feedback? I think it might be good to at least incorporate some of the clarifying commentary that @ryanofsky proposed.

This PR is a step in the right direction; the fix could possibly clearer, along the lines of what @ryanofsky or @LarryRuane have suggested. But in any case, one of these approaches should be committed to, reviewed, and merged (with credit to @mruddy in any case for proposing the conceptual solution).

Edit: to be clear, merging this PR as-is would be a big improvement over the current state of things. I think we're bikeshedding a little at this point, and a follow-up could be done if necessary.

DrahtBot · 2022-08-05T10:07:48Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

mruddy · 2022-08-10T19:50:30Z

If I have to change something, then I'd probably go back in the direction that @LarryRuane suggested with e8b90d4. I'd have to review and test those changes closer, but at a glance, they seem plausible. They are close to my original fix in that they keep the change within FindBlockPos. I changed to the current approach trying to be responsive to feedback in April. I'd probably avoid adding comments in any potential future commit as those seem to cause a relatively large amount of friction. The current commit has gathered review, testing, and works, so I hate to start the process all over again. But, if this fix isn't going to get merged for some reason, then let me know and I'll do it.

LarryRuane · 2022-08-13T01:31:13Z

tested code-review ACK bcb0cac
I strongly agree with @jamesob, this should get merged asap; any possible improvements can be done in a follow-up PR. My apologies for stirring the pot too much!

jamesob · 2022-10-12T14:35:43Z

I think this is ready for merge.

…ex results in recoverable blk file corruption bcb0cac reindex, log, test: fixes bitcoin#21379 (mruddy) Pull request description: Fixes bitcoin#21379. The blocks/blk?????.dat files are mutated and become increasingly malformed, or corrupt, as a result of running the re-indexing process. The mutations occur after the re-indexing process has finished, as new blocks are appended, but are a result of a re-indexing process miscalculation that lingers in the block manager's `m_blockfile_info` `nSize` data until node restart. These additions to the blk files are non-fatal, but also not desirable. That is, this is a form of data corruption that the reading code is lenient enough to process (it skips the extra bytes), but it adds some scary looking log messages as it encounters them. The summary of the problem is that the re-index process double counts the size of the serialization header (magic message start bytes [4 bytes] + length [4 bytes] = 8 bytes) while calculating the blk data file size (both values already account for the serialization header's size, hence why it is over accounted). This bug manifests itself in a few different ways, after re-indexing, when a new block from a peer is processed: 1. If the new block will not fit into the last blk file processed while re-indexing, while remaining under the 128MiB limit, then the blk file is flushed to disk and truncated to a size that is 8 greater than it should be. The truncation adds zero bytes (see `FlatFileSeq::Flush` and `TruncateFile`). 1. If the last blk file processed while re-indexing has logical space for the new block under the 128 MiB limit: 1. If the blk file was not already large enough to hold the new block, then the zeros are, in effect, added by `fseek` when the file is opened for writing. Eight zero bytes are added to the end of the last blk file just before the new block is written. This happens because the write offset is 8 too great due to the miscalculation. The result is 8 zero bytes between the end of the last block and the beginning of the next block's magic + length + block. 1. If the blk file was already large enough to hold the new block, then the current existing file contents remain in the 8 byte gap between the end of the last block and the beginning of the next block's magic + length + block. Commonly, when this occcurs, it is due to the blk file containing blocks that are not connected to the block tree during reindex and are thus left behind by the reindex process and later overwritten when new blocks are added. The orphaned blocks can be valid blocks, but due to the nature of concurrent block download, the parent may not have been retrieved and written by the time the node was previously shutdown. ACKs for top commit: LarryRuane: tested code-review ACK bcb0cac ryanofsky: Code review ACK bcb0cac. This is a disturbing bug with an easy fix which seems well-worth merging. mzumsande: ACK bcb0cac (reviewed code and did some testing, I agree that it fixes the bug). w0xlt: tACK bitcoin@bcb0cac Tree-SHA512: acc97927ea712916506772550451136b0f1e5404e92df24cc05e405bb09eb6fe7c3011af3dd34a7723c3db17fda657ae85fa314387e43833791e9169c0febe51

mruddy force-pushed the issue_21379 branch from 84580a7 to 2da9617 Compare April 15, 2022 00:05

DrahtBot added Block storage Build system Validation labels Apr 15, 2022

vincenzopalazzo reviewed Apr 15, 2022

View reviewed changes

This was referenced Apr 18, 2022

-reindex fails if blk0*.dat files read-only #2039

Closed

Sync does not continue after reindex until a block is inved #8239

Closed

Reindex initiated by Bitcoin-Qt leaves 293 byte zero-padded gap in blk00000.dat #5028

Closed

mzumsande reviewed Apr 18, 2022

View reviewed changes

src/node/blockstorage.cpp Outdated Show resolved Hide resolved

src/test/blockmanager_tests.cpp Outdated Show resolved Hide resolved

src/validation.cpp Show resolved Hide resolved

mruddy force-pushed the issue_21379 branch from 2da9617 to 66b8d3c Compare April 19, 2022 01:57

jamesob approved these changes Apr 20, 2022

View reviewed changes

src/node/blockstorage.cpp Show resolved Hide resolved

src/validation.cpp Show resolved Hide resolved

src/node/blockstorage.cpp Outdated Show resolved Hide resolved

src/node/blockstorage.cpp Outdated Show resolved Hide resolved

mruddy force-pushed the issue_21379 branch from 66b8d3c to 425fc88 Compare April 20, 2022 20:22

maflcko changed the title ~~reindex, log, test: incorrect blk file size calculation during reindex results in undesirable blk file malformedness~~ incorrect blk file size calculation during reindex results in undesirable blk file malformedness Apr 21, 2022

maflcko removed Build system Validation labels Apr 21, 2022

ryanofsky approved these changes May 3, 2022

View reviewed changes

src/node/blockstorage.cpp Outdated Show resolved Hide resolved

src/node/blockstorage.cpp Show resolved Hide resolved

src/node/blockstorage.cpp Outdated Show resolved Hide resolved

src/test/blockmanager_tests.cpp Outdated Show resolved Hide resolved

mruddy force-pushed the issue_21379 branch from 011f6ec to bcb0cac Compare May 7, 2022 11:58

mruddy changed the title ~~incorrect blk file size calculation during reindex results in undesirable blk file malformedness~~ incorrect blk file size calculation during reindex results in recoverable blk file corruption Jun 9, 2022

ryanofsky approved these changes Jun 13, 2022

View reviewed changes

mzumsande reviewed Jul 5, 2022

View reviewed changes

LarryRuane reviewed Jul 27, 2022

View reviewed changes

hernanmarino approved these changes Jul 28, 2022

View reviewed changes

pablomartin4btc approved these changes Jul 28, 2022

View reviewed changes

w0xlt approved these changes Jul 30, 2022

View reviewed changes

DrahtBot mentioned this pull request Aug 5, 2022

Remove almost all blockstorage globals #25781

Merged

glozow merged commit cc12b89 into bitcoin:master Oct 12, 2022

blangel-nydig mentioned this pull request Oct 12, 2022

Add link to PR with the fix NYDIG/bitcoin-task-bounty#10

Merged

mruddy deleted the issue_21379 branch October 14, 2022 12:56

mruddy mentioned this pull request Oct 14, 2022

complete: investigate-and-resolve-reindex-corruption NYDIG/bitcoin-task-bounty#28

Merged

bitcoin locked and limited conversation to collaborators Oct 14, 2023

	const auto position_known {dbp != nullptr};
	const bool position_known{dbp != nullptr};

incorrect blk file size calculation during reindex results in recoverable blk file corruption #24858

incorrect blk file size calculation during reindex results in recoverable blk file corruption #24858

Uh oh!

Conversation

mruddy commented Apr 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mruddy commented Apr 15, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mruddy commented Apr 15, 2022

Uh oh!

vincenzopalazzo left a comment

Choose a reason for hiding this comment

Uh oh!

mruddy commented Apr 18, 2022

Uh oh!

jamesob commented Apr 18, 2022

Uh oh!

mzumsande left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

w0xlt commented Apr 19, 2022

Uh oh!

jamesob left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mruddy commented Apr 20, 2022

Uh oh!

droid192 commented May 1, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mruddy commented May 2, 2022

Uh oh!

ryanofsky left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mruddy commented May 5, 2022

Uh oh!

jonatack commented May 5, 2022

Uh oh!

mruddy commented May 7, 2022

Uh oh!

mruddy commented May 12, 2022

Uh oh!

mruddy commented Jun 7, 2022

Uh oh!

ryanofsky left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mzumsande Jul 5, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ryanofsky Jul 5, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

LarryRuane Jul 27, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

mruddy commented Apr 14, 2022 •

edited

Loading

mruddy commented Apr 15, 2022 •

edited

Loading

droid192 commented May 1, 2022 •

edited

Loading

mzumsande Jul 5, 2022 •

edited

Loading

ryanofsky Jul 5, 2022 •

edited

Loading

LarryRuane Jul 27, 2022 •

edited

Loading

mzumsande Jul 5, 2022 •

edited

Loading

mzumsande Jul 5, 2022 •

edited

Loading

LarryRuane Jul 27, 2022 •

edited

Loading

jamesob commented Aug 4, 2022 •

edited

Loading

DrahtBot commented Aug 5, 2022 •

edited

Loading