-
Notifications
You must be signed in to change notification settings - Fork 37.7k
incorrect blk file size calculation during reindex results in recoverable blk file corruption #24858
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
An easy way to reproduce this issue is to:
We setup like this in order to hit the case where an "Out of order block" is found during reindexing and it is left disconnected since it's ancestor block at height 1 (blockhash 00000000839a8e6886ab5951d76f411475428afc90947ee320161bbf18eb6048) is not present in the blk00000.dat file for the reindexing process to attach. Now, if we examine the blk00000.dat file, we can see that the corruption has occured at offset 0x125 through 0x12c. The serialization header and data for block at height 1 follow at offset 0x12d:
|
Attached here is the blk00000.dat file with sha256sum of Simply remove the ".txt" file extension that I had to add in order to upload it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK, sounds good to me, and it is also well documented.
I will test it in locally with your use case described
Concept ACK, will review this week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
An alternative, extremely simple way to see that -reindex
adds 8 bytes is is to just observe debug.log
during startup: Here for regtest, with a datadir that has only the genesis block:
Master, before Reindex:
LoadBlockIndexDB: last block file info: CBlockFileInfo(blocks=1, size=293, heights=0...0, time=2011-02-02...2011-02-02)
Master, after Reindex:
LoadBlockIndexDB: last block file info: CBlockFileInfo(blocks=1, size=301, heights=0...0, time=2011-02-02...2011-02-02)
The size goes up by 8 bytes from 293 to 301, where as it stays at 293 with this branch.
Concept ACK, will review soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 66b8d3c (jamesob/ackr/24858.1.mruddy.reindex_log_test_incorre
)
Tested locally. The source of the bug (reading through SaveBlockToDisk -> FindBlockPos) is still somewhat unclear to me... I guess I'm trying to determine where those 8 header bytes are accounted for in the case of an existing block during reindex. In other words, exactly what line is responsible for the addition of those 8 bytes to the FlatFilePos* dbp
, which is passed into SaveBlockToDisk.
Near as I can figure, the answer to that question is somewhere in the LoadExternalBlockFile -> AcceptBlock
call chain.
[after writing the above and rereading the code]
My best guess is that those 8 bytes are included in LoadExternalBlockFile (nRewind = blkdat.GetPos()
) after an existing block is read (which then determines the value of dbp
), whereas during the addition of a new block, no existing data is read and therefore those 8 header bytes are unaccounted for.
Anyway, I've tested this patch pretty extensively:
- multiple subsequent -reindex calls with -stopatheight=105000
- using this branch with an existing mainnet datadir, waiting for new tips to come in, switching back to master and ensuring init happens properly
- revert change in validation.cpp, ensure included unittests fail
So I have good confidence that this is safe and correct.
Thanks for working on this; I think this is a pretty important issue to iron out even if it doesn't immediately cause true data corruption. It's the kind of thing that could easily cascade into a bigger problem.
Show signature data
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
ACK 66b8d3c23d999651c79ac03e7558d4248a200ba1 ([`jamesob/ackr/24858.1.mruddy.reindex_log_test_incorre`](https://github.com/jamesob/bitcoin/tree/ackr/24858.1.mruddy.reindex_log_test_incorre))
Tested locally. The source of the bug (reading through SaveBlockToDisk -> FindBlockPos) is still somewhat unclear to me... I guess I'm trying to determine where those 8 header bytes are accounted for in the case of an existing block during reindex. In other words, exactly what line is responsible for the addition of those 8 bytes to the `FlatFilePos* dbp`, which is passed into SaveBlockToDisk.
Near as I can figure, the answer to that question is somewhere in the `LoadExternalBlockFile -> AcceptBlock` call chain.
[after writing the above and rereading the code]
My best guess is that those 8 bytes are included in LoadExternalBlockFile (`nRewind = blkdat.GetPos()`) after an existing block is read (which then determines the value of `dbp`), whereas during the addition of a new block, no existing data is read and therefore those 8 header bytes are unaccounted for.
Anyway, I've tested this patch pretty extensively:
- - multiple subsequent -reindex calls with -stopatheight=105000
- - using this branch with an existing mainnet datadir, waiting for new tips to come in, switching back to master and ensuring init happens properly
- - revert change in validation.cpp, ensure included unittests fail
So I have good confidence that this is safe and correct.
Thanks for working on this; I think this is a pretty important issue to iron out even if it doesn't immediately cause true data corruption. It's the kind of thing that could easily cascade into a bigger problem.
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCgAdFiEEGNRVI1NPYuZCSIrGepNdrbLETwUFAmJgK0AACgkQepNdrbLE
TwXXJA/8Cq4RgrVcKBTB7zYjgqM5d2aQo7R++nMyF+5ZW+fPHY9kn9uR3YXJgVHb
u68TsFFAY/CS3ziNLmBsjsOS4yOvDzdz5m9f6Eso78qsLyf7qix3gOlCkGZGuGy3
Sr8FHYrgV4AuqdOQMlUnLLyZAf9ItDpF/OOcMm61IVM5MgMfXlnntod9bPil+AAI
G6COVrL9P4/RRgs1oxRuVeP+RXRwwyNzJbavjThd8pnpd5JAFpcs8z9FGDmLlQaV
Z4tVJvwoFohYhK5AoqnX13A+lIGRZoHqEihd2P++3wrzA1DiLQHXv2+3umRX6sSl
ApjhFBf68rH006BTe0OOoyakYMHPPJCleD+1W730Tk5/NxiuBO6ES4B7Yw4lEI4f
ZBXtVDha+3vRnUFXSUzAWIetfgoSOny3EC7rst9FOtd+I0W4lzbKGj4+Zh1PpB45
rx9ygnPV3h7shkAbla/zJL3L7luAlBgQkdYoHCEOTeGtuvtQ16yHpDDFgbRHlzh/
1BeFr7/l03YM8i3rayB4UYUfuvEc2LuY4QOPYjyrgS0FZNY1z+lec+BWirul8Kz0
NsY25rHLiQYW70hh+5HcidFKs9LoiYfZ22eWuWpZTle07CClrV2xNGB36SO4AiuR
QComx/Z7uTFYdWD7tkwawgJzv/LHtxthEYpvacRJvro1Tj1rCQQ=
=R1e1
-----END PGP SIGNATURE-----
Show platform data
Tested on Linux-5.10.0-11-amd64-x86_64-with-glibc2.31
Configured with ./configure LDFLAGS=-L/home/james/src/bitcoin/db4/lib/ CPPFLAGS=-I/home/james/src/bitcoin/db4/include/ CXXFLAGS=-fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare -Wsign-compare -Werror=thread-safety-analysis --enable-wallet --enable-debug --with-daemon --enable-natpmp-default
Compiled with /usr/bin/ccache /usr/local/bin/clang++ -std=c++17 -mavx -mavx2 -mpclmul -fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare -Wsign-compare -Werror=thread-safety-analysis -O0 -g3 -ftrapv -fdebug-prefix-map=$(abs_top_srcdir)=. -Wstack-protector -fstack-protector-all -fcf-protection=full -fstack-clash-protection -msse4.1 -msse4.2 -msse4 -msha i
Compiler version: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63
@jamesob Thanks for the review! You're very close and you have the right idea. The exact line where they come from is in |
please add error recovery to |
That can be the case, but it is not the case a lot of the time. For example, this issue deals with the cases where reads fail, and get skipped, in-between blocks. The case where no subsequent blocks get imported is where an actual block read fails and that makes every subsequent block appear to be out of order. That's a different thing.
No, this change does not attempt to alter the block files if an error is found (and skipped). This change attempts to stop setting up data that is used by later processing which winds up corrupting the block files. I'd like to keep this change focused on fixing the original issue and not trying to re-design the import processes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 425fc88, but I think the current fix is more confusing and fragile than it needs to be. I would suggest something like ryanofsky@07c05a1 as a more straightforward fix that just passes new FindBlockPos
calls the same position that the original FindBlockPos
returned, instead of trying to work around the position being different by tweaking the size.
Overall though this is a nice improvement adding a useful test and dealing with a scary looking bug.
@ryanofsky Thanks for the review. I've put updates related to your comments into 011f6ec. I'll squash the commits after positive code review of these. I still like what I have in |
Concept ACK. (The linter needs appeasing when you repush; consider presenting this pull in its intended final form to ease review.)
|
This fixes a blk file size calculation made during reindex that results in increased blk file malformity. The fix is to avoid double counting the size of the serialization header during reindex. This adds a unit test to reproduce the bug before the fix and to ensure that it does not recur. These changes include a log message change also so as to not be as alarming. This is a common and recoverable data corruption. These messages can now be filtered by the debug log reindex category.
ok, i squashed the commits and rebased onto the latest master. bcb0cac is the latest |
just to summarize, i think bcb0cac is the final form of this patch. it's a small, targeted, and uncomplicated bug fix. i think that if we can get some final reviews that this would be ready to merge. thanks! |
Is there anything that I can do to get this merged? It's ready to merge as far as I can tell. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK bcb0cac. This is a disturbing bug with an easy fix which seems well-worth merging.
I still like what I have in
SaveBlockToDisk
regarding what is passed intoFindBlockPos
for the second parameter,nAddSize
. When the position and size is known, we pass that in. When only size is known, we pass in block size + header size because we know that's what we'll have to add to the file.
To me, this sounds more complicated that just passing the same size.
And, by doing it this way, we don't have to re-adjust the known position on the way out of
SaveBlockToDisk
.
This is inaccurate because WriteBlockToDisk is adjusting the position here if dbp is null. Instead of SaveBlockToDisk sometimes adjusting the position, and sometimes not adjusting it, I think it is clearer if SaveBlockToDisk always adjusts the position as in ryanofsky@07c05a1. Just making the adjustment in SaveBlockToDisk explicit at all is improvement. I'm pretty sure I stared at this code for an hour before I noticed the hidden adjustment in WriteBlockToDisk that was critical to making it work.
We will never pass the same position into
FindBlockPos
for a given block because we either don't know the position, or we pass in where we found the block. The only question is whether we should pass in the positon thatFindBlockPos
previously returned. I don't see the advantage to doing that because it's kind of like trying to pretend that we don't know what we do know.
I don't know why you are mentioning the position-not-known case because that is the working case, not the broken case, and no change we are discussing affects it. Also no code is "trying to pretend" anything in that case that I can see.
The advantage to passing the same position to FindBlockPos that it previously returned is that code is easier to explain and less likely to break if a variable has one meaning within a function, instead of two different meanings. This bug is directly caused by two different position values being used for the same block in FindBlockPos. You are working around two different position values being for being used for the same block by introducing two different size values for the same block. I think if we just used just one position and one size for a block in FindBlockPos, code would be easier to understand and less fragile.
// when known, blockPos.nPos points at the offset of the block data in the blk file. that already accounts for | ||
// the serialization header present in the file (the 4 magic message start bytes + the 4 length bytes = 8 bytes = BLOCK_SERIALIZATION_HEADER_SIZE). | ||
// we add BLOCK_SERIALIZATION_HEADER_SIZE only for new blocks since they will have the serialization header added when written to disk. | ||
nBlockSize += static_cast<unsigned int>(BLOCK_SERIALIZATION_HEADER_SIZE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "reindex, log, test: fixes #21379" (bcb0cac)
This comment doesn't address the confusing thing here because it doesn't say why nBlockSize is set to different sizes for the same block depending on position_known. Would suggest a comment that says why the code is using a different size in each case. Could use:
// When position_known is false, the CBlock has not been written to the
// disk yet, so the nBlockSize value passed to FindBlockPos needs to be
// set so FindBlockPos allocates enough bytes for the CBlock data *plus*
// the size of the header WriteBlockToDisk will write before the CBlock
// data.
//
// When position_known is true, WriteBlockToDisk will have already been
// called, and it will have adjusted blockPos.nPos to point to the CBlock
// data after the header, instead of to the header, so the nBlockSize
// value passed to FindBlockPos should only include the CBlock size,
// *not* the header size.
I still prefer the approach of passing a consistent size and position to FindBlockPos like ryanofsky@07c05a1 instead of dealing with an inconsistent position value by introducing an inconsistent size value. But both approaches have the same ultimate effect, so the more important thing is to document the code clearly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, thanks for the feedback. While I'm not opposed to adding these comments if they help get this merged, I'd rather let the commit stay as it is so that the reviews can accumulate and we can get this stuff merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think this comment would be helpful, except for the point below.
// When position_known is true, WriteBlockToDisk will have already been // called, and it will have adjusted blockPos.nPos to point to the CBlock // data after the header
Isn't this incorrect? My understanding is that when position_known
is true, blockPos.nPos
was set by the LoadExternalBlockFile
code which has read past the header in the existing block file and moved the position accordingly - but not WriteBlockToDisk
, which wasn't called for this block at all during this invocation of bitcoind
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #24858 (comment)
// When position_known is true, WriteBlockToDisk will have already been // called, and it will have adjusted blockPos.nPos to point to the CBlock // data after the headerIsn't this incorrect?
I think it is correct 😎. The line that changes blockPos.nPos
to point after the header instead of before the header is here:
bitcoin/src/node/blockstorage.cpp
Line 676 in 9fb2a2b
pos.nPos = (unsigned int)fileOutPos; |
This is the critical code calculating the block position which is saved in the CBlockIndex
struct, and written to BlockIndexDb and used when ReadBlockFromDisk
is called to read back block data.
My understanding is that when
position_known
is true,blockPos.nPos
was set by theLoadExternalBlockFile
code which has read past the header in the existing block file and moved the position accordingly
I don't think LoadExternalBlockFile
is relevant to normal operation, only used when importing or reindexing is requested.
but not
WriteBlockToDisk
, which wasn't called for this block at all during this invocation ofbitcoind
.
It's true that WriteBlockToDisk
may or may not have been called during this invocation of bitcoind
, but it must have been called previously at some point, and the position it returns is written to BlockIndexDb and used to read blocks in future invocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, but the suggested comment is not in ReadBlockFromDisk
but in SaveBlockToDisk
, and it is about the case when "position_known is true", or dbp != nullptr
. I think this is only possible when this function is called via AcceptBlock()
from LoadExternalBlockFile
which has set the blockPos - the normal operation calls to SaveBlockToDisk
(via AcceptBlock()
and ProcessNewBlock()
) have dbp==nullptr
, so position_known
is false and the other half of the comment applies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, if you feel it is more useful for the comment to go into those things, feel free to make a suggestion along those lines. My suggestion was rejected anyway so maybe you will have better luck.
Also, I wrote this comment to describe a piece of code that I think is unnecessarily confusing. I would personally prefer not to have this piece of code, and not to have this comment explaining it, and instead use a solution like ryanofsky@07c05a1 which does not require understanding external places where SaveBlockToDisk
is called to understand, and only requires understanding internally how it always returns the same position.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong preference - it seems that both approaches only affect the blockstorage-internal logic, and since neither solution changes the public interface SaveBlockToDisk
, this sholdn't change what external places will need to know: In particular, that if they pass dbp!=nullptr
, dbp needs to point right after the header and before the block data (which should be documented in SaveBlockToDisk
imo).
However, I think that the internal function FindBlockPos()
would be clearer if it was split up depending on whether the block is known, with so much code being conditional on it: If dbp!=nullptr
, the function doesn't "find" anything new but only updates the block info statistics (and sometimes does a Flush which I'm not convinced is necessary). I had an initial go at this in https://github.com/mzumsande/bitcoin/commits/202207_refactor_findblockpos , which is outside the scope of this PR, but might make sense as a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @ryanofsky's comment (above) about consistency. I'd also note that adding 8 to nBlockSize
is confusing because it's no longer a block size!
@ryanofsky Thanks for the code review ACK! I appreciate your comments and I spent quite a while responding to them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK bcb0cac (reviewed code and did some testing, I agree that it fixes the bug).
nits below are non-blocking and could be left to follow-ups.
In general, I got the impression that the SaveBlockToDisk
/ FindBlockPos
code could benefit from doc improvements and possibly refactors to improve the quality of code, even though it's obviously a critical area.
using node::BLOCK_SERIALIZATION_HEADER_SIZE; | ||
|
||
// use BasicTestingSetup here for the data directory configuration, setup, and cleanup | ||
BOOST_FIXTURE_TEST_SUITE(blockmanager_tests, BasicTestingSetup) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: With the convention "<source_filename>_tests.cpp
", the name of the file/test suite should be blockstorage_tests
.
@@ -4378,7 +4378,18 @@ void CChainState::LoadExternalBlockFile(FILE* fileIn, FlatFilePos* dbp) | |||
} | |||
} | |||
} catch (const std::exception& e) { | |||
LogPrintf("%s: Deserialize or I/O error - %s\n", __func__, e.what()); | |||
// historical bugs added extra data to the block files that does not deserialize cleanly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would drop the last two sentences (seems a bit verbose and out of place to explain what reindex is not and what studious node operators may do), and I find long comments like this one easier to read when full sentences start with capitals.
// when known, blockPos.nPos points at the offset of the block data in the blk file. that already accounts for | ||
// the serialization header present in the file (the 4 magic message start bytes + the 4 length bytes = 8 bytes = BLOCK_SERIALIZATION_HEADER_SIZE). | ||
// we add BLOCK_SERIALIZATION_HEADER_SIZE only for new blocks since they will have the serialization header added when written to disk. | ||
nBlockSize += static_cast<unsigned int>(BLOCK_SERIALIZATION_HEADER_SIZE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think this comment would be helpful, except for the point below.
// When position_known is true, WriteBlockToDisk will have already been // called, and it will have adjusted blockPos.nPos to point to the CBlock // data after the header
Isn't this incorrect? My understanding is that when position_known
is true, blockPos.nPos
was set by the LoadExternalBlockFile
code which has read past the header in the existing block file and moved the position accordingly - but not WriteBlockToDisk
, which wasn't called for this block at all during this invocation of bitcoind
.
@@ -171,6 +174,7 @@ class BlockManager | |||
bool WriteUndoDataForBlock(const CBlockUndo& blockundo, BlockValidationState& state, CBlockIndex* pindex, const CChainParams& chainparams) | |||
EXCLUSIVE_LOCKS_REQUIRED(::cs_main); | |||
|
|||
/** Store block on disk. If dbp is not nullptr, then it provides the known position of the block within a block file on disk. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Just from reading this comment it's not clear to me why this function is even called when the block is already on disk - it apparently doesn't need to be stored in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to propose my own alternative approach, which I think is easier to understand: LarryRuane@e8b90d4. This version hides the 8 bytes adjustment within FindBlockPos
, which is where it belongs. In my view, SaveBlockToDisk
should not need to be aware of this detail.
FlatFilePos BlockManager::SaveBlockToDisk(const CBlock& block, int nHeight, CChain& active_chain, const CChainParams& chainparams, const FlatFilePos* dbp) | ||
{ | ||
unsigned int nBlockSize = ::GetSerializeSize(block, CLIENT_VERSION); | ||
FlatFilePos blockPos; | ||
if (dbp != nullptr) { | ||
const auto position_known {dbp != nullptr}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, if you retouch (bcb0cac)
const auto position_known {dbp != nullptr}; | |
const bool position_known{dbp != nullptr}; |
// when known, blockPos.nPos points at the offset of the block data in the blk file. that already accounts for | ||
// the serialization header present in the file (the 4 magic message start bytes + the 4 length bytes = 8 bytes = BLOCK_SERIALIZATION_HEADER_SIZE). | ||
// we add BLOCK_SERIALIZATION_HEADER_SIZE only for new blocks since they will have the serialization header added when written to disk. | ||
nBlockSize += static_cast<unsigned int>(BLOCK_SERIALIZATION_HEADER_SIZE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @ryanofsky's comment (above) about consistency. I'd also note that adding 8 to nBlockSize
is confusing because it's no longer a block size!
// the block is found at offset 8 because there is an 8 byte serialization header | ||
// consisting of 4 magic bytes + 4 length bytes before each block in a well-formed blk file. | ||
FlatFilePos pos{0, BLOCK_SERIALIZATION_HEADER_SIZE}; | ||
BOOST_CHECK_EQUAL(blockman.SaveBlockToDisk(params->GenesisBlock(), 0, chain, *params, &pos).nPos, BLOCK_SERIALIZATION_HEADER_SIZE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is slightly unrealistic because it's simulating a normal (new) block add followed by a reindex. A reindex, if it occurs, always happens immediately after the node starts up, before any blocks have been added from peers. As a result of this, this (the second) test call to SaveBlockToDisk
causes m_blockfile_info.nBlocks
to increment to 2 here, which is strange, since there's only one block.
This doesn't cause any problems now, but some future change could cause this test to break in mysterious ways.
A way to fix this would be to allocate a new BlockManager
(and probably CChain
) for this call, to simulate restarting the node. (The third call to SaveBlockToDisk
, below, can use the same ones as this one does, because following a reindex, the real node does go ahead and start adding new blocks.)
In case it's not clear, I updated that commit (containing other suggestions) to show what I mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested ACK. Reproduced the issue on master and tested succesfully the fix proposed.
Tested on
Linux 5.15.0-41-generic x86_64 x86_64 x86_64 GNU/Linux
Compiled with
gcc 11.2.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested ACK.
gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)
Linux 5.11.0-43-generic #47~20.04.2-Ubuntu SMP x86_64 x86_64 x86_64 GNU/Linux
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tACK bcb0cac
Verified the error on master branch as described in #24858 (comment) and #24858 (review). This PR fixes it.
I think this would be a good bug to fix, so it would be nice to see some progress here. @mruddy do you think you'll be able to address some of the outstanding feedback? I think it might be good to at least incorporate some of the clarifying commentary that @ryanofsky proposed. This PR is a step in the right direction; the fix could possibly clearer, along the lines of what @ryanofsky or @LarryRuane have suggested. But in any case, one of these approaches should be committed to, reviewed, and merged (with credit to @mruddy in any case for proposing the conceptual solution). Edit: to be clear, merging this PR as-is would be a big improvement over the current state of things. I think we're bikeshedding a little at this point, and a follow-up could be done if necessary. |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsNo conflicts as of last run. |
If I have to change something, then I'd probably go back in the direction that @LarryRuane suggested with e8b90d4. I'd have to review and test those changes closer, but at a glance, they seem plausible. They are close to my original fix in that they keep the change within |
I think this is ready for merge. |
…ex results in recoverable blk file corruption bcb0cac reindex, log, test: fixes bitcoin#21379 (mruddy) Pull request description: Fixes bitcoin#21379. The blocks/blk?????.dat files are mutated and become increasingly malformed, or corrupt, as a result of running the re-indexing process. The mutations occur after the re-indexing process has finished, as new blocks are appended, but are a result of a re-indexing process miscalculation that lingers in the block manager's `m_blockfile_info` `nSize` data until node restart. These additions to the blk files are non-fatal, but also not desirable. That is, this is a form of data corruption that the reading code is lenient enough to process (it skips the extra bytes), but it adds some scary looking log messages as it encounters them. The summary of the problem is that the re-index process double counts the size of the serialization header (magic message start bytes [4 bytes] + length [4 bytes] = 8 bytes) while calculating the blk data file size (both values already account for the serialization header's size, hence why it is over accounted). This bug manifests itself in a few different ways, after re-indexing, when a new block from a peer is processed: 1. If the new block will not fit into the last blk file processed while re-indexing, while remaining under the 128MiB limit, then the blk file is flushed to disk and truncated to a size that is 8 greater than it should be. The truncation adds zero bytes (see `FlatFileSeq::Flush` and `TruncateFile`). 1. If the last blk file processed while re-indexing has logical space for the new block under the 128 MiB limit: 1. If the blk file was not already large enough to hold the new block, then the zeros are, in effect, added by `fseek` when the file is opened for writing. Eight zero bytes are added to the end of the last blk file just before the new block is written. This happens because the write offset is 8 too great due to the miscalculation. The result is 8 zero bytes between the end of the last block and the beginning of the next block's magic + length + block. 1. If the blk file was already large enough to hold the new block, then the current existing file contents remain in the 8 byte gap between the end of the last block and the beginning of the next block's magic + length + block. Commonly, when this occcurs, it is due to the blk file containing blocks that are not connected to the block tree during reindex and are thus left behind by the reindex process and later overwritten when new blocks are added. The orphaned blocks can be valid blocks, but due to the nature of concurrent block download, the parent may not have been retrieved and written by the time the node was previously shutdown. ACKs for top commit: LarryRuane: tested code-review ACK bcb0cac ryanofsky: Code review ACK bcb0cac. This is a disturbing bug with an easy fix which seems well-worth merging. mzumsande: ACK bcb0cac (reviewed code and did some testing, I agree that it fixes the bug). w0xlt: tACK bitcoin@bcb0cac Tree-SHA512: acc97927ea712916506772550451136b0f1e5404e92df24cc05e405bb09eb6fe7c3011af3dd34a7723c3db17fda657ae85fa314387e43833791e9169c0febe51
Fixes #21379.
The blocks/blk?????.dat files are mutated and become increasingly malformed, or corrupt, as a result of running the re-indexing process.
The mutations occur after the re-indexing process has finished, as new blocks are appended, but are a result of a re-indexing process miscalculation that lingers in the block manager's
m_blockfile_info
nSize
data until node restart.These additions to the blk files are non-fatal, but also not desirable.
That is, this is a form of data corruption that the reading code is lenient enough to process (it skips the extra bytes), but it adds some scary looking log messages as it encounters them.
The summary of the problem is that the re-index process double counts the size of the serialization header (magic message start bytes [4 bytes] + length [4 bytes] = 8 bytes) while calculating the blk data file size (both values already account for the serialization header's size, hence why it is over accounted).
This bug manifests itself in a few different ways, after re-indexing, when a new block from a peer is processed:
FlatFileSeq::Flush
andTruncateFile
).fseek
when the file is opened for writing. Eight zero bytes are added to the end of the last blk file just before the new block is written. This happens because the write offset is 8 too great due to the miscalculation. The result is 8 zero bytes between the end of the last block and the beginning of the next block's magic + length + block.