-
Notifications
You must be signed in to change notification settings - Fork 37.7k
allow -loadblock blocks to be unsorted #20331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
9fbcd0b
to
37408b8
Compare
A useful follow-on to this PR, since loading unordered blocks would become possible, might be the addition of a Another way to accomplish approximately the same thing would be to replace the entire |
Appears that the CI picked up a couple of argument issues with the
|
I don't know if anyone has been confused by |
37408b8
to
7afedcf
Compare
Thanks, @adamjonas, force-pushed to fix the fuzz test. |
src/validation.cpp
Outdated
FILE* file = fsbridge::fopen(path, "rb"); | ||
if (!file) { | ||
return error("ReadBlockFromDisk: open failed for %s", path.string()); | ||
} | ||
if (offset > 0 && fseek(file, offset, SEEK_SET)) { | ||
fclose(file); | ||
return error("ReadBlockFromDisk: fseek failed for %s offset %u", path.string(), offset); | ||
} | ||
CAutoFile filein(file, SER_DISK, CLIENT_VERSION); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this change made? Could you explain how the behavior of this is changing, aside from the function signature taking const fs::path& path, unsigned int offset
instead of const FlatFilePos& pos
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I'm guessing this change was made bc of what you do at line 4810: ReadBlockFromDisk(*pblockrecursive, blk_paths[child_pos.nFile], child_pos.nPos, chainparams.GetConsensus())
But I'm failing to understand why you can't just do: ReadBlockFromDisk(*pblockrecursive, child_pos, chainparams.GetConsensus())
I'm probably missing something here, but I just want to check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I reached several dead-ends before arriving at this code factoring. We can't use ReadBlockFromDisk(..., child_pos, ...)
here because the pos
argument is used in lower-level code (OpenFlockFile()
) to construct and open a pathname like .../.bitcoin/blocks/blknnnnn.dat
, which is just what we want for the -reindex
case, but not for the -loadblock
case (both share this code), because in the latter, we have a list of arbitrary pathnames to read from. Instead of calling the pos
variant in the -reindex
case and the path, offset
variant in the -loadblock
case, it's simpler to just call it in the same way for both cases.
src/init.cpp
Outdated
} else { | ||
LogPrintf("Warning: Could not open blocks file %s\n", path.string()); | ||
} | ||
if (vImportFiles.size() > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need the additional code branch here? Would it be preferable to just invoke LoadExternalBlockFiles("Importing", chainparams, vImportFiles, true);
no matter what, and the for loop just won't iterate through anything if vImportFiles
is empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, it's not really needed, but I think it's useful to show the reader here that loading blocks this isn't always being done; otherwise, to know nothing's being done, the reader has to go look at LoadExternalBlockFiles()
.
I also like that the conditional makes this part of the code similar to the if (fReindex)
code just above it; they are doing very similar work.
But I could go either way (no strong preference).
src/validation.cpp
Outdated
FILE* file = fsbridge::fopen(blk_paths[n_file], "rb"); | ||
if (file == nullptr) { | ||
LogPrintf("%s: Warning: Could not open blocks file %s\n", __func__, blk_paths[n_file]); | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this kinda duplication of what ReadBlockFromDisk
does? I'm wondering why FlatFilePos pos(n_file, 0);
isn't created, passed to ReadBlockFromDisk
(like before, when signature took FlatFilePos
), and then you can pass it to LoadExternalBlockFile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This duplication exists even without this PR. Before this PR, this fopen()
happens here.
To summarize why this happens in two places: There are two ways blocks are "loaded" into memory when reindexing (and, now, with this PR, when using -loadblock
). First, a blocks file is opened (here) and read from disk (using large reads) sequentially. At this point, we don't know the individual blocks' starting offsets, so we can't create a FlatFilePos
to pass to ReadBlockFromDisk()
as you suggest. Besides, doing these large reads is more efficient, typically many blocks per read -- plus a partial block at the end (since, again, the block boundaries are unknown). Once in memory, the block boundaries are discovered, and each block is deserialized. But the blocks in these blocks files are mostly out of order, so most of the blocks can't be accepted (have AcceptBlock()
called on it) because its parent has not yet been seen.
When such a parent is later found, the child block is then re-read from disk using ReadBlockFromDisk()
, deserialized a second time, and then passed to AcceptBlock()
(allowed because its parent is now present). In this case, we're reading exactly a single block, because we know where it starts (in the file) and its length (which is what a FlatFilePos
encodes).
src/validation.h
Outdated
const std::vector<fs::path>& blk_paths, | ||
size_t n_file, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the name LoadExternalBlockFile
I'm confused why blk_paths
plural is passed as a param, instead of just a single file/path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That function, LoadExternalBlockFile()
, does (attempt to) load the blocks from exactly one block file, but, doing so may cause, as a side effect, the loading of blocks seen earlier (in earlier files) but that could not be loaded then because their parent was missing (and is now present, see my earlier comment). This requires reaching back and reading specific blocks from earlier files, and that requires the full list of block file paths (the blk_paths
argument). This basic flow is the same as before this PR. I agree this is somewhat confusing, maybe a comment would help. But maybe not, because perhaps there's no shortcut to just studying the code, I'm not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mjdietzx, thank you for your review!
src/init.cpp
Outdated
} else { | ||
LogPrintf("Warning: Could not open blocks file %s\n", path.string()); | ||
} | ||
if (vImportFiles.size() > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, it's not really needed, but I think it's useful to show the reader here that loading blocks this isn't always being done; otherwise, to know nothing's being done, the reader has to go look at LoadExternalBlockFiles()
.
I also like that the conditional makes this part of the code similar to the if (fReindex)
code just above it; they are doing very similar work.
But I could go either way (no strong preference).
src/validation.cpp
Outdated
FILE* file = fsbridge::fopen(path, "rb"); | ||
if (!file) { | ||
return error("ReadBlockFromDisk: open failed for %s", path.string()); | ||
} | ||
if (offset > 0 && fseek(file, offset, SEEK_SET)) { | ||
fclose(file); | ||
return error("ReadBlockFromDisk: fseek failed for %s offset %u", path.string(), offset); | ||
} | ||
CAutoFile filein(file, SER_DISK, CLIENT_VERSION); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I reached several dead-ends before arriving at this code factoring. We can't use ReadBlockFromDisk(..., child_pos, ...)
here because the pos
argument is used in lower-level code (OpenFlockFile()
) to construct and open a pathname like .../.bitcoin/blocks/blknnnnn.dat
, which is just what we want for the -reindex
case, but not for the -loadblock
case (both share this code), because in the latter, we have a list of arbitrary pathnames to read from. Instead of calling the pos
variant in the -reindex
case and the path, offset
variant in the -loadblock
case, it's simpler to just call it in the same way for both cases.
src/validation.cpp
Outdated
FILE* file = fsbridge::fopen(blk_paths[n_file], "rb"); | ||
if (file == nullptr) { | ||
LogPrintf("%s: Warning: Could not open blocks file %s\n", __func__, blk_paths[n_file]); | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This duplication exists even without this PR. Before this PR, this fopen()
happens here.
To summarize why this happens in two places: There are two ways blocks are "loaded" into memory when reindexing (and, now, with this PR, when using -loadblock
). First, a blocks file is opened (here) and read from disk (using large reads) sequentially. At this point, we don't know the individual blocks' starting offsets, so we can't create a FlatFilePos
to pass to ReadBlockFromDisk()
as you suggest. Besides, doing these large reads is more efficient, typically many blocks per read -- plus a partial block at the end (since, again, the block boundaries are unknown). Once in memory, the block boundaries are discovered, and each block is deserialized. But the blocks in these blocks files are mostly out of order, so most of the blocks can't be accepted (have AcceptBlock()
called on it) because its parent has not yet been seen.
When such a parent is later found, the child block is then re-read from disk using ReadBlockFromDisk()
, deserialized a second time, and then passed to AcceptBlock()
(allowed because its parent is now present). In this case, we're reading exactly a single block, because we know where it starts (in the file) and its length (which is what a FlatFilePos
encodes).
src/validation.h
Outdated
const std::vector<fs::path>& blk_paths, | ||
size_t n_file, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That function, LoadExternalBlockFile()
, does (attempt to) load the blocks from exactly one block file, but, doing so may cause, as a side effect, the loading of blocks seen earlier (in earlier files) but that could not be loaded then because their parent was missing (and is now present, see my earlier comment). This requires reaching back and reading specific blocks from earlier files, and that requires the full list of block file paths (the blk_paths
argument). This basic flow is the same as before this PR. I agree this is somewhat confusing, maybe a comment would help. But maybe not, because perhaps there's no shortcut to just studying the code, I'm not sure.
src/validation.cpp
Outdated
@@ -49,6 +49,7 @@ | |||
#include <validationinterface.h> | |||
#include <warnings.h> | |||
|
|||
#include <cassert> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't needed, I'll remove in a later force-push.
@@ -11,6 +11,7 @@ | |||
#include <validation.h> | |||
|
|||
#include <cstdint> | |||
#include <map> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't needed, I'll remove in a later force-push.
Here's a manual test procedure (I'll describe testnet because it's faster, but mainnet works too)
This generates a command line like:
|
@LarryRuane I guess it's possible to make a functional test then? |
Verified that the test fails without the new behaviour, cool! A couple of things that could be done:
|
7afedcf
to
b99ecea
Compare
b99ecea
to
1a3f38e
Compare
Rebased, no functional changes. |
1a3f38e
to
2cfd1b4
Compare
This required rebase also splits the second commit into two commits to separate refactoring from the functional changes. Ready for review. |
78ae057
to
c0c9bec
Compare
c0c9bec
to
dbd7f72
Compare
Rebased. The second commit is now much simpler because of #21727. |
utACK dbd7f72 |
dbd7f72
to
a62fc29
Compare
Rebased to resolve merge conflicts. |
This lower-level version takes a path and a seek offset as arguments. Currently unused but is needed by the next commit.
Add LoadExternalBlockFiles() (plural) that accepts a list of paths and loads all the blocks found in these paths. The blocks can be completely out of order; block C (child) can be seen first and its parent P not seen until a later file, then block C will be processed. This requires re-reading block C from the earlier file; block C isn't kept in memory during this interval. The list of paths is either the blocks/blk*.dat files (during reindexing) or given with one or more -loadblk arguments. LoadExternalBlockFiles() calls LoadExternalBlockFile() (singular) to read a single file, but out-of-order blocks requires that this function be able to read individual blocks from earlier files.
a62fc29
to
0c97fa3
Compare
🐙 This pull request conflicts with the target branch and needs rebase. Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft". |
There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
The |
This PR was suggested in this comment in #19594, and builds on that PR (the first commit here).
This PR allows the blocks in files specified by the
-loadblock=file
command line (configuration) options (there can be multiple) to be unsorted. This is already allowed by-reindex
, but currently when-loadblock
encounters a block with an unknown parent, it is ignored. This means the blocks must be sorted by height to be loaded successfully. This PR fixes this restriction, allowing the blocks across all the specified blocks files to be in any order. This makes the-loadblock
option more useful. Also, the affected code is cleaner and more maintainable.(I'll simplify this description move details to a separate comment before this PR gets merged.)