-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Refactor block file logic #15118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor block file logic #15118
Conversation
src/init.cpp
Outdated
@@ -1631,8 +1631,8 @@ bool AppInitMain(InitInterfaces& interfaces) | |||
|
|||
// ********************************************************* Step 11: import blocks | |||
|
|||
if (!CheckDiskSpace() && !CheckDiskSpace(0, true)) | |||
return false; | |||
if (!CheckDiskSpace(GetDataDir()) && !CheckDiskSpace(GetBlocksDir())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this not be ||
? /cc @jonasschnelli 386a6b6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, seems like it to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/flatfile.h
Outdated
* | ||
* @param[in] pos The first unwritten position in the file to be flushed. | ||
* @param[in] finalize True if no more data will be written to this file. | ||
* @return True on success, false on failure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Start of sentence True
is fine IMO.
Concept ACK hooray objects! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK.
Code review up to ad59122.
src/validation.cpp
Outdated
@@ -3024,7 +3028,7 @@ static bool FindBlockPos(CDiskBlockPos &pos, unsigned int nAddSize, unsigned int | |||
} | |||
} | |||
else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit 2af8b71
Care add {}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the braces after the code is moved to FlatFileSeq::Allocate
.
src/flatfile.cpp
Outdated
#include <tinyformat.h> | ||
|
||
FlatFileSeq::FlatFileSeq(fs::path dir, const char* prefix, size_t chunk_size) | ||
: m_dir(std::move(dir)), m_prefix(prefix), m_chunk_size(chunk_size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit 88cf780
nit, initialize one per line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this in the style guide?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also don't see to declare class members in multiple lines, but fair enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on one per line as it makes diffs prettier when adding/removing stuff, but I'm fine as it is too, since it's only 3 vars.
src/flatfile.h
Outdated
#ifndef BITCOIN_FLATFILE_H | ||
#define BITCOIN_FLATFILE_H | ||
|
||
#include <chain.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit 88cf780
Prefer forward declaration of CDiskBlockPos
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Though CDiskBlockPos is moved into this header later on, so I don't think it's necessary to change in the earlier commit.
src/flatfile.cpp
Outdated
@@ -13,3 +14,26 @@ fs::path FlatFileSeq::FileName(const CDiskBlockPos& pos) const | |||
{ | |||
return m_dir / strprintf("%s%05u.dat", m_prefix, pos.nFile); | |||
} | |||
|
|||
FILE* FlatFileSeq::Open(const CDiskBlockPos& pos, bool fReadOnly) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit ad59122
Just asking for a follow up, could this be changed to std::fstream
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe... but a lot of file utilities like FileCommit
, AllocateFileRange
, etc would have to be modified as well. I think the better approach would be to return a CAutoFile
and define the move constructor/assignment operator properly. And probably move it to a different module, like in util.
@@ -310,6 +311,8 @@ static void FindFilesToPruneManual(std::set<int>& setFilesToPrune, int nManualPr | |||
static void FindFilesToPrune(std::set<int>& setFilesToPrune, uint64_t nPruneAfterHeight); | |||
bool CheckInputs(const CTransaction& tx, CValidationState &state, const CCoinsViewCache &inputs, bool fScriptChecks, unsigned int flags, bool cacheSigStore, bool cacheFullScriptStore, PrecomputedTransactionData& txdata, std::vector<CScriptCheck> *pvChecks = nullptr); | |||
static FILE* OpenUndoFile(const CDiskBlockPos &pos, bool fReadOnly = false); | |||
static FlatFileSeq BlockFileSeq(); | |||
static FlatFileSeq UndoFileSeq(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to just have these on-hand as global objects instead of having to reconstruct an object for each use through these functions? Obviously there isn't much of a performance concern given these calls happen before disk IO, but I'm just curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are constructed with GetDataDir/GetBlocksDir
which are not static, so the initialization logic would get a bit tricky. Could be OK with lazy evaluation or something, but didn't seem worth it. Open to suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, okay. Like I said, no big deal AFAICT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All commits so far reviewed (up to 697d4c3fc). Great change aside from the bug I've noted above. This does a nice job of cleaning up and deduplicating flatfile data access code, and adds some test coverage. When taking into account the use for BIP157 filter storage (forthcoming), this is a very valuable patch.
"payments to be sent directly from one party to another without going " | ||
"through a financial institution."); | ||
std::string line2("Digital signatures provide part of the solution, but the main benefits are " | ||
"lost if a trusted third party is still required to prevent double-spending."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cute.
a5d21e5
to
fbbf962
Compare
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tACK 9b2d9ee, though the last two commits are in need of fixup.
I read over the code once more since the last rebase and feedback commits. To test, I ran a reindex using this branch on a datadir with a preexisting chain of height 522,000 that had been built without these changes. The reindex completed successfully. I started a recent version of master (12b3010) and saw that block verification succeeded and startup ran as normal.
I'll reACK after squash.
src/flatfile.h
Outdated
FlatFilePos(int nFileIn, unsigned int nPosIn) { | ||
nFile = nFileIn; | ||
nPos = nPosIn; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is kind of a move, but worth using an initializer list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK with some nits
@@ -0,0 +1,91 @@ | |||
// Copyright (c) 2009-2010 Satoshi Nakamoto | |||
// Copyright (c) 2009-2019 The Bitcoin Core developers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copyright looked off to me initially, but I guess if you move code, the copyright stays the same. (I was expecting "2019" only).
src/flatfile.cpp
Outdated
#include <tinyformat.h> | ||
|
||
FlatFileSeq::FlatFileSeq(fs::path dir, const char* prefix, size_t chunk_size) | ||
: m_dir(std::move(dir)), m_prefix(prefix), m_chunk_size(chunk_size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on one per line as it makes diffs prettier when adding/removing stuff, but I'm fine as it is too, since it's only 3 vars.
src/flatfile.h
Outdated
* | ||
* @param[in] pos The first unwritten position in the file to be flushed. | ||
* @param[in] finalize True if no more data will be written to this file. | ||
* @return True on success, false on failure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Start of sentence True
is fine IMO.
FILE* FlatFileSeq::Open(const FlatFilePos& pos, bool read_only) | ||
{ | ||
if (pos.IsNull()) | ||
return nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a move I believe, so feel free to ignore: add {}
or remove newline. (Also applies to code 4 lines down.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's a move. Would rather not change.
src/flatfile.cpp
Outdated
return nullptr; | ||
} | ||
if (pos.nPos) { | ||
if (fseek(file, pos.nPos, SEEK_SET)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can do single if
here.
if (pos.nPos && fseek(file, pos.nPos, SEEK_SET)) {
src/util/system.h
Outdated
@@ -71,6 +71,7 @@ void AllocateFileRange(FILE *file, unsigned int offset, unsigned int length); | |||
bool RenameOver(fs::path src, fs::path dest); | |||
bool LockDirectory(const fs::path& directory, const std::string lockfile_name, bool probe_only=false); | |||
bool DirIsWritable(const fs::path& directory); | |||
bool CheckDiskSpace(const fs::path& dir, uint64_t nAdditionalBytes = 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit/style: naming convention, additional_bytes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just moving code out of validation.{h,cpp}. Would rather not also rename vars.
|
||
BOOST_CHECK_EQUAL(seq.Allocate(FlatFilePos(0, 99), 2, out_of_space), 101); | ||
BOOST_CHECK_EQUAL(fs::file_size(seq.FileName(FlatFilePos(0, 99))), 200); | ||
BOOST_CHECK(!out_of_space); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would adding
uint64_t too_big = 1024 * 1024 + fs::space(data_dir).available;
BOOST_CHECK_EQUAL(seq.Allocate(FlatFilePos(0, 0), too_big, out_of_space), 0);
BOOST_CHECK_EQUAL(fs::file_size(seq.FileName(FlatFilePos(0, 0))), 200);
BOOST_CHECK(out_of_space);
be overkill?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't get this test passing on Travis... tried setting too_big
to a petabyte as well, and there was still some issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason is that size_t
is 32 bit (edit: on certain systems) and there is more than 2^32 bytes available on the system, resulting in an overflow. The Allocate method should probably take a uint64_t
to avoid this.
src/util/system.cpp
Outdated
{ | ||
constexpr uint64_t nMinDiskSpace = 52428800; | ||
uint64_t nFreeBytesAvailable = fs::space(dir).available; | ||
return nFreeBytesAvailable >= nMinDiskSpace + nAdditionalBytes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit/style: var_names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just moving code out of validation.{h,cpp}. Would rather not also rename vars.
return AbortNode("Disk space is low!", _("Error: Disk space is low!")); | ||
} | ||
if (bytes_allocated != 0 && fPruneMode) { | ||
fCheckForPruning = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can also do
fCheckForPruning |= bytes_allocated != 0 && fPruneMode;
and remove the if
stuff.
return state.Error("out of disk space"); | ||
} | ||
if (bytes_allocated != 0 && fPruneMode) { | ||
fCheckForPruning = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can get rid of if
case here as described above.
src/flatfile.cpp
Outdated
|
||
FlatFileSeq::FlatFileSeq(fs::path dir, const char* prefix, size_t chunk_size) | ||
: m_dir(std::move(dir)), m_prefix(prefix), m_chunk_size(chunk_size) | ||
{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this raise an error if chunk_size = 0
?
return file; | ||
} | ||
|
||
size_t FlatFileSeq::Allocate(const FlatFilePos& pos, size_t add_size, bool& out_of_space) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should add_size = 0
be allowed? Seems like this would always cause n_new_chunks = n_old_chunks
thereby causing the function to return 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning 0 seems like the right thing to do there. Probably better than throwing an exception or something.
src/util/system.cpp
Outdated
@@ -135,6 +135,13 @@ bool DirIsWritable(const fs::path& directory) | |||
return true; | |||
} | |||
|
|||
bool CheckDiskSpace(const fs::path& dir, uint64_t nAdditionalBytes) | |||
{ | |||
constexpr uint64_t nMinDiskSpace = 52428800; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to add an additional comment above this indicating its approximate size. Previously this comment was present:
// Check for nMinDiskSpace bytes (currently 50MB)
size_t new_size = n_new_chunks * m_chunk_size; | ||
size_t inc_size = new_size - old_size; | ||
|
||
if (CheckDiskSpace(m_dir, inc_size)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any concerns about the type mismatch here between inc_size
(size_t
) and the second argument to CheckDiskSpace
which is a uint64_t
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you get a compiler warning? If not, the implicit cast ought to be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No compiler warning, maybe just a note for the future that the sizes of the integer types should probably be reconciled at some point.
010d0cc
to
9131aef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 2e476e9
@@ -3016,21 +3016,13 @@ static bool FindBlockPos(CDiskBlockPos &pos, unsigned int nAddSize, unsigned int | |||
vinfoBlockFile[nFile].nSize += nAddSize; | |||
|
|||
if (!fKnown) { | |||
unsigned int nOldChunks = (pos.nPos + BLOCKFILE_CHUNK_SIZE - 1) / BLOCKFILE_CHUNK_SIZE; | |||
unsigned int nNewChunks = (vinfoBlockFile[nFile].nSize + BLOCKFILE_CHUNK_SIZE - 1) / BLOCKFILE_CHUNK_SIZE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "validation: Refactor block file pre-allocation into FlatFileSeq." (72d6932)
Note (just for reference because I found this function confusing): In the new code this line is replaced with:
unsigned int n_new_chunks = (pos.nPos + add_size + m_chunk_size - 1) / m_chunk_size;
which is equivalent because vinfoBlockFile[nFile].nSize
== pos.nPos + add_size
in the !fKnown
case due to:
pos.nPos = vinfoBlockFile[nFile].nSize;
...
vinfoBlockFile[nFile].nSize += nAddSize; 
above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably the right move would be to add some comments in the new code, to reduce confusion to future readers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to add comments anywhere you think it would make the new code more clear, but I don't want to add comments because the old code was unnecessarily confusing and this is a slight change for anyone who happened to be familiar with it. I personally find the new Allocate function clear enough as is (including the n_new_chunks
calculation), but of course I wrote it...
A PR comment like the one above seems like a fine way to address this sort of thing IMO.
return AbortNode("Disk space is low!", _("Error: Disk space is low!")); | ||
} | ||
if (bytes_allocated != 0 && fPruneMode) { | ||
fCheckForPruning = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "validation: Refactor block file pre-allocation into FlatFileSeq." (72d6932)
Probably doesn't make any difference, but previously the fCheckForPruning = true
setting would happen before return AbortNode
, but now it will be skipped if there is no disk space.
Same comment applies in FindUndoPos
below.
fclose(file); | ||
return error("%s: failed to truncate file %d", __func__, pos.nFile); | ||
} | ||
if (!FileCommit(file)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "validation: Refactor file flush logic into FlatFileSeq." (5bc3ece)
Probably ok, but previous code would still call FileCommit
when TruncateFile
failed, and now it is skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 2e476e9, all comments are nits.
-BEGIN VERIFY SCRIPT- sed -i 's/CDiskBlockPos/FlatFilePos/g' $(git ls-files 'src/*.h' 'src/*.cpp') -END VERIFY SCRIPT-
utACK 04cca33 |
04cca33 Style cleanup. (Jim Posen) 4c01e4e flatfile: Unit tests for FlatFileSeq methods. (Jim Posen) 65a489e scripted-diff: Rename CBlockDiskPos to FlatFilePos. (Jim Posen) d6d8a78 Move CDiskBlockPos from chain to flatfile. (Jim Posen) e038093 validation: Refactor file flush logic into FlatFileSeq. (Jim Posen) 992404b validation: Refactor block file pre-allocation into FlatFileSeq. (Jim Posen) e2d2abb validation: Refactor OpenDiskFile into method on FlatFileSeq. (Jim Posen) 9183d6e validation: Extract basic block file logic into FlatFileSeq class. (Jim Posen) 62e7add util: Move CheckDiskSpace to util. (Jim Posen) Pull request description: This cleans up and refactors block file helpers so that they may be used by the block filter indexer. Per [design discussion](#14121 (comment)) about storing BIP 157 block filters, it has been suggested that they are stored in the same way as block and undo data. This refactor is sufficient to simplify file operations for this use case, though in the future perhaps more pruning-related logic ought to be moved into the new classes. The basic abstraction is a `FlatFileSeq` which manages access to a sequence of numbered files into which raw data is written. Tree-SHA512: b2108756777f2dad8964a1a2ef2764486e708a4a4a8cfac47b5de8bcb0625388964438eb096b10cfd9ea39212c299b5cb32fa943e768db2333cf49ea7def157e
…tem.h 3e48fc1 Renamed and moved util.h/cpp files to util/system.h & util/system.cpp (furszy) 7954cef Style cleanup. (Jim Posen) 48801a9 Use common SetDataDir method to create temp directory in tests. (winder) 06464d3 flatfile: Unit tests for FlatFileSeq methods. (Jim Posen) bbb9a90 scripted-diff: Rename CBlockDiskPos to FlatFilePos. (Jim Posen) 472857a Move CDiskBlockPos from chain to flatfile. (Jim Posen) 7fa47bb validation: Refactor file flush logic into FlatFileSeq. (Jim Posen) bf09076 validation: Refactor block file pre-allocation into FlatFileSeq. (Jim Posen) c813080 validation: Refactor OpenDiskFile into method on FlatFileSeq. (Jim Posen) 2619fe8 validation: Extract basic block file logic into FlatFileSeq class. (Jim Posen) e65acf0 util: Move CheckDiskSpace to util. (Jim Posen) Pull request description: This is part of the block logic refactoring and encapsulation work coming from upstream (work started in #2326 and #2328 due the possible blocks db corruption issue). Needed for two future works: (1) faster block validation using a new cache structure and (2) the future possible custom BIP157 (new sync protocol that supersedes BIP37 for light clients -- pending research needed after finishing v6.0 priorities --) Essentially, it's encapsulating the sequenced files open/read/write functionalities inside a new `flatfile` module, and extending the unit test coverage to validate its correct behavior. Adapted the following PRs: * bitcoin#15118. * bitcoin#13145. * Plus added `util.h/cpp` files mv to `util/system.h` & `util/system.cpp`. ACKs for top commit: random-zebra: utACK 3e48fc1 Tree-SHA512: c21ffb6ede054a279f9792c1cbe645b948078bd6bc607d32438f0601fd4df3650c0a34b349e46b4ea69b48f9e6c7bb18d258e139c6e1a47452ac9ea4f3bbee25
merge bitcoin#15118, bitcoin#14172, bitcoin#15623, bitcoin#14121, partial bitcoin#13743, partial bitcoin#15280: block filters
This cleans up and refactors block file helpers so that they may be used by the block filter indexer. Per design discussion about storing BIP 157 block filters, it has been suggested that they are stored in the same way as block and undo data. This refactor is sufficient to simplify file operations for this use case, though in the future perhaps more pruning-related logic ought to be moved into the new classes.
The basic abstraction is a
FlatFileSeq
which manages access to a sequence of numbered files into which raw data is written.