Ultraprune: use a pruned-txout-set database for block validation #1677
Conversation
(EDITED) List of implementation changes:
Included in the pullreq, but technically separate:
@sipa One question: our current AppendBlockFile() function takes MAX_SIZE into account and generates a new block file if the space left in the block file (the maximum allowed file size) is < MAX_SIZE. So 128 MiB files would hold a maximum of 96 MiB of usable data, right?
@Diapolo: not sure what you mean; I don't use AppendBlockFile anymore.
@sipa I saw that and wanted to understand the change here. Which condition is used to determine whether a new block file needs to be created, where is that check in your new code, and what's the space limit?
The check is in FindBlockPos in main.cpp. And a new file is created if (old_used_size + new_block_size >= MAX_BLOCKFILE_SIZE). |
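The decision described here can be sketched as follows. This is an illustrative condition only; the function name `BlockFitsInFile` is hypothetical, and only the 128 MiB limit and the `used + new >= MAX_BLOCKFILE_SIZE` check come from the discussion above.

```cpp
#include <cassert>
#include <cstdint>

// 128 MiB per block file, as described in the pull request.
static const uint64_t MAX_BLOCKFILE_SIZE = 128ULL * 1024 * 1024;

// Hypothetical helper mirroring the FindBlockPos check: a new file is
// created when (old_used_size + new_block_size >= MAX_BLOCKFILE_SIZE),
// i.e. the block fits only while the sum stays strictly below the limit.
bool BlockFitsInFile(uint64_t nUsedSize, uint64_t nAddSize) {
    return nUsedSize + nAddSize < MAX_BLOCKFILE_SIZE;
}
```

Note that unlike the old AppendBlockFile() behaviour Diapolo describes, no MAX_SIZE-sized slack is reserved; the file is usable right up to the 128 MiB boundary.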
CDiskBlockPos blockPos;
{
    CChainDB chaindb;
    if (!FindBlockPos(chaindb, blockPos, nBlockSize+8, nHeight, nTime))
Why nBlockSize+8, is that a padding?
4 bytes magic, 4 bytes block length; that's just the file format of blk*.dat.
I'm lacking some background information here, sorry :). Is the format defined / described somewhere?
No idea, but I wanted to retain compatibility between pre- and post-ultraprune block files, so I used the same format. That is: the files are a concatenation of {4 bytes magic, 4 bytes LE integer with the actual block size, block data itself}.
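The record layout just described can be sketched in a few lines. The magic constant below is Bitcoin mainnet's message-start bytes; the helper names and the use of in-memory byte vectors (rather than actual file I/O) are illustrative assumptions. This also shows where the `nBlockSize+8` in FindBlockPos comes from: 8 bytes of header per record.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Mainnet network magic, used as the record delimiter in blk*.dat.
static const unsigned char MAGIC[4] = {0xf9, 0xbe, 0xb4, 0xd9};

// Serialize one record: {4-byte magic, 4-byte LE length, raw block data}.
std::vector<unsigned char> WriteBlockRecord(const std::vector<unsigned char>& block) {
    std::vector<unsigned char> rec;
    rec.insert(rec.end(), MAGIC, MAGIC + 4);
    uint32_t size = (uint32_t)block.size();
    for (int i = 0; i < 4; i++)              // little-endian length field
        rec.push_back((unsigned char)((size >> (8 * i)) & 0xff));
    rec.insert(rec.end(), block.begin(), block.end());
    return rec;
}

// Parse one record back out; returns false on bad magic or truncation.
bool ReadBlockRecord(const std::vector<unsigned char>& rec, std::vector<unsigned char>& block) {
    if (rec.size() < 8 || std::memcmp(&rec[0], MAGIC, 4) != 0) return false;
    uint32_t size = 0;
    for (int i = 0; i < 4; i++)
        size |= (uint32_t)rec[4 + i] << (8 * i);
    if (rec.size() < 8 + (size_t)size) return false;
    block.assign(rec.begin() + 8, rec.begin() + 8 + size);
    return true;
}
```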
I found this one and it explains what I was missing here: https://bitcointalk.org/index.php?topic=101514.0 - thanks for your further explanation, too.
Why keep things compatible here? Perhaps it's the right time to optimize the internals of the block files (e.g. compression or something similar)?
Does this break the ability to downgrade at all? (I expect it just means wasted "padding" space in the blk*.dat files?)
Updated. Batch block connection now keeps a permanent cache and modifies that (instead of delaying block connection until several blocks were available, which interfered with normal network-based downloading). Also added a commit that changes the block database format, in preparation for things like parallel signature checking and initial headers-only mode.
@sipa By block database format, do you mean the blocks stored in blk0000x.dat?
@luke-jr How do you mean, breaking the ability to downgrade? The blk000*.dat files remain exactly the same format, but the other databases are incompatible. @Diapolo No, it uses coins.dat (the unspent txout set) and chain.dat (the block index), in addition to the blk*.dat (and rev*.dat) files. It's the format of chain.dat that changed in the last commit.
@sipa If it interacts with downgrades in ugly ways, I'd probably not want to put it into next-test.
@luke-jr Shouldn't be a problem - the filenames are all different, so you can (almost) run ultraprune and non-ultraprune together in the same datadir independently. That said, it's likely to conflict with a lot of other stuff, so decide for yourself.
Could you provide a squashed version of the patch somewhere, for review? It's really hard to review as is, because it's just a record of how you implemented it over time.
Thanks, that looks useful.
@mikehearn Seems that through rebasing I lost some comments you made earlier on the commits? Regarding the encodings, I plan to write some text about the final format for all datastructures, but I may change a few things still.
Rebased/combined with @mikehearn's LevelDB patch
Rebased on 0.7, and moved the more experimental block caching and parallel signature checking to a separate branch. The code in here should be stable and can be tested. The only things that remain to be done are automatic import of old data, and more elaborate consistency checks at startup. I think those can be done in separate pull requests though. This branch has its own LevelDB glue, independent from (though similar to, but simpler than) the one in Mike's leveldb branch. As the coin and block indexes are only opened once, there was no need for a CDB-like wrapper and global CDBEnv to cache database accessors. If LevelDB is merged first, I'll add reverts for most of it here.
I closed the LevelDB pull req. Let's merge it as part of this. Note that my LevelDB branch has code that does replay the blocks with some GUI progress. It's not great because it actually re-writes the block files in order to track the block offsets ... I didn't do any deep refactorings to fix that, as I wanted it to be as easy/fast to merge as possible and it's a one-off migration anyway. But as it's now a part of ultraprune that bridge was crossed, so you could just re-use whatever GUI code is possible.
@TheBlueMatt Any way to disable the build tester here, as it seems to be incompatible with this anyway?
I've tested this a bit on the testnet. No problems found, and synchronization is super-fast. One small comment: in your bitcoin-qt.pro, please use $(MAKE) instead of a hard-coded make.
@laanwj: updated to use $(MAKE)
@sipa I'd rather not; the patch is really quite simple (http://jenkins.bluematt.me/pull-tester/files/bitcoind-comparison.patch). Afaict, it's only failing because setBlockIndexValid was added directly above hashGenesisBlock in main.cpp. Can you just move that line and see if it works?
Changed the database/serialization format one more time: coins and undo data now contain the transaction version number. This may be necessary when new transaction versions are defined that have an influence on their ability to be spent. @TheBlueMatt ok, moved the setBlockIndexValid line in main.cpp.
This does not build on MacOS X because there is no fdatasync on that platform. |
@TheBlueMatt I wonder why it still complains? EDIT: Oh, just out of date with master. Let's wait for the next cycle.
I just tried to start my client based on this branch and got: Loading block index... EXCEPTION: NSt8ios_base7failureE
On investigation, this failure can happen with both ultraprune/leveldb and the old BDB code; it happens when the block is not written but the db updates are. Typically if power is yanked at just the wrong time. As it is not a new failure mode, I guess it should not delay review/merge of this code.
@@ -90,6 +90,33 @@ contains(BITCOIN_NEED_QT_PLUGINS, 1) {
    QTPLUGIN += qcncodecs qjpcodecs qtwcodecs qkrcodecs qtaccessiblewidgets
}

contains(USE_LEVELDB, -) {
So this still includes legacy BDB support? That means we need to keep two code bases up to date.
Was the intention in keeping it to be able to revert? Just wondering :).
Yes, though the BDB version most likely doesn't compile anymore. This was converted from Mike's code, which tried to keep compatibility, but that's just an unnecessary burden.
Thanks, so it would be nice to remove that burden entirely from this pull and the code. If this is a one-way ticket, there is no need to keep the BDB compatibility code in.
The original idea was to reduce the risk of merging the code: in case there were issues with LevelDB [on some specific platform], we don't want to hold up the release or do a potentially messy revert.
I agree it's irritating and a burden, but it'd suck if all of ultraprune ended up getting reverted due to unanticipated issues with LevelDB. Once 0.8 has been successfully rolled out to the userbase and things are quiet it could be deleted at that time?
I'm fine with removing that later, as long as you / sipa keep track of it.
That whole block of commands in the .pro file looks like voodoo to me anyway :-D.
Did anyone build this directly on Windows with MinGW? I saw there was a cross-compile Windows flag in the .pro file. Perhaps I should just fetch that branch and try in the coming days.
Special serializer/deserializer for amount values. It is optimized for values which have few non-zero digits in decimal representation. Most amounts currently in the txout set take only 1 or 2 bytes to represent.
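The idea of a serializer optimized for amounts with few non-zero decimal digits can be sketched as a trailing-zero compressor: store the power of ten, the last non-zero digit, and the remaining digits, and then encode the result as a variable-length integer. This sketch follows that scheme in spirit; the exact on-disk encoding in the pull request may differ.

```cpp
#include <cassert>
#include <cstdint>

// Compress an amount by factoring out trailing decimal zeros (at most 9).
// Round amounts like 1 BTC (100000000 satoshi) map to tiny integers, which
// a subsequent varint encoding stores in 1-2 bytes.
uint64_t CompressAmount(uint64_t n) {
    if (n == 0) return 0;
    int e = 0;                          // number of trailing zeros removed
    while ((n % 10) == 0 && e < 9) { n /= 10; e++; }
    if (e < 9) {
        int d = (int)(n % 10);          // last non-zero digit, 1..9
        n /= 10;
        return 1 + (n * 9 + (uint64_t)(d - 1)) * 10 + (uint64_t)e;
    }
    return 1 + (n - 1) * 10 + 9;
}

// Exact inverse of CompressAmount.
uint64_t DecompressAmount(uint64_t x) {
    if (x == 0) return 0;
    x--;
    int e = (int)(x % 10);
    x /= 10;
    uint64_t n;
    if (e < 9) {
        int d = (int)(x % 9) + 1;
        x /= 9;
        n = x * 10 + (uint64_t)d;
    } else {
        n = x + 1;
    }
    while (e--) n *= 10;                // restore the trailing zeros
    return n;
}
```

For example, 100000000 satoshi compresses to a value below 128, i.e. a single varint byte.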
The CCoins class represents a pruned set of transaction outputs from a given transaction. It only retains information about its height in the block chain, whether it was a coinbase transaction, and its unspent outputs (script + amount). It has a custom serializer that has very low redundancy.
The CTxUndo class encapsulates data necessary to undo the effects of a transaction on the txout set, namely the previous outputs consumed by it (script + amount), and potentially transaction meta-data when it is spent entirely.
Refactor of the block storage code, which now stores one file per block. This will allow easier pruning, as blocks can be removed individually.
Create files (one per block) with undo information for the transactions in it.
Change the block storage layer again, this time with multiple blocks per file, tracked by txindex.dat database entries. The file format is exactly the same as the earlier blk00001.dat, but with smaller files (128 MiB for now). The database entries track how many bytes each block file already uses, how many blocks are in it, and which ranges of heights and dates are present.
Introduce an AllocateFileRange() function in util, which wipes or at least allocates a given range of a file. It can be overridden by more efficient OS-dependent versions if necessary. Block and undo files are now allocated in chunks of 16 MiB and 1 MiB, respectively.
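A portable fallback for such a function can be as simple as writing zeroes over the requested range, which both reserves the space and wipes it; platform-specific versions (e.g. using posix_fallocate) could replace it. This is a sketch of that fallback, not the actual util implementation.

```cpp
#include <cstdio>

// Reserve (and wipe) the byte range [offset, offset + length) of a file by
// writing zeroes in 64 KiB chunks. Pre-allocating in large chunks this way
// reduces filesystem fragmentation of the block and undo files.
void AllocateFileRange(FILE* file, unsigned int offset, unsigned int length) {
    static const char buf[65536] = {0};
    fseek(file, (long)offset, SEEK_SET);
    while (length > 0) {
        unsigned int now = sizeof(buf);
        if (now > length) now = length;
        fwrite(buf, 1, now, file);
        length -= now;
    }
}
```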
This switches bitcoin's transaction/block verification logic to use a "coin database", which contains all unredeemed transaction output scripts, amounts and heights. The name ultraprune comes from the fact that instead of a full transaction index, we only (need to) keep an index with unspent outputs. For now, the blocks themselves are kept as usual, although they are only necessary for serving, rescanning and reorganizing.

The basic datastructures are CCoins (representing the coins of a single transaction) and CCoinsView (representing a state of the coins database). There are several implementations of CCoinsView: a dummy one, one backed by the coins database (coins.dat), one backed by the memory pool, and one that adds a cache on top of another. FetchInputs, ConnectInputs, ConnectBlock, DisconnectBlock, ... now operate on a generic CCoinsView.

The block switching logic now builds a single cached CCoinsView with changes to be committed to the database before any changes are made. This means no uncommitted changes are ever read from the database, and should ease the transition to another database layer which does not support transactions (but does support atomic writes), like LevelDB.

For the getrawtransaction() RPC call, access to a txid-to-disk index would be preferable. As this index is not necessary or even useful for any other part of the implementation, it is not provided. Instead, getrawtransaction() uses the coin database to find the block height, and then scans that block to find the requested transaction. This is slow, but should suffice for debug purposes.
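The view-layering idea can be illustrated with a minimal model: an abstract view, a "database" implementation, and a cache that buffers writes until a single flush. Everything here is a simplification for illustration; the real CCoins carries scripts, amounts and heights, keys are 256-bit txids, and the actual class interfaces differ.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Simplified stand-in for CCoins: just a value, a height and a spent flag.
struct Coins { int64_t nValue; int nHeight; bool fSpent; };

// Abstract view of the coin set (models CCoinsView).
class CoinsView {
public:
    virtual bool GetCoins(const std::string& txid, Coins& coins) = 0;
    virtual bool SetCoins(const std::string& txid, const Coins& coins) = 0;
    virtual ~CoinsView() {}
};

// Backing store (models the coins.dat-backed view; a std::map here).
class CoinsViewDB : public CoinsView {
    std::map<std::string, Coins> db;
public:
    bool GetCoins(const std::string& txid, Coins& coins) {
        std::map<std::string, Coins>::iterator it = db.find(txid);
        if (it == db.end()) return false;
        coins = it->second;
        return true;
    }
    bool SetCoins(const std::string& txid, const Coins& coins) {
        db[txid] = coins;
        return true;
    }
};

// Cache layered on top of a backing view: reads fall through on a miss,
// writes are buffered until Flush() commits them all in one batch. This is
// why no uncommitted changes are ever read from the database layer.
class CoinsViewCache : public CoinsView {
    CoinsView& base;
    std::map<std::string, Coins> cache;
public:
    CoinsViewCache(CoinsView& b) : base(b) {}
    bool GetCoins(const std::string& txid, Coins& coins) {
        std::map<std::string, Coins>::iterator it = cache.find(txid);
        if (it != cache.end()) { coins = it->second; return true; }
        return base.GetCoins(txid, coins);
    }
    bool SetCoins(const std::string& txid, const Coins& coins) {
        cache[txid] = coins;   // buffered only; base is untouched
        return true;
    }
    void Flush() {             // commit every buffered change at once
        for (std::map<std::string, Coins>::iterator it = cache.begin();
             it != cache.end(); ++it)
            base.SetCoins(it->first, it->second);
        cache.clear();
    }
};
```

A batch-write interface like this maps naturally onto a store with atomic writes but no transactions, which is the property the text attributes to LevelDB.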
During the initial block download (or -loadblock), delay connection of new blocks a bit, and perform them in a single action. This reduces the load on the database engine, as subsequent blocks often update an earlier block's transaction already.
Use CBlock's vMerkleTree to cache transaction hashes, and pass them along as an argument in more function calls. During initial block download, this results in every transaction's hash being computed only once.
To prevent excessive copying of CCoins in and out of the CCoinsView implementations, introduce a GetCoins() function in CCoinsViewCache which returns a direct reference. The block validation and connection logic is updated to require caching CCoinsViews, and exploits the GetCoins() function heavily.
Given that the block tree database (chain.dat) and the active chain database (coins.dat) are entirely separate now, it becomes legal to swap one with another instance without affecting the other. This commit introduces a check in the startup code that detects the presence of a better chain in chain.dat that has not been activated yet, and does so efficiently (in batch, while reusing the blk???.dat files).
This commit adds a status field and a transaction counter to the block indexes.
Split off CBlockTreeDB and CCoinsViewDB into txdb-*.{cpp,h} files, implemented by either LevelDB or BDB. Based on code from earlier commits by Mike Hearn in his leveldb branch.
Support LevelDB memory-backed environments, and use them in unit tests.
ACK. This appears ready for integration.
CWalletTx::AddSupportingTransactions() was adding empty transactions to vtxPrev in some cases. Skip over these. Part one of the solution to bitcoin#3190. This prevents invalid vtxPrev from entering the wallet, but not existing ones from being transmitted.
This is a rewrite of the block storage and validation engine.
Instead of blkindex.dat (a database with block tree data, and all transactions and their spendings in the active chain), it uses chain.dat (only block tree data) and coins.dat (pruned txout set). These two databases together are significantly smaller than blkindex.dat (<200 MiB), and only coins.dat is actively needed during block validation, speeding it up significantly (15 minutes for importing 185000 blocks from a local disk file).
Blocks are still stored in blk????.dat files, in the same file format, but smaller files (up to 128 MiB). To prevent excessive fragmentation, they are allocated in chunks of 16 MiB, and some statistics are kept about them. To assist with reorganisation, undo files are created (rev????.dat), which contain the data necessary to undo block connections.
Block pruning itself is not yet implemented, but this makes it trivial to do so; all that is required is deleting old block and undo files when certain thresholds are reached. Also note that this block pruning mechanism is different from the transaction pruning mechanism described by Satoshi. This one does not prevent a node from acting as a full node.
All commits result in a functional code tree with succeeding unit tests. The first few add some extra classes without changing actual semantics. "One file per block" and "Multiple blocks per file" form a refactor of the block storage mechanism, with related database changes. "Do not store hashNext on disk" introduces a forward-incompatible change that simplifies the database layout. "Ultraprune" itself contains the switch from txindex.dat to coins.dat as validation data, and contains the majority of the changes. What follows are optimizations and other improvements that do not affect compatibility.
There are a few TODO's left (see comment below), but I'd like to give the code some exposure already.