Ultraprune: use a pruned-txout-set database for block validation #1677
Conversation
(EDITED) List of implementation changes:
Included in the pullreq, but technically separate:
@sipa One question: our current AppendBlockFile() function takes MAX_SIZE into account and generates a new block file if the space left in the block file (the maximum allowed file size) is < MAX_SIZE. So 128 MiB files would hold a maximum of 96 MiB of usable data, right?
@Diapolo: not sure what you mean; I don't use AppendBlockFile anymore.
@sipa I saw that and wanted to understand the change here. Which condition is used to determine whether a new block file needs to be created, where is that check in your new code, and what's the space limit?
The check is in FindBlockPos in main.cpp. And a new file is created if (old_used_size + new_block_size >= MAX_BLOCKFILE_SIZE). |
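The decision described here can be sketched as follows. This is an illustrative condition only; the function name `BlockFitsInFile` is hypothetical, and only the 128 MiB limit and the `used + new >= MAX_BLOCKFILE_SIZE` check come from the discussion above.

```cpp
#include <cassert>
#include <cstdint>

// 128 MiB per block file, as described in the pull request.
static const uint64_t MAX_BLOCKFILE_SIZE = 128ULL * 1024 * 1024;

// Hypothetical helper mirroring the FindBlockPos check: a new file is
// created when (old_used_size + new_block_size >= MAX_BLOCKFILE_SIZE),
// i.e. the block fits only while the sum stays strictly below the limit.
bool BlockFitsInFile(uint64_t nUsedSize, uint64_t nAddSize) {
    return nUsedSize + nAddSize < MAX_BLOCKFILE_SIZE;
}
```

Note that unlike the old AppendBlockFile() behaviour Diapolo describes, no MAX_SIZE-sized slack is reserved; the file is usable right up to the 128 MiB boundary.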
CDiskBlockPos blockPos;
{
    CChainDB chaindb;
    if (!FindBlockPos(chaindb, blockPos, nBlockSize+8, nHeight, nTime))
Why nBlockSize+8, is that a padding?
4 bytes magic, 4 bytes block length; that's just the file format of blk*.dat.
I'm lacking some background information here, sorry :). Is the format defined / described somewhere?
No idea, but I wanted to retain compatibility between pre- and post-ultraprune block files, so I used the same format. That is: the files are a concatenation of {4 bytes magic, 4 bytes LE integer with the actual block size, block data itself}.
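The record layout just described can be sketched in a few lines. The magic constant below is Bitcoin mainnet's message-start bytes; the helper names and the use of in-memory byte vectors (rather than actual file I/O) are illustrative assumptions. This also shows where the `nBlockSize+8` in FindBlockPos comes from: 8 bytes of header per record.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Mainnet network magic, used as the record delimiter in blk*.dat.
static const unsigned char MAGIC[4] = {0xf9, 0xbe, 0xb4, 0xd9};

// Serialize one record: {4-byte magic, 4-byte LE length, raw block data}.
std::vector<unsigned char> WriteBlockRecord(const std::vector<unsigned char>& block) {
    std::vector<unsigned char> rec;
    rec.insert(rec.end(), MAGIC, MAGIC + 4);
    uint32_t size = (uint32_t)block.size();
    for (int i = 0; i < 4; i++)              // little-endian length field
        rec.push_back((unsigned char)((size >> (8 * i)) & 0xff));
    rec.insert(rec.end(), block.begin(), block.end());
    return rec;
}

// Parse one record back out; returns false on bad magic or truncation.
bool ReadBlockRecord(const std::vector<unsigned char>& rec, std::vector<unsigned char>& block) {
    if (rec.size() < 8 || std::memcmp(&rec[0], MAGIC, 4) != 0) return false;
    uint32_t size = 0;
    for (int i = 0; i < 4; i++)
        size |= (uint32_t)rec[4 + i] << (8 * i);
    if (rec.size() < 8 + (size_t)size) return false;
    block.assign(rec.begin() + 8, rec.begin() + 8 + size);
    return true;
}
```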
I found this one and it explains what I was missing here: https://bitcointalk.org/index.php?topic=101514.0 - thanks for your further explanation, too.
Why keep things compatible here? Perhaps it's the right time to optimize the internals of the block files (e.g. compression or something similar)?
Does this break the ability to downgrade at all? (I expect it just means wasted "padding" space in the blk*.dat files?)
Updated. Batch block connection now keeps a permanent cache and modifies that (instead of delaying block connection until several blocks were available, which interfered with normal network-based downloading). Also added a commit that changes the block database format, in preparation for things like parallel signature checking and initial headers-only mode.
@sipa By block database format, do you mean the blocks stored in blk0000x.dat?
@luke-jr How do you mean, breaking the ability to downgrade? The blk000*.dat files remain exactly the same format, but the other databases are incompatible. @Diapolo No, it uses coins.dat (the unspent txout set) and chain.dat (the block index), in addition to the blk*.dat (and rev*.dat) files. It's the format of chain.dat that changed in the last commit.
@sipa If it interacts with downgrades in ugly ways, I'd probably not want to put it into next-test.
@luke-jr Shouldn't be a problem - the filenames are all different, so you can (almost) run ultraprune and non-ultraprune together in the same datadir independently. That said, it's likely to conflict with a lot of other stuff, so decide for yourself.
Could you provide a squashed version of the patch somewhere, for review? It's really hard to review as is, because it's just a record of how you implemented it over time.
Thanks, that looks useful.
@mikehearn Seems that through rebasing I lost some comments you made earlier on the commits? Regarding the encodings, I plan to write some text about the final format for all datastructures, but I may change a few things still.
Rebased/combined with @mikehearn's LevelDB patch
Rebased on 0.7, and moved the more experimental block caching and parallel signature checking to a separate branch. The code in here should be stable and can be tested. The only things that remain to be done are automatic import of old data, and more elaborate consistency checks at startup. I think those can be done in separate pull requests though. This branch has its own LevelDB glue, independent from (though similar to, but simpler than) the one in Mike's leveldb branch. As the coin and block indexes are only opened once, there was no need for a CDB-like wrapper and global CDBEnv to cache database accessors. If LevelDB is merged first, I'll add reverts for most of it here.
I closed the LevelDB pull req. Let's merge it as part of this. Note that my LevelDB branch has code that does replay the blocks with some GUI progress. It's not great because it actually re-writes the block files in order to track the block offsets ... I didn't do any deep refactorings to fix that, as I wanted it to be as easy/fast to merge as possible and it's a one-off migration anyway. But as it's now a part of ultraprune that bridge was crossed, so you could just re-use whatever GUI code is possible.
@TheBlueMatt Any way to disable the build tester here, as it seems to be incompatible with this anyway?
I've tested this a bit on the testnet. No problems found, and synchronization is super-fast. One small comment: in your bitcoin-qt.pro, please use $(MAKE) instead of a hard-coded make.
@laanwj: updated to use $(MAKE)
@sipa I'd rather not; the patch is really quite simple (http://jenkins.bluematt.me/pull-tester/files/bitcoind-comparison.patch). Afaict, it's only failing because setBlockIndexValid was added directly above hashGenesisBlock in main.cpp. Can you just move that line and see if it works?
Changed the database/serialization format one more time: coins and undo data now contain the transaction version number. This may be necessary when new transaction versions are defined that have an influence on their ability to be spent. @TheBlueMatt ok, moved the setBlockIndexValid line in main.cpp.
This does not build on MacOS X because there is no fdatasync on that platform. |
@TheBlueMatt I wonder why it still complains? EDIT: Oh, just out of date with master. Let's wait for the next cycle.
I just tried to start my client based on this branch and got: Loading block index... EXCEPTION: NSt8ios_base7failureE
On investigation, this failure can happen with both ultraprune/leveldb and the old BDB code; it happens when the block is not written but the db updates are. Typically if power is yanked at just the wrong time. As it is not a new failure mode, I guess it should not delay review/merge of this code.
@@ -90,6 +90,33 @@ contains(BITCOIN_NEED_QT_PLUGINS, 1) {
    QTPLUGIN += qcncodecs qjpcodecs qtwcodecs qkrcodecs qtaccessiblewidgets
}

contains(USE_LEVELDB, -) {
So this still includes legacy BDB support? That means we need to keep two code bases up to date.
Was the intention in keeping it to be able to revert? Just wondering :).
Yes, though the BDB version most likely doesn't compile anymore. This was converted from Mike's code, which tried to keep compatibility, but that's just an unnecessary burden.
Thanks, so it would be nice to remove that burden entirely from this pull and the code. If this is a one-way ticket, there is no need to keep the BDB compatibility code in.
The original idea was to reduce the risk of merging the code: in case there were issues with LevelDB [on some specific platform], we don't want to hold up the release or do a potentially messy revert.
I agree it's irritating and a burden, but it'd suck if all of ultraprune ended up getting reverted due to unanticipated issues with LevelDB. Once 0.8 has been successfully rolled out to the userbase and things are quiet it could be deleted at that time?
I'm fine with removing that later, as long as you / sipa keep track of it.
That whole block of commands in the .pro file looks like voodoo to me anyway :-D.
Did anyone build this directly on Windows with MinGW? I saw there was a cross-compile Windows flag in the .pro file. Perhaps I should just fetch that branch and try in the coming days.
Special serializer/deserializer for amount values. It is optimized for values which have few non-zero digits in decimal representation. Most amounts currently in the txout set take only 1 or 2 bytes to represent.
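The idea of a serializer optimized for amounts with few non-zero decimal digits can be sketched as a trailing-zero compressor: store the power of ten, the last non-zero digit, and the remaining digits, and then encode the result as a variable-length integer. This sketch follows that scheme in spirit; the exact on-disk encoding in the pull request may differ.

```cpp
#include <cassert>
#include <cstdint>

// Compress an amount by factoring out trailing decimal zeros (at most 9).
// Round amounts like 1 BTC (100000000 satoshi) map to tiny integers, which
// a subsequent varint encoding stores in 1-2 bytes.
uint64_t CompressAmount(uint64_t n) {
    if (n == 0) return 0;
    int e = 0;                          // number of trailing zeros removed
    while ((n % 10) == 0 && e < 9) { n /= 10; e++; }
    if (e < 9) {
        int d = (int)(n % 10);          // last non-zero digit, 1..9
        n /= 10;
        return 1 + (n * 9 + (uint64_t)(d - 1)) * 10 + (uint64_t)e;
    }
    return 1 + (n - 1) * 10 + 9;
}

// Exact inverse of CompressAmount.
uint64_t DecompressAmount(uint64_t x) {
    if (x == 0) return 0;
    x--;
    int e = (int)(x % 10);
    x /= 10;
    uint64_t n;
    if (e < 9) {
        int d = (int)(x % 9) + 1;
        x /= 9;
        n = x * 10 + (uint64_t)d;
    } else {
        n = x + 1;
    }
    while (e--) n *= 10;                // restore the trailing zeros
    return n;
}
```

For example, 100000000 satoshi compresses to a value below 128, i.e. a single varint byte.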
The CCoins class represents a pruned set of transaction outputs from a given transaction. It only retains information about its height in the block chain, whether it was a coinbase transaction, and its unspent outputs (script + amount). It has a custom serializer that has very low redundancy.
The CTxUndo class encapsulates data necessary to undo the effects of a transaction on the txout set, namely the previous outputs consumed by it (script + amount), and potentially transaction meta-data when it is spent entirely.
Refactor of the block storage code, which now stores one file per block. This will allow easier pruning, as blocks can be removed individually.
Create files (one per block) with undo information for the transactions in it.
Change the block storage layer again, this time with multiple blocks per file, tracked by txindex.dat database entries. The file format is exactly the same as the earlier blk00001.dat, but with smaller files (128 MiB for now). The database entries track how many bytes each block file already uses, how many blocks are in it, and which ranges of heights and dates are present.
Introduce an AllocateFileRange() function in util, which wipes or at least allocates a given range of a file. It can be overridden by more efficient OS-dependent versions if necessary. Block and undo files are now allocated in chunks of 16 MiB and 1 MiB, respectively.
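A portable fallback for such a function can be as simple as writing zeroes over the requested range, which both reserves the space and wipes it; platform-specific versions (e.g. using posix_fallocate) could replace it. This is a sketch of that fallback, not the actual util implementation.

```cpp
#include <cstdio>

// Reserve (and wipe) the byte range [offset, offset + length) of a file by
// writing zeroes in 64 KiB chunks. Pre-allocating in large chunks this way
// reduces filesystem fragmentation of the block and undo files.
void AllocateFileRange(FILE* file, unsigned int offset, unsigned int length) {
    static const char buf[65536] = {0};
    fseek(file, (long)offset, SEEK_SET);
    while (length > 0) {
        unsigned int now = sizeof(buf);
        if (now > length) now = length;
        fwrite(buf, 1, now, file);
        length -= now;
    }
}
```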
This switches bitcoin's transaction/block verification logic to use a "coin database", which contains all unredeemed transaction output scripts, amounts and heights. The name ultraprune comes from the fact that instead of a full transaction index, we only (need to) keep an index with unspent outputs. For now, the blocks themselves are kept as usual, although they are only necessary for serving, rescanning and reorganizing.

The basic datastructures are CCoins (representing the coins of a single transaction) and CCoinsView (representing a state of the coins database). There are several implementations of CCoinsView: a dummy one, one backed by the coins database (coins.dat), one backed by the memory pool, and one that adds a cache on top of another. FetchInputs, ConnectInputs, ConnectBlock, DisconnectBlock, ... now operate on a generic CCoinsView.

The block switching logic now builds a single cached CCoinsView with changes to be committed to the database before any changes are made. This means no uncommitted changes are ever read from the database, and should ease the transition to another database layer which does not support transactions (but does support atomic writes), like LevelDB.

For the getrawtransaction() RPC call, access to a txid-to-disk index would be preferable. As this index is not necessary or even useful for any other part of the implementation, it is not provided. Instead, getrawtransaction() uses the coin database to find the block height, and then scans that block to find the requested transaction. This is slow, but should suffice for debug purposes.
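The view-layering idea can be illustrated with a minimal model: an abstract view, a "database" implementation, and a cache that buffers writes until a single flush. Everything here is a simplification for illustration; the real CCoins carries scripts, amounts and heights, keys are 256-bit txids, and the actual class interfaces differ.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Simplified stand-in for CCoins: just a value, a height and a spent flag.
struct Coins { int64_t nValue; int nHeight; bool fSpent; };

// Abstract view of the coin set (models CCoinsView).
class CoinsView {
public:
    virtual bool GetCoins(const std::string& txid, Coins& coins) = 0;
    virtual bool SetCoins(const std::string& txid, const Coins& coins) = 0;
    virtual ~CoinsView() {}
};

// Backing store (models the coins.dat-backed view; a std::map here).
class CoinsViewDB : public CoinsView {
    std::map<std::string, Coins> db;
public:
    bool GetCoins(const std::string& txid, Coins& coins) {
        std::map<std::string, Coins>::iterator it = db.find(txid);
        if (it == db.end()) return false;
        coins = it->second;
        return true;
    }
    bool SetCoins(const std::string& txid, const Coins& coins) {
        db[txid] = coins;
        return true;
    }
};

// Cache layered on top of a backing view: reads fall through on a miss,
// writes are buffered until Flush() commits them all in one batch. This is
// why no uncommitted changes are ever read from the database layer.
class CoinsViewCache : public CoinsView {
    CoinsView& base;
    std::map<std::string, Coins> cache;
public:
    CoinsViewCache(CoinsView& b) : base(b) {}
    bool GetCoins(const std::string& txid, Coins& coins) {
        std::map<std::string, Coins>::iterator it = cache.find(txid);
        if (it != cache.end()) { coins = it->second; return true; }
        return base.GetCoins(txid, coins);
    }
    bool SetCoins(const std::string& txid, const Coins& coins) {
        cache[txid] = coins;   // buffered only; base is untouched
        return true;
    }
    void Flush() {             // commit every buffered change at once
        for (std::map<std::string, Coins>::iterator it = cache.begin();
             it != cache.end(); ++it)
            base.SetCoins(it->first, it->second);
        cache.clear();
    }
};
```

A batch-write interface like this maps naturally onto a store with atomic writes but no transactions, which is the property the text attributes to LevelDB.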
During the initial block download (or -loadblock), delay connection of new blocks a bit, and perform them in a single action. This reduces the load on the database engine, as subsequent blocks often update an earlier block's transaction already.
Use CBlock's vMerkleTree to cache transaction hashes, and pass them along as an argument in more function calls. During initial block download, this results in every transaction's hash being computed only once.
To prevent excessive copying of CCoins in and out of the CCoinsView implementations, introduce a GetCoins() function in CCoinsViewCache which returns a direct reference. The block validation and connection logic is updated to require caching CCoinsViews, and exploits the GetCoins() function heavily.
Given that the block tree database (chain.dat) and the active chain database (coins.dat) are entirely separate now, it becomes legal to swap one with another instance without affecting the other. This commit introduces a check in the startup code that detects the presence of a better chain in chain.dat that has not been activated yet, and does so efficiently (in batch, while reusing the blk???.dat files).
This commit adds a status field and a transaction counter to the block indexes.
Split off CBlockTreeDB and CCoinsViewDB into txdb-*.{cpp,h} files, implemented by either LevelDB or BDB. Based on code from earlier commits by Mike Hearn in his leveldb branch.
Support LevelDB memory-backed environments, and use them in unit tests.
ACK. This appears ready for integration.
CWalletTx::AddSupportingTransactions() was adding empty transactions to vtxPrev in some cases. Skip over these. Part one of the solution to bitcoin#3190. This prevents invalid vtxPrev from entering the wallet, but not existing ones from being transmitted.
This is a rewrite of the block storage and validation engine.
Instead of blkindex.dat (a database with block tree data, and all transactions and their spendings in the active chain), it uses chain.dat (only block tree data) and coins.dat (pruned txout set). These two databases together are significantly smaller than blkindex.dat (<200 MiB), and only coins.dat is actively needed during block validation, speeding it up significantly (15 minutes for importing 185000 blocks from a local disk file).
Blocks are still stored in blk????.dat files, in the same file format, but smaller files (up to 128 MiB). To prevent excessive fragmentation, they are allocated in chunks of 16 MiB, and some statistics are kept about them. To assist with reorganisation, undo files are created (rev????.dat), which contain the data necessary to undo block connections.
Block pruning itself is not yet implemented, but this makes it trivial to do so; all that is required is deleting old block and undo files when certain thresholds are reached. Also note that this block pruning mechanism is different from the transaction pruning mechanism described by Satoshi. This one does not prevent a node from acting as a full node.
All commits result in a functional code tree with succeeding unit tests. The first few add some extra classes without changing actual semantics. "One file per block" and "Multiple blocks per file" form a refactor of the block storage mechanism, with related database changes. "Do not store hashNext on disk" introduces a forward-incompatible change that simplifies the database layout. "Ultraprune" itself contains the switch from txindex.dat to coins.dat as validation data, and contains the majority of the changes. What follows are optimizations and other improvements that do not affect compatibility.
There are a few TODO's left (see comment below), but I'd like to give the code some exposure already.