Skip to content

Conversation

sipa
Copy link
Member

@sipa sipa commented Apr 20, 2016

Several changes:

  • Make reindex/import use AcceptBlock rather than ProcessNewBlock, so activation of the resulting best chain is delayed.
  • That delayed activation is handled by the existing "Activating best chain" step in the startup process, which is moved to ThreadImport.
  • Optimize ActivateBestChain to not walk the entire chain to find the most work chain to connect for each block (O(n^2) -> O(n)).

This has several advantages:

  • As the actual activation is done after all blocks have been added back to the index, it gets to use the checkpoints information (better progress indicator, skipping of signature checks, ...).
  • Having a block index with many unactivated blocks (for example, after deleting the chainstate directory) becomes much faster
  • That case also runs in the background now (as it's simply treated as part of the importing process).

Downsides:

  • All blocks are read twice from disk during reindex (once to rebuild the index, once to activate).

Todo:

  • Better progress indicator during the index rebuild phase.

@laanwj
Copy link
Member

laanwj commented Apr 20, 2016

Good to see this!

All blocks are read twice from disk during reindex (once to rebuild the index, once to activate)

Could this (theoretically) be avoided by just reading the header during the initial scan? after all, the extra header that we tack on includes the size of the block, so we could skip the rest the first time.

@sipa
Copy link
Member Author

sipa commented Apr 20, 2016

@laanwj Yes, I've considered that. That could lead to an inconsistent block index though (as in: I'm not sure whether there are any internal assumptions that all entries in the block index are valid).

@jonasschnelli
Copy link
Contributor

Nice. Concept ACK.
Testing this now on a mainnet node -reindex.

@sipa
Copy link
Member Author

sipa commented Apr 20, 2016

@laanwj Another problem with that approach: say you get a block corrupted near the chain tip on disk, you find it and do a reindex. You expect it to reindex up to that corrupted block and then start back from network. Instead, if you don't do transaction validation during the rebuild loop, it may happily add the block to the index, and have activation fail (making it mark the chain as invalid).

@laanwj
Copy link
Member

laanwj commented Apr 20, 2016

Yes, it would be quite more complex to handle those issues well.
In any case if the speed-up is more than the extra disk i/o that this generates takes, it doesn't matter, no need to worry about it in this pull.

@sipa
Copy link
Member Author

sipa commented Apr 20, 2016

@jonasschnelli If you feel like benchmarking, an interesting thing to test is (reindex on master) vs (deleting chainstate on this PR), both with signature checks hackishly disabled. Deleting of chainstate caused a very slow, foreground, and (before #7821) uninterruptible process earlier. With this PR is should be (nearly) the same speed as an old reindex.

@sipa
Copy link
Member Author

sipa commented Apr 20, 2016

Would it be useful to have a -weakreindex or -rebuild now, which does not throw out the block index, and only the chainstate? That would be faster, as it wouldn't need the rebuild phase.

@laanwj
Copy link
Member

laanwj commented Apr 20, 2016

Would it be useful to have a -weakreindex or -rebuild now, which does not throw out the block index, and only the chainstate? That would be faster, as it wouldn't need the rebuild phase.

Would need better reporting about what database is corrupted for that to be useful to end-users. But even without that I can see that being very useful for benchmarking the UTXO database.

I like the idea of being able to rebuild databases separately. What about the other way around? If you have a UTXO set, but want to reindex the block chain? (e.g. when switching a pruned node to a non-pruned one by copying block data from another node)

If it was up to me I wouldn't use the name 'weakindex' or 'rebuid' (that's another one for the rescan, reindex, re.... category of vaguenes) but call it what it is: -rebuild-blockindex -rebuild-chainstate etc (using the same name as the db directory names).

@sipa
Copy link
Member Author

sipa commented Apr 20, 2016

I like the idea of being able to rebuild databases separately. What about the other way around? If you have a UTXO set, but want to reindex the block chain? (e.g. when switching a pruned node to a non-pruned one by copying block data from another node)

You can just overwrite your blocks/ database with another version (including its index), as long as it has your currently active chain in it.

@sipa
Copy link
Member Author

sipa commented Apr 20, 2016

If it was up to me I wouldn't use the name 'weakindex' or 'rebuid' (that's another one for the rescan, reindex, re.... category of vaguenes) but call it what it is: -rebuild-blockindex -rebuild-chainstate etc (using the same name as the db directory names).

-rebuild-chainstate sounds like an excellent idea. I don't know what -rebuild-blockindex would mean though...

@sipa
Copy link
Member Author

sipa commented Apr 21, 2016

Added a commit to implement -reindex-chainstate.

I guess the semantics for -reindex-blockindex could be:

  • Rebuild the block index without rebuilding the chainstate
  • If afterwards the chainstate's tip is in the resulting block index, mark every block on the best chain as validated and pretend nothing happened
  • If not, wipe the chainstate and rebuild that as well.

@laanwj
Copy link
Member

laanwj commented Apr 21, 2016

I don't know what -rebuild-blockindex would mean though...

That hypothetical option would rebuild the blockindex database, say after some new-fangled corruption check detects that that database was corrupted.

  • Throw away the block index (so the database in blocks/index)
  • Reindex blocks

But don't touch the chain state in any way, it is assumed to be still good. If the best chain is not included after reindexing the blocks, throw another error that a full reindex -reindex-chainstate is necessary.

I guess the semantics for -reindex-blockindex could be:

Yes, something like that, sounds good to me.

@sipa sipa force-pushed the betterreindex branch 2 times, most recently from 2bd05b3 to 691cbc9 Compare April 21, 2016 12:30
@dcousens
Copy link
Contributor

dcousens commented Apr 22, 2016

concept ACK -(rebuild|reindex)-*

@sipa
Copy link
Member Author

sipa commented Apr 22, 2016

I tried implementing -reindex-blockindex, but it is significantly harder: it would mean first initializing with a dummy chainstate, then reindexing the blockindex while leaving the original chainstate untouched, then stopping the node, loading the real chainstate, marking the blocks along the best chain as valid (or fail if not found anymore), and starting the node again.

@laanwj
Copy link
Member

laanwj commented Apr 26, 2016

Going to test this

@laanwj
Copy link
Member

laanwj commented Apr 26, 2016

Not sure if this is expected behavior, it's probably harmless; but during the "Reindexing block file" initial scan, rev*.dat files are created with size 0:

-rw------- 1 user user         0 Apr 26 15:26 rev00415.dat
-rw------- 1 user user         0 Apr 26 15:26 rev00416.dat
-rw------- 1 user user         0 Apr 26 15:26 rev00417.dat
-rw------- 1 user user         0 Apr 26 15:26 rev00418.dat

Edit: the block counts are off? possibly due to out-of-order blocks not being counted. It looks a bit strange but probably harmless:

2016-04-26 13:30:48 Reindexing block file blk00471.dat...
2016-04-26 13:30:51 Loaded 3 blocks from external file in 2585ms

@laanwj
Copy link
Member

laanwj commented Apr 26, 2016

Did a reindex of my test chain using this patch:

2016-04-26 13:07:36 Reindexing block file blk00000.dat...
2016-04-26 13:32:16 UpdateTip: new best=00000000839a8e6886ab5951d76f411475428afc90947ee320161bbf18eb6048 height=1 version=0x00000001 log2_work=33.000022 tx=2 date='2009-01-09 02:54:25' progress=0.000000 cache=0.0MiB(1tx)
2016-04-26 17:44:18 UpdateTip: new best=00000000000000000360e6d03330560950285d1062c6f6e6a4558ba1720d6e73 height=407838 version=0x00000004 log2_work=84.507935 tx=123156514 date='2016-04-18 11:34:09' progress=0.994753 cache=43.8MiB(23470tx)

Times:

phase1   00:24:40 (1480s)
phase2   04:12:02 (15122s)
         -----------------
total    04:36:42 (16602s)

For comparison, a sync from a local node on the same machine, under the same conditions, takes 05:32:29 (19949s). That's very good. I have not tried doing a 'traditional' reindex,

Edit: also verified that this fixes #7038. Also tested that interrupting the reindex then restarting works, both in the first phase and second phase.

Edit.2: also did a "traditional" reindex - it took 04:45:49 (17149s). Only a bit longer. Note that this is a fast CPU with 8 cores, so possibly the difference with not validating is less than it would otherwise be.

Edit.3: verified that utxo set hash is the same after old reindex versus new reindex (just to be sure).

// scan for better chains in the block chain database, that are not yet connected in the active best chain
CValidationState state;
if (!ActivateBestChain(state, chainparams))
LogPrintf("Failed to connect best block");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this error be fatal?

@jonasschnelli
Copy link
Contributor

Tested on a mainnet node:

jonasschnelli@bitcoinsrv:~/.bitcoin$ cat debug.log | grep "Bitcoin version\|Reindexing finished\|height=409149"
2016-04-27 14:11:51 Bitcoin version v0.12.99.0-fb682d7 (2016-04-27 15:34:26 +0200)
2016-04-27 14:34:17 Reindexing finished
2016-04-27 18:52:50 UpdateTip: new best=000000000000000000c2963b062bebef4beff8ef38ae5d81e7ccfa8e05abcb1a height=409149 version=0x00000004 log2_work=84.559763 tx=125119626 date='2016-04-27 14:05:06' progress=0.999875 cache=4.0MiB(0tx)

= Times:

phase1    1'346s (00:22:26)
phase2   15'513s (04:18:33)
===========================
total    16'859s (04:40:59)

Full debug log with -debug=reindex http://bitcoinsrv.jonasschnelli.ch/debug.reindex.log (>240MB!)

@sipa
Copy link
Member Author

sipa commented Apr 28, 2016

Added GUI progress reporting

@sipa
Copy link
Member Author

sipa commented May 1, 2016

The two phases in the GUI:

reindexing
processing

If the chainstate is ahead of the block index at startup, you get the same "Processing blocks on disk" now, while the main GUI is running (and not during loading).

@sipa sipa force-pushed the betterreindex branch from b8aeee2 to a0c0058 Compare May 1, 2016 21:52
@pstratem
Copy link
Contributor

utACK fb8fad1

@pstratem
Copy link
Contributor

utACK d3d7547

@pstratem
Copy link
Contributor

very untested ACK b4d24e1 (sanity not guaranteed)

@pstratem
Copy link
Contributor

On a related note access to chainActive immediately after threadGroup.create_thread(boost::bind(&ThreadImport, vImportFiles)); is not protected by cs_main

@laanwj laanwj merged commit b4d24e1 into bitcoin:master May 18, 2016
laanwj added a commit that referenced this pull request May 18, 2016
b4d24e1 Report reindexing progress in GUI (Pieter Wuille)
d3d7547 Add -reindex-chainstate that does not rebuild block index (Pieter Wuille)
fb8fad1 Optimize ActivateBestChain for long chains (Pieter Wuille)
316623f Switch reindexing to AcceptBlock in-loop and ActivateBestChain afterwards (Pieter Wuille)
d253ec4 Make ProcessNewBlock dbp const and update comment (Pieter Wuille)
@maflcko maflcko mentioned this pull request May 22, 2016
16 tasks
str4d added a commit to str4d/zcash that referenced this pull request Feb 23, 2017
sickpig referenced this pull request in sickpig/BitcoinUnlimited Mar 4, 2017
b4d24e1 Report reindexing progress in GUI (Pieter Wuille)
d3d7547 Add -reindex-chainstate that does not rebuild block index (Pieter Wuille)
fb8fad1 Optimize ActivateBestChain for long chains (Pieter Wuille)
316623f Switch reindexing to AcceptBlock in-loop and ActivateBestChain afterwards (Pieter Wuille)
d253ec4 Make ProcessNewBlock dbp const and update comment (Pieter Wuille)
random-zebra added a commit to PIVX-Project/PIVX that referenced this pull request Apr 4, 2021
…tChain + optimization

50dbec5 Add unit tests for signals generated by ProcessNewBlock() (furszy)
f68251d Validation: rename one of the two instances using "bad-prevblk" to its correct description of "prevblk-not-found" (furszy)
a51a755 Fix concurrency-related bugs in ActivateBestChain (Jesse Cohen)
198f435 Do not unlock cs_main in ABC unless we've actually made progress. (Matt Corallo)
8d15cf5 Optimize ActivateBestChain for long chains (Pieter Wuille)
8640be1 Fix fast-shutdown crash if genesis block was not loaded (Matt Corallo)
ef24337 Hold cs_main while calling UpdatedBlockTip() and ui.NotifyBlockTip (Jesse Cohen)
f9d2ab3 Update ValidationInterface() documentation to explicitly specify threading and memory model (Jesse Cohen)
cb9bb25 Update documentation for SingleThreadedSchedulerClient() to specify the memory model (Jesse Cohen)
4ea2048 Add Unit Test for SingleThreadedSchedulerClient (Jesse Cohen)

Pull request description:

  Made some back ports and adaptations to validate further the work introduced in #2203 and #2209.
  Now that the scheduler thread is processing most of the chain events, the callbacks execution order must be verified to dispatch the events serially, ensuring a consistent memory state in each event processing invocation.
  There are some concurrency issues solved, as well a an optimization for larger chains for the ABC workflow. Each commit details them well.

  List:
  bitcoin#7917 (only fb8fad1)
  bitcoin#12988
  bitcoin#13023
  bitcoin#13247
  bitcoin#13835

  This is built on top of #2284 (and the reason that made me notice that problem). So, 2284 comes first, then this one.

ACKs for top commit:
  Fuzzbawls:
    ACK 50dbec5
  random-zebra:
    ACK 50dbec5 and merging...

Tree-SHA512: 573144cc96d2caa49ed2f0887dc8c53b5aca0efd54b6a25626063ccb780da426f56b3c6b7491648e7abefc1960c82b1354a4ff2c51425bfe99adaa4394562608
LarryRuane pushed a commit to LarryRuane/zcash that referenced this pull request May 27, 2021
zkbot added a commit to zcash/zcash that referenced this pull request Aug 9, 2021
Optimise reindex

Backports bitcoin/bitcoin#7917:

> Several changes:
>
> * Make reindex/import use `AcceptBlock` rather than `ProcessNewBlock`,
>   so activation of the resulting best chain is delayed.
>
> * That delayed activation is handled by the existing "Activating best
>   chain" step in the startup process, which is moved to `ThreadImport`.
>
> * Optimize `ActivateBestChain` to not walk the entire chain to find the
>   most work chain to connect for each block (O(n^2) -> O(n)).
>
> This has several advantages:
>
> * As the actual activation is done after all blocks have been added
>   back to the index, it gets to use the checkpoints information (better
>   progress indicator, skipping of signature checks, ...).
>
> * Having a block index with many unactivated blocks (for example, after
>   deleting the chainstate directory) becomes _much_ faster
>
> * That case also runs in the background now (as it's simply treated as
>   part of the importing process).
>
> Downsides:
>
> * All blocks are read twice from disk during reindex (once to rebuild
>   the index, once to activate).
@@ -24,6 +24,7 @@
class CBlockIndex;

static const int64_t nClientStartupTime = GetTime();
static int64_t nLastHeaderTipUpdateNotification = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable seems to be initialized here, but there's nowhere in the code where it's ever updated, and yet it is checked as if the value might change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use more than 5 year old pull requests to ask beginner programming questions.

You can open a new issue, if you are absolutely unable to find the answer yourself.

In this case it is updated via a reference (a reference is like a pointer).

@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 7, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants