-masterdatadir for datadir bootstrapping #13746

kallewoof · 2018-07-23T09:55:33Z

This PR adds the ability to bootstrap another node from an existing datadir.

To fire up a new node with access to an existing node's datadir y, you would simply do

$ bitcoind -datadir=x -masterdatadir=y

(presumably permanently in bitcoin.conf) and it would use the blocks files from the former up until it needed to change something, at which point it would copy. (See accompanying test.)

It currently only supports blocks/blk* caching, which is implemented as copy-on-write. Due to limitations in LevelDB, the chainstate and blocks/index dirs are copied to -datadir on first launch. Latter is only ~75 MB, but former is ~4 GB, so this is sad but better than nothing.

Note: the feature_masterdatadir.py test makes ~150 MB of block data. May want to combine this with the pruning tests to not redo the work of mining, but it is in its own test for now. Also may wanna move it to extended, but keeping it in base while this PR is ongoing.

laanwj · 2018-07-23T10:22:00Z

Concept ACK!

promag

What if there is a node (A) running in the master data dir of node (B) and A is writing a block at the same time B is reading?

promag · 2018-07-23T10:40:37Z

src/validation.cpp

@@ -226,6 +226,7 @@ std::atomic_bool fImporting(false);
 std::atomic_bool fReindex(false);
 bool fHavePruned = false;
 bool fPruneMode = false;
+bool has_master_datadir = false;


Use prefix g_, g_has_master_datadir.

promag · 2018-07-25T14:00:09Z

src/validation.cpp

+            uint8_t buffer[16384];
+            while (!feof(mfile)) {
+                size_t r = fread(buffer, 1, 16384, mfile);
+                if (r) fwrite(buffer, 1, r, file);


Should handle failure instead of ignoring? Could lead to corrupt blocks?

promag · 2018-07-25T14:00:30Z

test/functional/feature_masterdatadir.py

@@ -0,0 +1,88 @@
+#!/usr/bin/env python3
+# Copyright (c) 2014-2017 The Bitcoin Core developers


DrahtBot · 2018-07-25T15:00:47Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#15118 (Refactor block file logic by jimpo)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

kallewoof · 2018-07-26T02:24:06Z

@promag

What if there is a node (A) running in the master data dir of node (B) and A is writing a block at the same time B is reading?

I'm not sure what the chances are that this results in actual corruption. The master is safe always, but the slave may end up with a copy that has incomplete data. That depends on whether the master flushes the block file atomically or not, though. I would expect it makes an effort to always keep files in a clean state, in case of crashes and such.

That said, the block files are checked on startup and I expect a corruption would be detected almost immediately, since it is almost guaranteed to be the newest block file.

I'll experiment a bit with the 'open existing file as readonly' by leaving out the last 1 byte and seeing how it behaves.

kallewoof · 2018-07-26T02:41:22Z

Experiment: skip last byte for case "!readonly & copy existing":

diff --git a/src/validation.cpp b/src/validation.cpp
index b5646f107..7af6956b3 100644
--- a/src/validation.cpp
+++ b/src/validation.cpp
@@ -3804,6 +3804,11 @@ static FILE* OpenDiskFile(const CDiskBlockPos &pos, const char *prefix, bool fRe
             while (!feof(mfile)) {
                 size_t r = fread(buffer, 1, 16384, mfile);
                 if (r) {
+                    // TEST: CORRUPTION
+                    if (r > 0 && r < 16384) {
+                        LogPrintf("*** Corrupting file %s by writing %d/%d bytes from master\n", path.string(), r-1, r);
+                        r--;
+                    }
                     if (r != fwrite(buffer, 1, r, file)) {
                         fclose(mfile);
                         fclose(file);

Result: slave node crashes with a 'corrupt block database' error:

2018-07-26T02:35:14Z Loaded best chain: hashBestChain=00000000000000000029ac6548add211c4f3d20d20da3d5858c3eda1fb84fece height=529951 date=2018-06-30T22:01:25Z progress=0.977126
2018-07-26T02:35:14Z init message: Rewinding blocks...
2018-07-26T02:35:16Z *** Corrupting file /Users/karljohan-alm/workspace/bitcoin-kw/src/slavedata/blocks/blk01301.dat by writing 5744/5745 bytes from master
2018-07-26T02:35:17Z init message: Verifying blocks...
2018-07-26T02:35:17Z Verifying last 6 blocks at level 3
2018-07-26T02:35:17Z [0%]...ERROR: ReadBlockFromDisk: Deserialize or I/O error - CAutoFile::read: end of file: unspecified iostream_category error at CBlockDiskPos(nFile=1301, nPos=97518532)
2018-07-26T02:35:17Z ERROR: VerifyDB(): *** ReadBlockFromDisk failed at 529951, hash=00000000000000000029ac6548add211c4f3d20d20da3d5858c3eda1fb84fece
2018-07-26T02:35:17Z : Corrupted block database detected.
Please restart with -reindex or -reindex-chainstate to recover.
: Corrupted block database detected.
Please restart with -reindex or -reindex-chainstate to recover.
2018-07-26T02:35:17Z Aborted block database rebuild. Exiting.
2018-07-26T02:35:17Z Shutdown: In progress...
2018-07-26T02:35:17Z scheduler thread interrupt
2018-07-26T02:35:17Z Shutdown: done

Disabling corruption patch and restarting results in another crash. Only fix was to -reindex, which initiated a full redownload of blocks (master was pruned).

I think it would be interesting to patch the 'corrupt!' logic with a fallback to grab master file again before failing. Actually, I think I'm going to do that.

Edit: I did some testing, and the above idea to copy again does not work, because the only time it copies is when it is about to write something, and it does not check the block file consistency at this time. I.e. it gets a corrupt file, writes to it, closes it, then later opens it, finds that it's corrupted, recopies it, but now it has lost the data it wrote before.

jonasschnelli · 2018-07-30T15:54:51Z

src/init.cpp

@@ -1256,6 +1260,27 @@ bool AppInitMain()
            gArgs.GetArg("-datadir", ""), fs::current_path().string());
    }

+    if (g_has_master_datadir) {
+        // LDB does not support readonly access, so we need to copy the entire


nit: s/LDB/levelDB

jonasschnelli · 2018-07-30T15:57:20Z

src/init.cpp

+        // LDB does not support readonly access, so we need to copy the entire
+        // chainstate and blocks/index dirs over to the new location, if not found already
+        fs::path path = GetDataDir() / "chainstate";
+        if (!fs::exists(path)) {


Doesn't this copy action require a protection that the masterdatadir is currently not in use/write? If a Bitcoin Core instance is using this (master) datadir it may result in a corrupted copy?

Yes. Master is guaranteed to be safe as no write operation is ever done from the slave, but the opposite is not the case. See #13746 (comment) for experiment on corruption. There's no good way to recover, as the slave will have already written data and closed the file before realizing it was truncated.

The question remains how frequent this kind of race condition is, though.

jonasschnelli · 2018-07-30T15:57:33Z

Concept ACK

kallewoof · 2018-07-31T06:23:30Z

Actually, I realized that it should be possible to make a perfectly safe slave instance by checking the master directory for the last block file in existence, and marking that minus 1 as the last available master block file.

I.e. if master has blk000 ~ blk500, slave will use files blk000 ~ blk499 from master, copy blk500 over the first time it starts up, and then make its own files from there.

Important to note is that it will not, for example, try to use master's blk501, in the case where slave is shutdown and master keeps chugging along to the point where it makes a new block file before the slave, something this PR does not actually address.

Still means master should preferably be stopped during initial start-up of slave, but it can run safely without slave risking corruption after slave has finished first startup.

jonasschnelli · 2018-07-31T11:12:34Z

I'm more worried about hot copying the levelDB part (chainstate/). Imo to be safe, the master must close the leveldb environment.

sipa · 2018-07-31T22:40:35Z

You can't do this while an instance is running on the master (unless you can somehow force it to temporarily close its database environment and pause operations). If you don't need it to work while the master is running you should grab the lock on the master to avoid it.

Alternatively you can just not copy databases at all, and require a reindex to setup a second datadir.

kallewoof · 2018-08-01T09:46:08Z

@sipa I am now locking master temporarily while doing the copying, and init-error-shutdown if it is already locked.

I am also now tracking the last block file that the slave relies on in master, so it won't accidentally start reading from master block files if master runs ahead in file count (e.g. slave writes blk123, master then creates blk124 and slave starts trying to read from it).

@jonasschnelli

I'm more worried about hot copying the levelDB part (chainstate/). Imo to be safe, the master must close the leveldb environment.

This should be addressed now that the master is locked before copy.

@promag

What if there is a node (A) running in the master data dir of node (B) and A is writing a block at the same time B is reading?

The slave will now copy the very last block file over on initial start. It also holds a lock on the master dir for the duration of these operations, so your concerns have been addressed, I think.

…s used Also find and track the last blk/rev files that we rely on from master, as we do not want to load anything beyond that, in case master runs ahead of us at some point.

…es not exist

DrahtBot · 2019-03-02T22:38:23Z

Needs rebase

practicalswift · 2019-03-03T09:32:48Z

Concept ACK

kallewoof force-pushed the masterdatadir branch 2 times, most recently from 518a108 to 3ef0f47 Compare July 23, 2018 10:03

laanwj added the Feature label Jul 23, 2018

DrahtBot added the Needs rebase label Jul 24, 2018

kallewoof force-pushed the masterdatadir branch from 3ef0f47 to ed4e390 Compare July 25, 2018 07:38

DrahtBot removed the Needs rebase label Jul 25, 2018

promag reviewed Jul 25, 2018

View reviewed changes

DrahtBot mentioned this pull request Jul 25, 2018

[WIP] Full unicode support on Windows #13755

Closed

3 tasks

This was referenced Jul 25, 2018

bitcoin-cli: -stdinwalletpassphrase and non-echo stdin passwords #13716

Merged

Check for datadir after the config files were read #13621

Closed

kallewoof force-pushed the masterdatadir branch from ed4e390 to 79af986 Compare July 26, 2018 02:11

This was referenced Jul 26, 2018

[bugfix] Fix encoding issue for Windows #13426

Closed

Docs: Create default bitcoin.conf file on startup #13761

Closed

Test for Windows encoding issue #13787

Merged

jonasschnelli reviewed Jul 30, 2018

View reviewed changes

kallewoof force-pushed the masterdatadir branch from 79af986 to 25acb6f Compare July 30, 2018 17:07

kallewoof force-pushed the masterdatadir branch 4 times, most recently from 2609bc9 to 65ce137 Compare August 1, 2018 09:43

kallewoof force-pushed the masterdatadir branch 8 times, most recently from d3cbc0c to e103bb3 Compare January 23, 2019 07:39

DrahtBot added the Needs rebase label Feb 4, 2019

kallewoof force-pushed the masterdatadir branch from e103bb3 to 8aec15f Compare February 7, 2019 16:20

DrahtBot removed the Needs rebase label Feb 7, 2019

DrahtBot added the Needs rebase label Feb 21, 2019

kallewoof added 9 commits February 22, 2019 11:35

util: add CopyDirectory helper

b1334d4

util: add master datadir path support

f3f7c49

util: add ObtainDirectoryLock() for temporary dir locks

96a4cb2

validation: add g_has_master_datadir and g_master_endblock

3f7f084

init: add -masterdatadir arg

d02569a

init: copy chainstate and blocks/index to datadir if -masterdatadir i…

d73d977

…s used Also find and track the last blk/rev files that we rely on from master, as we do not want to load anything beyond that, in case master runs ahead of us at some point.

validation: use or copy master blocks file when local datadir file do…

ebb0b3b

…es not exist

test: basic -masterdatadir tests

b28076d

linter: add exception to locale dependent function linter

d88e6c3

kallewoof force-pushed the masterdatadir branch from 8aec15f to d88e6c3 Compare February 22, 2019 02:46

DrahtBot removed the Needs rebase label Feb 22, 2019

DrahtBot added the Needs rebase label Mar 2, 2019

kallewoof closed this Apr 3, 2019

laanwj removed the Needs rebase label Oct 24, 2019

kallewoof deleted the masterdatadir branch January 4, 2020 02:20

bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022

		@@ -0,0 +1,88 @@
		#!/usr/bin/env python3
		# Copyright (c) 2014-2017 The Bitcoin Core developers

-masterdatadir for datadir bootstrapping #13746

-masterdatadir for datadir bootstrapping #13746

Uh oh!

Conversation

kallewoof commented Jul 23, 2018

Uh oh!

laanwj commented Jul 23, 2018

Uh oh!

promag left a comment

Choose a reason for hiding this comment

Uh oh!

promag Jul 23, 2018

Choose a reason for hiding this comment

Uh oh!

promag Jul 25, 2018

Choose a reason for hiding this comment

Uh oh!

promag Jul 25, 2018

Choose a reason for hiding this comment

Uh oh!

DrahtBot commented Jul 25, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Conflicts

Uh oh!

kallewoof commented Jul 26, 2018

Uh oh!

kallewoof commented Jul 26, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jonasschnelli Jul 30, 2018

Choose a reason for hiding this comment

Uh oh!

jonasschnelli Jul 30, 2018

Choose a reason for hiding this comment

Uh oh!

kallewoof Jul 30, 2018

Choose a reason for hiding this comment

Uh oh!

jonasschnelli commented Jul 30, 2018

Uh oh!

kallewoof commented Jul 31, 2018

Uh oh!

jonasschnelli commented Jul 31, 2018

Uh oh!

sipa commented Jul 31, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kallewoof commented Aug 1, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DrahtBot commented Mar 2, 2019

Uh oh!

practicalswift commented Mar 3, 2019

Uh oh!

Uh oh!

DrahtBot commented Jul 25, 2018 •

edited

Loading

kallewoof commented Jul 26, 2018 •

edited

Loading

sipa commented Jul 31, 2018 •

edited

Loading

kallewoof commented Aug 1, 2018 •

edited

Loading