-
Notifications
You must be signed in to change notification settings - Fork 37.8k
Build tx index in parallel with validation #11857
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -350,6 +351,10 @@ static bool rest_tx(HTTPRequest* req, const std::string& strURIPart) | |||
if (!ParseHashStr(hashStr, hash)) | |||
return RESTERR(req, HTTP_BAD_REQUEST, "Invalid hash: " + hashStr); | |||
|
|||
if (g_txindex) { | |||
g_txindex->BlockUntilSyncedToCurrentChain(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of blocking, give 503?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not really unavailable, just that the background process needs to catch up, which should happen quickly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we not check the return of BlockUntilSyncedToCurrentChain and maybe always return an error before its even gotten close to caught up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still needs to be addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/init.cpp
Outdated
@@ -1426,9 +1433,9 @@ bool AppInitMain(boost::thread_group& threadGroup, CScheduler& scheduler) | |||
|
|||
if (fRequestShutdown) break; | |||
|
|||
// LoadBlockIndex will load fTxIndex from the db, or set it if | |||
// LoadBlockIndex will load fHavePruned if we've ever removed a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment incomplete or outdated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed.
High-Level Concept ACK. As for your two notes:
I'd vote strongly for this. Keeping them separate is good.
Yes, I think you should just do this. I'd like to move the CValidationInterface semantics to no longer be a single thread run on the scheduler but instead multiple scheduler threads that process events and only guarantee order for individual clients instead of globally. |
ae276ce
to
41022ec
Compare
src/index/txindex.cpp
Outdated
|
||
void TxIndex::Start() | ||
{ | ||
RegisterValidationInterface(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move after Init?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this has to be registered before m_synced
is set to true in Init.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "[index] Create new TxIndex class."
No, this has to be registered before m_synced is set to true in Init.
Definitely worth noting this in a code comment. Also maybe m_synced should be called m_initialized or m_started to be clearer that it changes from false to true just once on startup, and isn't updated in an ongoing way.
7b8a36d
to
e95ed5a
Compare
Concept ACK
I agree that would be preferable (though not necessarily in this PR, I don't think the scope here should be extended further). The transaction index has a completely different access pattern from the block index. This came up in #10922 and other places. Also for safety and flexibility it would be good to have separate indexers, and to not have it integrated into consensus-critical block handling. |
I'm not actaully sure that its unrelated - currently txindex is required to be kept in-sync and be present in the block tree DB before the coins for a block are flushed to the utxodb. Moving it to the background without bending over backwards to keep things in-sync would result in "corrupt" (I assume just missing entries) tx index in some cases on downgrade. Its not a huge deal, but I think moving the txindex into a separate DB would be really nice to do before (or in the same PR) things are background-flushed. |
@TheBlueMatt @laanwj I updated this PR to split the database and added a migration. The migration took ~103 min on an AWS m4.large with data on a gp2 EBS volume (non-local SSD). So that's kind of painful. I could change the migration to happen in the background thread, but the UX might be weird for people upgrading because the RPC endpoint would report that the txindex is catching up. Open to suggestions here. |
Seems to be missing migration (forget to push?). Doing migration in the background after start seems fine, better to wait with most things working than it just hanging startup, just have to make sure it doesn't overwrite new entries or you'd end up pointing to a reorg'd-out block's copy (though I think thats probably not technically a problem, it seems strange). As for threading, it would be really nice to have the flushing for new blocks happen directly in the scheduler/validation interface threading, with the upgrade either in the init thread at the end (it just hangs around for the entire program's execution waiting for shutdown to start, so as long as the migration is interruptible that'd be a fine place for it) or a new thread. |
@TheBlueMatt Migration is here: a252d46#diff-81e4f16a1b5d5b7ca25351a63d07cb80R446. Called from I also changed the new block flushing to happen directly in the |
src/txdb.cpp
Outdated
CompactRange(prev_key_newdb, key); | ||
prev_key_newdb = key; | ||
} | ||
if (batch_olddb.SizeEstimate() > batch_size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separating these if statements loses coherency - if we crash we may have a bogus txindex and no way to detect it. Should likely instead do if (batch_newdb.SizeEstimate() + batch_olddb.SizeEstimate() > batch_size)........
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
6090790
to
35da0d2
Compare
015cdd6
to
48fffcf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! I reviewed most of this but rushed through the last few commits and want to look more closely before adding ack.
src/txdb.cpp
Outdated
bool TxIndexDB::WriteTxns(const std::vector<std::pair<uint256, CDiskTxPos>>& v_pos) | ||
{ | ||
CDBBatch batch(*this); | ||
for (auto tuple : v_pos) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "[db] Create separate database for txindex."
Maybe change to const auto& tuple
to avoid a copy while iterating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/txdb.cpp
Outdated
LogPrintf("Upgrading txindex database...\n"); | ||
LogPrintf("[0%%]..."); | ||
int report_done = 0; | ||
size_t batch_size = 1 << 24; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "[db] Create separate database for txindex."
Maybe declare const and add comment (16MiB).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/txdb.cpp
Outdated
break; | ||
} | ||
|
||
if (!pcursor->GetKey(key)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "[db] Create separate database for txindex."
Maybe log an error in this case. I don't think it would be expected for pcursor->Valid() to return true but pcursor->GetKey to fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are right that this should never happen, so I opted to return an error instead of logging and completing the migration.
src/txdb.cpp
Outdated
WriteBatch(batch_newdb); | ||
CompactRange(begin_key, key); | ||
block_tree_db.WriteBatch(batch_olddb); | ||
block_tree_db.CompactRange(begin_key, key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "[db] Create separate database for txindex."
This is duplicating code inside the for loop, and also skipping the last compact range on the old db. Maybe just flush and compact once at the top of the loop:
for (pcursor->Seek(begin_key);; pcursor->Next()) {
if (!pcursor->Valid() || batch_newdb.SizeEstimate()...) {
...flush and compact...
}
if (!pcursor->Valid()) break;
...move entry at cursor...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, the logic gets kind of tricky since there's also the early break if the cursor is valid but iterates past the DB_TXINDEX range. Also the CompactRange after the loop compacts over the entire range, not just from the previous batch write point. What do you mean that it skips the last compact range on the old DB? It shouldn't.
This code was mostly copied from CCoinsViewDB::Upgrade
if that wasn't clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "[db] Create separate database for txindex."
Hmm, the logic gets kind of tricky since there's also the early break.
This doesn't seem that bad in light of the code duplication in the current version of this code, and the fact that the originally duplicated code had a bug that needed to be fixed two places instead of one (missing the "Sync new DB changes to disk before deleting from old DB" behavior).
I think adding new done
bool variable set to as !pcursor->Valid() || key.first != DB_TXINDEX
, would make the code change I suggested above even shorter and clearer.
What do you mean that it skips the last compact range on the old DB? It shouldn't.
My mistake, I don't think I saw the actual line of code I was commenting on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried a few things, and the approach you describe is not worth the additional control flow complexity IMO. Consider also that the CompactRange calls are over a different range inside and outside of the loop (as noted above) and the newest version of the code adds an additional write to the batch outside of the loop to write the block hash.
However, I understand the argument against duplication, so I instead added a private method extracting the logic:
static void WriteTxIndexMigrationBatches(TxIndexDB& newdb, CBlockTreeDB& olddb,
CDBBatch& batch_newdb, CDBBatch& batch_olddb,
const std::pair<unsigned char, uint256>& begin_key,
const std::pair<unsigned char, uint256>& end_key)
{
...
}
How do you feel about this approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you feel about this approach?
Better, I'm still not sure why my suggestion doesn't work out, but this takes care of my concern.
src/txdb.cpp
Outdated
|
||
bool TxIndexDB::MigrateData(CBlockTreeDB& block_tree_db) | ||
{ | ||
// The prior implementation of txindex was always in sync with block index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "[db] Create separate database for txindex."
Unclear why migration should be tied to txindex flag. At first glance, it would seem simpler to reason about possible states and also more robust if the code just always moved DB_TXINDEX entries from the old location to the new location independent of the flag. Maybe add a code comment here to clarify rationale.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I elaborated on the comment. I think this is the best way to determine whether a migration is necessary. LMK if you think it needs further explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "[db] Create separate database for txindex."
Unclear why migration should be tied to txindex flag. At first glance, it would seem simpler to reason about possible states and also more robust if the code just always moved DB_TXINDEX entries from the old location to the new location independent of the flag. Maybe add a code comment here to clarify rationale.
I elaborated on the comment. I think this is the best way to determine whether a migration is necessary. LMK if you think it needs further explanation.
This still doesn't seem to be saying why the flag needs to be checked. Maybe add "Do not migrate data if the txindex flag is false, because any data that is found will be stale and not in sync with DB_BEST_BLOCK" to the comment if that's the explanation.
src/index/txindex.cpp
Outdated
CDiskTxPos pos(pindex->GetBlockPos(), GetSizeOfCompactSize(block.vtx.size())); | ||
std::vector<std::pair<uint256, CDiskTxPos>> vPos; | ||
vPos.reserve(block.vtx.size()); | ||
for (auto tx : block.vtx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "[index] Create new TxIndex class."
Could const auto&
to avoid a copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/index/txindex.cpp
Outdated
vPos.push_back(std::make_pair(tx->GetHash(), pos)); | ||
pos.nTxOffset += ::GetSerializeSize(*tx, SER_DISK, CLIENT_VERSION); | ||
} | ||
return m_db->WriteTxns(vPos) && m_db->WriteBestBlockHash(pindex->GetBlockHash()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "[index] Create new TxIndex class."
Consider moving WriteBestBlockHash call out to caller so the stored best block and m_best_block_index get updated together. It's a little unexpected to see one value written without the other. Might want to rename this method to WriteBlockTxns
if you do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of prefer to keep the DB writes together. I'd hoped that the access pattern would make it clear:
if (WriteBlock(*block, pindex)) {
m_best_block_index = pindex;
}
src/index/txindex.cpp
Outdated
|
||
void TxIndex::Start() | ||
{ | ||
RegisterValidationInterface(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "[index] Create new TxIndex class."
No, this has to be registered before m_synced is set to true in Init.
Definitely worth noting this in a code comment. Also maybe m_synced should be called m_initialized or m_started to be clearer that it changes from false to true just once on startup, and isn't updated in an ongoing way.
src/txdb.cpp
Outdated
@@ -29,6 +29,7 @@ static const char DB_HEAD_BLOCKS = 'H'; | |||
static const char DB_FLAG = 'F'; | |||
static const char DB_REINDEX_FLAG = 'R'; | |||
static const char DB_LAST_BLOCK = 'l'; | |||
static const char DB_TXINDEX_BEST_BLOCK = 'T'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "[index] Create new TxIndex class."
Why use a different code for DB_TXINDEX_BEST_BLOCK than DB_BEST_BLOCK? Why even define a new constant at all? It seems strange that the new txindex format would diverge unnecessarily from the old format here when it isn't doing do that in other places. Should add a code comment explaining if there is a reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that wasn't even used. Left over from a previous version where the database wasn't split out.
src/index/txindex.cpp
Outdated
} | ||
} else { | ||
if (best_block_index->GetAncestor(pindex->nHeight - 1) != pindex->pprev) { | ||
FatalError("%s: Block %s does not connect to an ancestor of known best chain (tip=%s)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "[index] Create new TxIndex class."
I'm not sure this case and the one above should be errors given that BlockConnected calls are queued up in the notification thread and may not be up to date with chainActive (which is what ThreadSync syncs toward). Seems like it would be right (and simpler) to just call WriteBlock unconditionally here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't checking against chainActive, it is just asserting that BlockConnected gets called with blocks in order (even if chainActive is ahead). It's important to note that if the BlockConnected callbacks are running, then m_synced
is true an ThreadSync
has exited. I added more comments to these fields/methods in the header file to hopefully make that more clear.
48fffcf
to
1d2c76a
Compare
Needs rebase (probably a simple one, only init.cpp conflicted). |
1d2c76a
to
c4401fc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK. It's very nice to able to add and remove txindex without a full reindex, which takes weeks on my Bitseed node. Worthy of a Performance label?
It tested (with testnet3) adding txindex=1
on this branch, as well as adding txindex=1
on master and then migrating using this branch.
Possibly unrelated, but upon switching back to master
, deleting /indexes
, launching src/qt/bitcoin-qt -txindex=1
, confirming that I wanted to do a reindex, it failed with "Error opening block database". I then run bitcoind -txindex=1 -reindex
for a minute, closed it and launched QT again, at which point indexing proceeded as expected.
In terms of storage something seems off (testnet3):
- chainstate without txindex: 968.7 MB
- chainstate with "legacy" txindex: 966.8 MB
- chainstate on this branch, after migration: 986.4 MB
- indexes, after migration: 870 MB
- indexes, without migration: 873 MB
So it looks like txindex only adds 30 MB on master, but the separate indexes directory adds almost 800 MB. Or I'm doing something wrong.
Can you expand the progress logger beyond migration? It's also useful for when txindex=1
is added after IBD.
Would be nice to have functional tests for the various getrawtransaction
error messages.
It's a bit confusing to call the directory /indexes
when the block index file isn't there, but maybe that can be moved later (or never, if it's too risky).
} else if (!g_txindex) { | ||
errmsg = "No such mempool transaction. Use -txindex to enable blockchain transaction queries"; | ||
} else if (!f_txindex_ready) { | ||
errmsg = "No such mempool transaction. Blockchain transactions are still in the process of being indexed"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this have a distinct error code, rather than RPC_INVALID_ADDRESS_OR_KEY
? If I understand correctly, an RPC consumer should wait a little and try again if this happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's right, an RPC consumer should wait until the index is built. Using the RPC_INVALID_ADDRESS_OR_KEY
error code for the "No such mempool transaction. Use -txindex to enable blockchain transaction queries"
clause also seems odd to me.
Perhaps the RPC_IN_WARMUP
makes sense here, or perhaps a new RPC_INDEX_UNAVAILABLE
error code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A distinct error code may make sense, though no opinion on what. Further, I'm not sure we should let GetTransaction call into the g_txindex if !f_txindex_ready (as it may return data from a stale block, but its also just generally not so useful). Maybe GetTransaction/FindTx can check if we're in sync yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's best effort, just like the lookup from CCoinsCacheView. If there's a response, great, otherwise the error message says that the not found may be inclusive. I agree that's kind of a weird behavior for the RPC, but if we change it, I could make the same argument for just removing the CCoinsCacheView lookup.
break; | ||
} | ||
|
||
// Log progress every 10%. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it really takes 103 minutes on a fast EC2 instance, maybe make it 1%? Or once a minute? QT already shows 1% intervals during the upgrade.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The percentage shown in the UI is updated with every percent, it's just the log line that shows increments of 10, as it appends to the same line: [0%]...[10%]...[20%...]
. Alternatively, it could log progress on a separate line every 30 seconds or something like: "Upgrading txindex database: n% complete\n". I kind of prefer that approach, but I copied how it was done for CCoinsViewDB upgrade.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opted to log every 30s.
I also noticed that I was able to reproduce the "Error opening block database" QT error (on master) after a fresh reindex (on this branch). No need to use bitcoind to make it go away, simply launch with This also seems to happens on master so might be unrelated:
|
I was comparing the wrong directory sizes above. Also the error I was seeing was unrelated, and hopefully fixed in #12401 |
342aeac
to
f1b8b8d
Compare
Now that the transaction index is updated asynchronously, in order to preserve the current behavior of public interfaces, the code blocks until the transaction index is caught up with the current state of the blockchain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 523dd76 despite the comments. I think its correct as-is now, but there's a few things that I think are rough API edges that would be nice to get cleanup on.
@@ -22,6 +22,7 @@ static const char DB_COIN = 'C'; | |||
static const char DB_COINS = 'c'; | |||
static const char DB_BLOCK_FILES = 'f'; | |||
static const char DB_TXINDEX = 't'; | |||
static const char DB_TXINDEX_BLOCK = 'T'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get a better name here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest one.
batch_olddb.Erase(key); | ||
|
||
if (batch_newdb.SizeEstimate() > batch_size || batch_olddb.SizeEstimate() > batch_size) { | ||
WriteTxIndexMigrationBatches(*this, block_tree_db, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Writing a batch deleting the thing the iterator is pointing to before calling Next() just feels broken to me. It should be fine cause iterators appear to always take a snapshot, but could you at least add a comment noting that we rely on this being safe (and also do in CCoinsViewDB::Upgrade).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like overkill to me to leave a comment for documented LevelDB behavior, but sure.
|
||
// Read might have failed either because key does not exist or due to an error. | ||
// If the former, return value should still be true. | ||
if (!Exists(DB_BEST_BLOCK)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think this is right - if Read fails due to a DB error it will throw a dbwrapper_error, so we'll never get here anyway. It also makes the return value super confusing. Can we just make it have return semantics the same as every other DB function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change it, but I think the return value of CDBWrapper::Read()
is super confusing on it's own. It means either that the key was not found or that the value at that key could not be deserialized into the provided struct (and that the return value is now in an unspecified state).
return false; | ||
} | ||
|
||
m_best_block_index = FindForkInGlobalIndex(chainActive, locator); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice to check that the genesis block matches (if locator was non-null) as the locator will always have it. Not a big deal since users shouldnt be moving the txinded db to anothter datadir, but...users love to do crazy things and break things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh, what's the right behavior in that case? Seems to me like it would be to sync the txindex from genesis, which is exactly what this does.
auto& consensus_params = Params().GetConsensus(); | ||
|
||
int64_t last_log_time = 0; | ||
while (true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be really nice to write the locator out at various points during this loop, as otherwise if you start a txindex sync and then kill your bitcoind while it takes its hour building the txindex you'll have to start against from scratch on next startup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, done.
@@ -350,6 +351,10 @@ static bool rest_tx(HTTPRequest* req, const std::string& strURIPart) | |||
if (!ParseHashStr(hashStr, hash)) | |||
return RESTERR(req, HTTP_BAD_REQUEST, "Invalid hash: " + hashStr); | |||
|
|||
if (g_txindex) { | |||
g_txindex->BlockUntilSyncedToCurrentChain(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still needs to be addressed.
Closing and re-opening in #13033 because this has stopped loading for people. |
9b27047 [doc] Include txindex changes in the release notes. (Jim Posen) ed77dd6 [test] Simple unit test for TxIndex. (Jim Posen) 6d772a3 [rpc] Public interfaces to GetTransaction block until synced. (Jim Posen) a03f804 [index] Move disk IO logic from GetTransaction to TxIndex::FindTx. (Jim Posen) e0a3b80 [validation] Replace tx index code in validation code with TxIndex. (Jim Posen) 8181db8 [init] Initialize and start TxIndex in init code. (Jim Posen) f90c3a6 [index] TxIndex method to wait until caught up. (Jim Posen) 70d510d [index] Allow TxIndex sync thread to be interrupted. (Jim Posen) 94b4f8b [index] TxIndex initial sync thread. (Jim Posen) 34d68bf [index] Create new TxIndex class. (Jim Posen) c88bcec [db] Migration for txindex data to new, separate database. (Jim Posen) 0cb8303 [db] Create separate database for txindex. (Jim Posen) Pull request description: I'm re-opening #11857 as a new pull request because the last one stopped loading for people ------------------------------- This refactors the tx index code to be in it's own class and get built concurrently with validation code. The main benefit is decoupling and moving the txindex into a separate DB. The primary motivation is to lay the groundwork for other indexers that might be desired (such as the [compact filters](bitcoin/bips#636)). The basic idea is that the TxIndex spins up its own thread, which first syncs the txindex to the current block index, then once in sync the BlockConnected ValidationInterface hook writes new blocks. ### DB changes At the suggestion of some other developers, the txindex has been split out into a separate database. A data migration runs at startup on any nodes with a legacy txindex. Currently the migration blocks node initialization until complete. ### Open questions - Should the migration of txindex data from the old DB to the new DB block in init or should it happen in a background thread? The downside to backgrounding it is that `getrawtransaction` would return an error message saying the txindex is syncing while the migration is running. ### Impact In a sample size n=1 test where I synced nodes from scratch, the average time [Index writing](https://github.com/bitcoin/bitcoin/blob/master/src/validation.cpp#L1903) was 3.36ms in master and 1.72ms in this branch. The average time between `UpdateTip` log lines for sequential blocks between 400,000 and IBD end on mainnet was 0.297204s in master and 0.286134s in this branch. Most likely this is just variance in IBD times, but I can try with some more trials if people want. Tree-SHA512: 451fd7d95df89dfafceaa723cdf0f7b137615b531cf5c5035cfb54e9ccc2026cec5ac85edbcf71b7f4e2f102e36e9202b8b3a667e1504a9e1a9976ab1f0079c4
9cb6cdc Simplify semantics of ChainStateFlushed callback (Matt Corallo) 50b6533 scripted-diff: Rename SetBestChain callback ChainStateFlushed (Matt Corallo) Pull request description: Previously, ChainStateFlushed would fire either if a full flush completed (which can happen due to memory limits, forced flush, or on its own DATABASE_WRITE_INTERVAL timer) *or* on a ChainStateFlushed-specific DATABASE_WRITE_INTERVAL timer. This is both less clear for clients (as there are no guarantees about a flush having actually happened prior to the call), and reults in extra flushes not clearly intended by the code. We drop the second case, providing a strong guarantee without removing the periodit timer-based flushing. This is a follow-up to discussion in #11857. Tree-SHA512: 22ba3a0954d265d28413dbf87040790ca5b439820ee7bbadab14028295ec190de82ce5cd664426c82e58b706dc84278868026fa8d066702eb6e6962c9ace1f8e
…te LevelDB memory use 741f017 Add DynamicMemoryUsage() to LevelDB (Evan Klitzke) Pull request description: This adds a new method `CDBWrapper::DynamicMemoryUsage()` similar to Bitcoin's existing methods of the same name. It's implemented by asking LevelDB for the information, and then parsing the string response. I've also added logging to `CDBWrapper::WriteBatch()` to track this information: ``` $ tail -f ~/.bitcoin/testnet3/debug.log | grep WriteBatch 2018-03-05 19:34:55 WriteBatch memory usage: db=chainstate, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:17 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:17 WriteBatch memory usage: db=chainstate, before=0.0MiB, after=8.0MiB 2018-03-05 19:35:22 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:22 WriteBatch memory usage: db=chainstate, before=8.0MiB, after=17.0MiB 2018-03-05 19:35:26 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:27 WriteBatch memory usage: db=chainstate, before=9.0MiB, after=18.0MiB 2018-03-05 19:35:40 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:41 WriteBatch memory usage: db=chainstate, before=9.0MiB, after=7.0MiB 2018-03-05 19:35:52 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:52 WriteBatch memory usage: db=chainstate, before=7.0MiB, after=9.0MiB ^C ``` As LevelDB doesn't seem to provide a way to get the database name, I've also added a new `m_name` field to the `CDBWrapper`. This is necessary because we have multiple LevelDB databases (two now, and possibly more later, e.g. bitcoin#11857). I am using this information in other branches where I'm experimenting with changing LevelDB buffer sizes. Tree-SHA512: 7ea8ff5484bb07ef806af17d000c74ccca27d2e0f6c3229e12d93818f00874553335d87428482bd8acbcae81ea35aef2a243326f9fccbfac25989323d24391b4
…te LevelDB memory use 741f017 Add DynamicMemoryUsage() to LevelDB (Evan Klitzke) Pull request description: This adds a new method `CDBWrapper::DynamicMemoryUsage()` similar to Bitcoin's existing methods of the same name. It's implemented by asking LevelDB for the information, and then parsing the string response. I've also added logging to `CDBWrapper::WriteBatch()` to track this information: ``` $ tail -f ~/.bitcoin/testnet3/debug.log | grep WriteBatch 2018-03-05 19:34:55 WriteBatch memory usage: db=chainstate, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:17 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:17 WriteBatch memory usage: db=chainstate, before=0.0MiB, after=8.0MiB 2018-03-05 19:35:22 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:22 WriteBatch memory usage: db=chainstate, before=8.0MiB, after=17.0MiB 2018-03-05 19:35:26 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:27 WriteBatch memory usage: db=chainstate, before=9.0MiB, after=18.0MiB 2018-03-05 19:35:40 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:41 WriteBatch memory usage: db=chainstate, before=9.0MiB, after=7.0MiB 2018-03-05 19:35:52 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:52 WriteBatch memory usage: db=chainstate, before=7.0MiB, after=9.0MiB ^C ``` As LevelDB doesn't seem to provide a way to get the database name, I've also added a new `m_name` field to the `CDBWrapper`. This is necessary because we have multiple LevelDB databases (two now, and possibly more later, e.g. bitcoin#11857). I am using this information in other branches where I'm experimenting with changing LevelDB buffer sizes. Tree-SHA512: 7ea8ff5484bb07ef806af17d000c74ccca27d2e0f6c3229e12d93818f00874553335d87428482bd8acbcae81ea35aef2a243326f9fccbfac25989323d24391b4
…te LevelDB memory use 741f017 Add DynamicMemoryUsage() to LevelDB (Evan Klitzke) Pull request description: This adds a new method `CDBWrapper::DynamicMemoryUsage()` similar to Bitcoin's existing methods of the same name. It's implemented by asking LevelDB for the information, and then parsing the string response. I've also added logging to `CDBWrapper::WriteBatch()` to track this information: ``` $ tail -f ~/.bitcoin/testnet3/debug.log | grep WriteBatch 2018-03-05 19:34:55 WriteBatch memory usage: db=chainstate, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:17 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:17 WriteBatch memory usage: db=chainstate, before=0.0MiB, after=8.0MiB 2018-03-05 19:35:22 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:22 WriteBatch memory usage: db=chainstate, before=8.0MiB, after=17.0MiB 2018-03-05 19:35:26 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:27 WriteBatch memory usage: db=chainstate, before=9.0MiB, after=18.0MiB 2018-03-05 19:35:40 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:41 WriteBatch memory usage: db=chainstate, before=9.0MiB, after=7.0MiB 2018-03-05 19:35:52 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:52 WriteBatch memory usage: db=chainstate, before=7.0MiB, after=9.0MiB ^C ``` As LevelDB doesn't seem to provide a way to get the database name, I've also added a new `m_name` field to the `CDBWrapper`. This is necessary because we have multiple LevelDB databases (two now, and possibly more later, e.g. bitcoin#11857). I am using this information in other branches where I'm experimenting with changing LevelDB buffer sizes. Tree-SHA512: 7ea8ff5484bb07ef806af17d000c74ccca27d2e0f6c3229e12d93818f00874553335d87428482bd8acbcae81ea35aef2a243326f9fccbfac25989323d24391b4
…te LevelDB memory use 741f017 Add DynamicMemoryUsage() to LevelDB (Evan Klitzke) Pull request description: This adds a new method `CDBWrapper::DynamicMemoryUsage()` similar to Bitcoin's existing methods of the same name. It's implemented by asking LevelDB for the information, and then parsing the string response. I've also added logging to `CDBWrapper::WriteBatch()` to track this information: ``` $ tail -f ~/.bitcoin/testnet3/debug.log | grep WriteBatch 2018-03-05 19:34:55 WriteBatch memory usage: db=chainstate, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:17 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:17 WriteBatch memory usage: db=chainstate, before=0.0MiB, after=8.0MiB 2018-03-05 19:35:22 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:22 WriteBatch memory usage: db=chainstate, before=8.0MiB, after=17.0MiB 2018-03-05 19:35:26 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:27 WriteBatch memory usage: db=chainstate, before=9.0MiB, after=18.0MiB 2018-03-05 19:35:40 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:41 WriteBatch memory usage: db=chainstate, before=9.0MiB, after=7.0MiB 2018-03-05 19:35:52 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:52 WriteBatch memory usage: db=chainstate, before=7.0MiB, after=9.0MiB ^C ``` As LevelDB doesn't seem to provide a way to get the database name, I've also added a new `m_name` field to the `CDBWrapper`. This is necessary because we have multiple LevelDB databases (two now, and possibly more later, e.g. bitcoin#11857). I am using this information in other branches where I'm experimenting with changing LevelDB buffer sizes. Tree-SHA512: 7ea8ff5484bb07ef806af17d000c74ccca27d2e0f6c3229e12d93818f00874553335d87428482bd8acbcae81ea35aef2a243326f9fccbfac25989323d24391b4
…te LevelDB memory use 741f017 Add DynamicMemoryUsage() to LevelDB (Evan Klitzke) Pull request description: This adds a new method `CDBWrapper::DynamicMemoryUsage()` similar to Bitcoin's existing methods of the same name. It's implemented by asking LevelDB for the information, and then parsing the string response. I've also added logging to `CDBWrapper::WriteBatch()` to track this information: ``` $ tail -f ~/.bitcoin/testnet3/debug.log | grep WriteBatch 2018-03-05 19:34:55 WriteBatch memory usage: db=chainstate, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:17 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:17 WriteBatch memory usage: db=chainstate, before=0.0MiB, after=8.0MiB 2018-03-05 19:35:22 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:22 WriteBatch memory usage: db=chainstate, before=8.0MiB, after=17.0MiB 2018-03-05 19:35:26 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:27 WriteBatch memory usage: db=chainstate, before=9.0MiB, after=18.0MiB 2018-03-05 19:35:40 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:41 WriteBatch memory usage: db=chainstate, before=9.0MiB, after=7.0MiB 2018-03-05 19:35:52 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:52 WriteBatch memory usage: db=chainstate, before=7.0MiB, after=9.0MiB ^C ``` As LevelDB doesn't seem to provide a way to get the database name, I've also added a new `m_name` field to the `CDBWrapper`. This is necessary because we have multiple LevelDB databases (two now, and possibly more later, e.g. bitcoin#11857). I am using this information in other branches where I'm experimenting with changing LevelDB buffer sizes. Tree-SHA512: 7ea8ff5484bb07ef806af17d000c74ccca27d2e0f6c3229e12d93818f00874553335d87428482bd8acbcae81ea35aef2a243326f9fccbfac25989323d24391b4
9b27047 [doc] Include txindex changes in the release notes. (Jim Posen) ed77dd6 [test] Simple unit test for TxIndex. (Jim Posen) 6d772a3 [rpc] Public interfaces to GetTransaction block until synced. (Jim Posen) a03f804 [index] Move disk IO logic from GetTransaction to TxIndex::FindTx. (Jim Posen) e0a3b80 [validation] Replace tx index code in validation code with TxIndex. (Jim Posen) 8181db8 [init] Initialize and start TxIndex in init code. (Jim Posen) f90c3a6 [index] TxIndex method to wait until caught up. (Jim Posen) 70d510d [index] Allow TxIndex sync thread to be interrupted. (Jim Posen) 94b4f8b [index] TxIndex initial sync thread. (Jim Posen) 34d68bf [index] Create new TxIndex class. (Jim Posen) c88bcec [db] Migration for txindex data to new, separate database. (Jim Posen) 0cb8303 [db] Create separate database for txindex. (Jim Posen) Pull request description: I'm re-opening bitcoin#11857 as a new pull request because the last one stopped loading for people ------------------------------- This refactors the tx index code to be in it's own class and get built concurrently with validation code. The main benefit is decoupling and moving the txindex into a separate DB. The primary motivation is to lay the groundwork for other indexers that might be desired (such as the [compact filters](bitcoin/bips#636)). The basic idea is that the TxIndex spins up its own thread, which first syncs the txindex to the current block index, then once in sync the BlockConnected ValidationInterface hook writes new blocks. ### DB changes At the suggestion of some other developers, the txindex has been split out into a separate database. A data migration runs at startup on any nodes with a legacy txindex. Currently the migration blocks node initialization until complete. ### Open questions - Should the migration of txindex data from the old DB to the new DB block in init or should it happen in a background thread? The downside to backgrounding it is that `getrawtransaction` would return an error message saying the txindex is syncing while the migration is running. ### Impact In a sample size n=1 test where I synced nodes from scratch, the average time [Index writing](https://github.com/bitcoin/bitcoin/blob/master/src/validation.cpp#L1903) was 3.36ms in master and 1.72ms in this branch. The average time between `UpdateTip` log lines for sequential blocks between 400,000 and IBD end on mainnet was 0.297204s in master and 0.286134s in this branch. Most likely this is just variance in IBD times, but I can try with some more trials if people want. Tree-SHA512: 451fd7d95df89dfafceaa723cdf0f7b137615b531cf5c5035cfb54e9ccc2026cec5ac85edbcf71b7f4e2f102e36e9202b8b3a667e1504a9e1a9976ab1f0079c4
9cb6cdc Simplify semantics of ChainStateFlushed callback (Matt Corallo) 50b6533 scripted-diff: Rename SetBestChain callback ChainStateFlushed (Matt Corallo) Pull request description: Previously, ChainStateFlushed would fire either if a full flush completed (which can happen due to memory limits, forced flush, or on its own DATABASE_WRITE_INTERVAL timer) *or* on a ChainStateFlushed-specific DATABASE_WRITE_INTERVAL timer. This is both less clear for clients (as there are no guarantees about a flush having actually happened prior to the call), and reults in extra flushes not clearly intended by the code. We drop the second case, providing a strong guarantee without removing the periodit timer-based flushing. This is a follow-up to discussion in bitcoin#11857. Tree-SHA512: 22ba3a0954d265d28413dbf87040790ca5b439820ee7bbadab14028295ec190de82ce5cd664426c82e58b706dc84278868026fa8d066702eb6e6962c9ace1f8e
9b27047 [doc] Include txindex changes in the release notes. (Jim Posen) ed77dd6 [test] Simple unit test for TxIndex. (Jim Posen) 6d772a3 [rpc] Public interfaces to GetTransaction block until synced. (Jim Posen) a03f804 [index] Move disk IO logic from GetTransaction to TxIndex::FindTx. (Jim Posen) e0a3b80 [validation] Replace tx index code in validation code with TxIndex. (Jim Posen) 8181db8 [init] Initialize and start TxIndex in init code. (Jim Posen) f90c3a6 [index] TxIndex method to wait until caught up. (Jim Posen) 70d510d [index] Allow TxIndex sync thread to be interrupted. (Jim Posen) 94b4f8b [index] TxIndex initial sync thread. (Jim Posen) 34d68bf [index] Create new TxIndex class. (Jim Posen) c88bcec [db] Migration for txindex data to new, separate database. (Jim Posen) 0cb8303 [db] Create separate database for txindex. (Jim Posen) Pull request description: I'm re-opening bitcoin#11857 as a new pull request because the last one stopped loading for people ------------------------------- This refactors the tx index code to be in it's own class and get built concurrently with validation code. The main benefit is decoupling and moving the txindex into a separate DB. The primary motivation is to lay the groundwork for other indexers that might be desired (such as the [compact filters](bitcoin/bips#636)). The basic idea is that the TxIndex spins up its own thread, which first syncs the txindex to the current block index, then once in sync the BlockConnected ValidationInterface hook writes new blocks. ### DB changes At the suggestion of some other developers, the txindex has been split out into a separate database. A data migration runs at startup on any nodes with a legacy txindex. Currently the migration blocks node initialization until complete. ### Open questions - Should the migration of txindex data from the old DB to the new DB block in init or should it happen in a background thread? The downside to backgrounding it is that `getrawtransaction` would return an error message saying the txindex is syncing while the migration is running. ### Impact In a sample size n=1 test where I synced nodes from scratch, the average time [Index writing](https://github.com/bitcoin/bitcoin/blob/master/src/validation.cpp#L1903) was 3.36ms in master and 1.72ms in this branch. The average time between `UpdateTip` log lines for sequential blocks between 400,000 and IBD end on mainnet was 0.297204s in master and 0.286134s in this branch. Most likely this is just variance in IBD times, but I can try with some more trials if people want. Tree-SHA512: 451fd7d95df89dfafceaa723cdf0f7b137615b531cf5c5035cfb54e9ccc2026cec5ac85edbcf71b7f4e2f102e36e9202b8b3a667e1504a9e1a9976ab1f0079c4
9b27047 [doc] Include txindex changes in the release notes. (Jim Posen) ed77dd6 [test] Simple unit test for TxIndex. (Jim Posen) 6d772a3 [rpc] Public interfaces to GetTransaction block until synced. (Jim Posen) a03f804 [index] Move disk IO logic from GetTransaction to TxIndex::FindTx. (Jim Posen) e0a3b80 [validation] Replace tx index code in validation code with TxIndex. (Jim Posen) 8181db8 [init] Initialize and start TxIndex in init code. (Jim Posen) f90c3a6 [index] TxIndex method to wait until caught up. (Jim Posen) 70d510d [index] Allow TxIndex sync thread to be interrupted. (Jim Posen) 94b4f8b [index] TxIndex initial sync thread. (Jim Posen) 34d68bf [index] Create new TxIndex class. (Jim Posen) c88bcec [db] Migration for txindex data to new, separate database. (Jim Posen) 0cb8303 [db] Create separate database for txindex. (Jim Posen) Pull request description: I'm re-opening bitcoin#11857 as a new pull request because the last one stopped loading for people ------------------------------- This refactors the tx index code to be in it's own class and get built concurrently with validation code. The main benefit is decoupling and moving the txindex into a separate DB. The primary motivation is to lay the groundwork for other indexers that might be desired (such as the [compact filters](bitcoin/bips#636)). The basic idea is that the TxIndex spins up its own thread, which first syncs the txindex to the current block index, then once in sync the BlockConnected ValidationInterface hook writes new blocks. ### DB changes At the suggestion of some other developers, the txindex has been split out into a separate database. A data migration runs at startup on any nodes with a legacy txindex. Currently the migration blocks node initialization until complete. ### Open questions - Should the migration of txindex data from the old DB to the new DB block in init or should it happen in a background thread? The downside to backgrounding it is that `getrawtransaction` would return an error message saying the txindex is syncing while the migration is running. ### Impact In a sample size n=1 test where I synced nodes from scratch, the average time [Index writing](https://github.com/bitcoin/bitcoin/blob/master/src/validation.cpp#L1903) was 3.36ms in master and 1.72ms in this branch. The average time between `UpdateTip` log lines for sequential blocks between 400,000 and IBD end on mainnet was 0.297204s in master and 0.286134s in this branch. Most likely this is just variance in IBD times, but I can try with some more trials if people want. Tree-SHA512: 451fd7d95df89dfafceaa723cdf0f7b137615b531cf5c5035cfb54e9ccc2026cec5ac85edbcf71b7f4e2f102e36e9202b8b3a667e1504a9e1a9976ab1f0079c4
9b27047 [doc] Include txindex changes in the release notes. (Jim Posen) ed77dd6 [test] Simple unit test for TxIndex. (Jim Posen) 6d772a3 [rpc] Public interfaces to GetTransaction block until synced. (Jim Posen) a03f804 [index] Move disk IO logic from GetTransaction to TxIndex::FindTx. (Jim Posen) e0a3b80 [validation] Replace tx index code in validation code with TxIndex. (Jim Posen) 8181db8 [init] Initialize and start TxIndex in init code. (Jim Posen) f90c3a6 [index] TxIndex method to wait until caught up. (Jim Posen) 70d510d [index] Allow TxIndex sync thread to be interrupted. (Jim Posen) 94b4f8b [index] TxIndex initial sync thread. (Jim Posen) 34d68bf [index] Create new TxIndex class. (Jim Posen) c88bcec [db] Migration for txindex data to new, separate database. (Jim Posen) 0cb8303 [db] Create separate database for txindex. (Jim Posen) Pull request description: I'm re-opening bitcoin#11857 as a new pull request because the last one stopped loading for people ------------------------------- This refactors the tx index code to be in it's own class and get built concurrently with validation code. The main benefit is decoupling and moving the txindex into a separate DB. The primary motivation is to lay the groundwork for other indexers that might be desired (such as the [compact filters](bitcoin/bips#636)). The basic idea is that the TxIndex spins up its own thread, which first syncs the txindex to the current block index, then once in sync the BlockConnected ValidationInterface hook writes new blocks. ### DB changes At the suggestion of some other developers, the txindex has been split out into a separate database. A data migration runs at startup on any nodes with a legacy txindex. Currently the migration blocks node initialization until complete. ### Open questions - Should the migration of txindex data from the old DB to the new DB block in init or should it happen in a background thread? The downside to backgrounding it is that `getrawtransaction` would return an error message saying the txindex is syncing while the migration is running. ### Impact In a sample size n=1 test where I synced nodes from scratch, the average time [Index writing](https://github.com/bitcoin/bitcoin/blob/master/src/validation.cpp#L1903) was 3.36ms in master and 1.72ms in this branch. The average time between `UpdateTip` log lines for sequential blocks between 400,000 and IBD end on mainnet was 0.297204s in master and 0.286134s in this branch. Most likely this is just variance in IBD times, but I can try with some more trials if people want. Tree-SHA512: 451fd7d95df89dfafceaa723cdf0f7b137615b531cf5c5035cfb54e9ccc2026cec5ac85edbcf71b7f4e2f102e36e9202b8b3a667e1504a9e1a9976ab1f0079c4
9cb6cdc Simplify semantics of ChainStateFlushed callback (Matt Corallo) 50b6533 scripted-diff: Rename SetBestChain callback ChainStateFlushed (Matt Corallo) Pull request description: Previously, ChainStateFlushed would fire either if a full flush completed (which can happen due to memory limits, forced flush, or on its own DATABASE_WRITE_INTERVAL timer) *or* on a ChainStateFlushed-specific DATABASE_WRITE_INTERVAL timer. This is both less clear for clients (as there are no guarantees about a flush having actually happened prior to the call), and reults in extra flushes not clearly intended by the code. We drop the second case, providing a strong guarantee without removing the periodit timer-based flushing. This is a follow-up to discussion in bitcoin#11857. Tree-SHA512: 22ba3a0954d265d28413dbf87040790ca5b439820ee7bbadab14028295ec190de82ce5cd664426c82e58b706dc84278868026fa8d066702eb6e6962c9ace1f8e
9cb6cdc Simplify semantics of ChainStateFlushed callback (Matt Corallo) 50b6533 scripted-diff: Rename SetBestChain callback ChainStateFlushed (Matt Corallo) Pull request description: Previously, ChainStateFlushed would fire either if a full flush completed (which can happen due to memory limits, forced flush, or on its own DATABASE_WRITE_INTERVAL timer) *or* on a ChainStateFlushed-specific DATABASE_WRITE_INTERVAL timer. This is both less clear for clients (as there are no guarantees about a flush having actually happened prior to the call), and reults in extra flushes not clearly intended by the code. We drop the second case, providing a strong guarantee without removing the periodit timer-based flushing. This is a follow-up to discussion in bitcoin#11857. Tree-SHA512: 22ba3a0954d265d28413dbf87040790ca5b439820ee7bbadab14028295ec190de82ce5cd664426c82e58b706dc84278868026fa8d066702eb6e6962c9ace1f8e
…te LevelDB memory use 741f017 Add DynamicMemoryUsage() to LevelDB (Evan Klitzke) Pull request description: This adds a new method `CDBWrapper::DynamicMemoryUsage()` similar to Bitcoin's existing methods of the same name. It's implemented by asking LevelDB for the information, and then parsing the string response. I've also added logging to `CDBWrapper::WriteBatch()` to track this information: ``` $ tail -f ~/.bitcoin/testnet3/debug.log | grep WriteBatch 2018-03-05 19:34:55 WriteBatch memory usage: db=chainstate, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:17 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:17 WriteBatch memory usage: db=chainstate, before=0.0MiB, after=8.0MiB 2018-03-05 19:35:22 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:22 WriteBatch memory usage: db=chainstate, before=8.0MiB, after=17.0MiB 2018-03-05 19:35:26 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:27 WriteBatch memory usage: db=chainstate, before=9.0MiB, after=18.0MiB 2018-03-05 19:35:40 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:41 WriteBatch memory usage: db=chainstate, before=9.0MiB, after=7.0MiB 2018-03-05 19:35:52 WriteBatch memory usage: db=index, before=0.0MiB, after=0.0MiB 2018-03-05 19:35:52 WriteBatch memory usage: db=chainstate, before=7.0MiB, after=9.0MiB ^C ``` As LevelDB doesn't seem to provide a way to get the database name, I've also added a new `m_name` field to the `CDBWrapper`. This is necessary because we have multiple LevelDB databases (two now, and possibly more later, e.g. bitcoin#11857). I am using this information in other branches where I'm experimenting with changing LevelDB buffer sizes. Tree-SHA512: 7ea8ff5484bb07ef806af17d000c74ccca27d2e0f6c3229e12d93818f00874553335d87428482bd8acbcae81ea35aef2a243326f9fccbfac25989323d24391b4
This refactors the tx index code to be in it's own class and get built concurrently with validation code. The main benefit is decoupling and moving the txindex into a separate DB. The primary motivation is to lay the groundwork for other indexers that might be desired (such as the compact filters). The basic idea is that the TxIndex spins up its own thread, which first syncs the txindex to the current block index, then once in sync the BlockConnected ValidationInterface hook writes new blocks.
DB changes
At the suggestion of some other developers, the txindex has been split out into a separate database. A data migration runs at startup on any nodes with a legacy txindex. Currently the migration blocks node initialization until complete.
Open questions
getrawtransaction
would return an error message saying the txindex is syncing while the migration is running.Impact
In a sample size n=1 test where I synced nodes from scratch, the average time Index writing was 3.36ms in master and 1.72ms in this branch. The average time between
UpdateTip
log lines for sequential blocks between 400,000 and IBD end on mainnet was 0.297204s in master and 0.286134s in this branch. Most likely this is just variance in IBD times, but I can try with some more trials if people want.