util: Close txindex LDB before exiting bitcoind #17852

paulyc · 2020-01-03T01:52:31Z

Fix a very longstanding bug wherein the txindex LevelDB was not being closed properly when stopping bitcoind,
which is done by leveldb::DB::~DB(), ie., you need to call delete once for every DB* you create.
This was causing the txindexing progress to have fallen behind to its last random checkpoint the next time
bitcoind was started.

Fix a very longstanding bug wherein the txindex LevelDB was not being closed properly when stopping bitcoind, which is done by leveldb::DB::~DB(), ie., you need to call delete once for every DB* you create. This was causing the txindexing progress to have fallen behind to its last random checkpoint the next time bitcoind was started.

maflcko · 2020-01-03T11:18:02Z

src/index/txindex.cpp

-TxIndex::~TxIndex() {}
+TxIndex::~TxIndex()
+{
+    m_db.reset();


Why is this needed? The pointer should be reset when the TxIndex is destructed

Good point, it should, (shouldn't it???) (clearly both C++ and myself are too old for me to be writing it), so it would seem that, at least in some corner case, ~TxIndex isn't getting called at all. So I will close this for now and reopen it if I figure out what's going on further investigation. Thanks

I wonder, but wouldn't really know, if the default destructor call isn't somehow ordered, or being reordered, such that it doesn't finish before the process exits, and the explicit call here maybe allows it to finish, but I'd still have to test it more and maybe figure out a way to write an integration test that verifies all the leveldbs are properly closed (or not). Will investigate further

Particularly since leveldb is rather poorly designed imo to put some very important closing the database code in the destructor, since, after all, the compiler (and programmer, and OS), don't necessarily know or care that any of those objects still in memory that are about to be wiped out by the OS when the process exits, actually do important things that need to happen before the process does exit! And thus, quite often, do not bother actually destructing any of those objects, since nobody cares, (but this is exactly why they should care), about memory that leaks when the process is about to exit.

It certainly seems like the patch works, but, I can neither explain why nor prove that it does definitively, so, not sure if that quite reaches the merge bar, in light of it being such a minor change to such a minor subsystem that isn't functionally broken, rather just a little bit inconvenient having to wait a while, while it reindexes the same blocks ;) but it's been bugging me (no pun intended) for years..

What is the issue you are seeing? Any stacktraces or anything?

Reminds me of #13894, #14071, #15410

Ah yes, was just able to verify that this patch does not in fact mysteriously fix the issue, so it remains mysterious at the moment.

What happens is, if you interrupt bitcoind in the middle of a sync with txindex on, the next time you run it, the txindexer has suddenly seemed to have fallen many blocks behind where it was when you interrupted the first process, for whatever reason that is yet to be determined, but the chainstate is always properly synced and closed so there must be a hint in comparing how the two are used differently.

So, when the txindex thread restarts it wastes a bunch of time indexing blocks it has already indexed, which you can tell because no new data is written to the leveldb until it catches back up since the keys and values in the txindex don't change. Will file a bug report if i can't find an existing one.

I haven't seen any crashes or anything worse than wasting some time and disk i/o, so it's not a real big deal in the grand scheme of the txindexer's overall time, just slows down syncing when you do restart it until the indexer catches back up, so if you're frequently stopping and starting the sync like because you have a magnetic spinning disk and it's your only disk, it can get a little irritating. But then that's always irritating 😂 thanks for the info

paulyc · 2020-01-04T14:35:28Z

Whoa i dunno what just happened. Was just adding a comment but i inadvertently hit "close conversation" not as if i know what that button does but everything disappeared. Good thing i always write everything in a text editor because I'm so used to Android killing my apps and taking my drafts with them i dont like any ux that just deletes what you were working on because you hit one button. People accidentally hit buttons all the time. Anyway back on subject:

Ah yes, was just able to verify that this patch does not in fact mysteriously fix the issue, so it remains mysterious at the moment.

What happens is, if you interrupt bitcoind in the middle of a sync with txindex on, the next time you run it, the txindexer has suddenly seemed to have fallen many blocks behind where it was when you interrupted the first process, for whatever reason that is yet to be determined, but the chainstate is always properly synced and closed so there must be a hint in comparing how the two are used differently.

So, when the txindex thread restarts it wastes a bunch of time indexing blocks it has already indexed, which you can tell because no new data is written to the leveldb until it catches back up since the keys and values in the txindex don't change. Will file a bug report if i can't find an existing one.

I haven't seen any crashes or anything worse than wasting some time and disk i/o, so it's not a real big deal in the grand scheme of the txindexer's overall time, just slows down syncing when you do restart it until the indexer catches back up, so if you're frequently stopping and starting the sync like because you have a magnetic spinning disk and it's your only disk, it can get a little irritating. But then that's always irritating 😂 thanks for the info

maflcko · 2020-01-06T21:30:54Z

What happens is, if you interrupt bitcoind in the middle of a sync with txindex on, the next time you run it, the txindexer has suddenly seemed to have fallen many blocks behind where it was when you interrupted the first process

Txindex is a background thread, so it might fall back arbitrarily. Though it shouldn't lose progress from where it was in normal operation.

jimpo · 2020-01-07T02:57:17Z

@paulyc Oh, thank you, very nice catch. This is happening because the txindex (and the other indices) are stopped in Shutdown (init.cpp) before ForceFlushStateToDisk is called. ForceFlushStateToDisk triggers the ChainStateFlushed callback on the indices which commits the indexed state to disk. So probably the best solution is to move the index stops after that. Alternately, BaseIndex::Stop can just call Commit directly. Or do both, because I don't see a downside to either approach.

paulyc · 2020-01-07T14:23:13Z

Ah, @jimpo thanks for tracking that down as well. Must not have had enough coffee or something when I submitted this, but glad it helped find the issue, which clearly had to be a little more complex. ;) In the immortal words of Samwise Gamgee all's well as ends well, or at least it will be when a working PR is merged. Not sure if I'll have time to do it soon, but it's been there long enough, it can wait a little longer and I can test it properly, (why they do the tests...), unless someone else has time to fit it in somewhere but I wouldn't know

…callback. 9dd58ca init: Stop indexes on shutdown after ChainStateFlushed callback. (Jim Posen) Pull request description: Replaces #17852. Currently, the latest index state may not be committed to disk on shutdown. The state is committed on `ChainStateFlushed` callbacks and the current init order unregisters the indexes as validation interfaces before the final `ChainStateFlushed` callback is called on them. Issue identified by paulyc. For review: an alternative or supplemental solution would be to call `Commit` at the end of `BaseIndex::Stop`. I don't see any harm in doing so and it makes the less prone to user error. However, the destructor would have to be modified to not call `Stop` because `Commit` calls a virtual method, so I figured it wasn't worth it. But I'm curious how others feel. ACKs for top commit: fjahr: tested ACK 9dd58ca paulyc: > Code review ACK [9dd58ca](9dd58ca), but failed to test because I can't reproduce the original problem. kallewoof: Tested ACK 9dd58ca promag: Code review ACK 9dd58ca, but failed to test because I can't reproduce the original problem. Tree-SHA512: 2918380b699833cb7eab07456d1667dbf8ebbe2d2b5988300a3cf5b6a6cfc818b6d9086e1936ffe7881f67e409306c4b91d61a08a169cfd0a301383479d4f3cb

…lushed callback. 9dd58ca init: Stop indexes on shutdown after ChainStateFlushed callback. (Jim Posen) Pull request description: Replaces bitcoin#17852. Currently, the latest index state may not be committed to disk on shutdown. The state is committed on `ChainStateFlushed` callbacks and the current init order unregisters the indexes as validation interfaces before the final `ChainStateFlushed` callback is called on them. Issue identified by paulyc. For review: an alternative or supplemental solution would be to call `Commit` at the end of `BaseIndex::Stop`. I don't see any harm in doing so and it makes the less prone to user error. However, the destructor would have to be modified to not call `Stop` because `Commit` calls a virtual method, so I figured it wasn't worth it. But I'm curious how others feel. ACKs for top commit: fjahr: tested ACK 9dd58ca paulyc: > Code review ACK [9dd58ca](bitcoin@9dd58ca), but failed to test because I can't reproduce the original problem. kallewoof: Tested ACK 9dd58ca promag: Code review ACK 9dd58ca, but failed to test because I can't reproduce the original problem. Tree-SHA512: 2918380b699833cb7eab07456d1667dbf8ebbe2d2b5988300a3cf5b6a6cfc818b6d9086e1936ffe7881f67e409306c4b91d61a08a169cfd0a301383479d4f3cb

fanquake added the UTXO Db and Indexes label Jan 3, 2020

fanquake requested a review from jimpo January 3, 2020 02:29

maflcko reviewed Jan 3, 2020

View reviewed changes

paulyc closed this Jan 3, 2020

jimpo mentioned this pull request Jan 8, 2020

init: Stop indexes on shutdown after ChainStateFlushed callback. #17897

Merged

bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

util: Close txindex LDB before exiting bitcoind #17852

util: Close txindex LDB before exiting bitcoind #17852

Uh oh!

paulyc commented Jan 3, 2020

Uh oh!

maflcko Jan 3, 2020

Uh oh!

paulyc Jan 3, 2020

Uh oh!

paulyc Jan 3, 2020 •

edited

Loading

Uh oh!

paulyc Jan 3, 2020 •

edited

Loading

Uh oh!

paulyc Jan 3, 2020

Uh oh!

maflcko Jan 3, 2020

Uh oh!

paulyc Jan 4, 2020

Uh oh!

paulyc commented Jan 4, 2020

Uh oh!

maflcko commented Jan 6, 2020

Uh oh!

jimpo commented Jan 7, 2020

Uh oh!

paulyc commented Jan 7, 2020 •

edited

Loading

Uh oh!

Uh oh!

util: Close txindex LDB before exiting bitcoind #17852

util: Close txindex LDB before exiting bitcoind #17852

Uh oh!

Conversation

paulyc commented Jan 3, 2020

Uh oh!

maflcko Jan 3, 2020

Choose a reason for hiding this comment

Uh oh!

paulyc Jan 3, 2020

Choose a reason for hiding this comment

Uh oh!

paulyc Jan 3, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

paulyc Jan 3, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

paulyc Jan 3, 2020

Choose a reason for hiding this comment

Uh oh!

maflcko Jan 3, 2020

Choose a reason for hiding this comment

Uh oh!

paulyc Jan 4, 2020

Choose a reason for hiding this comment

Uh oh!

paulyc commented Jan 4, 2020

Uh oh!

maflcko commented Jan 6, 2020

Uh oh!

jimpo commented Jan 7, 2020

Uh oh!

paulyc commented Jan 7, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

paulyc Jan 3, 2020 •

edited

Loading

paulyc Jan 3, 2020 •

edited

Loading

paulyc commented Jan 7, 2020 •

edited

Loading