-
Notifications
You must be signed in to change notification settings - Fork 37.7k
util: Close txindex LDB before exiting bitcoind #17852
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fix a very longstanding bug wherein the txindex LevelDB was not being closed properly when stopping bitcoind, which is done by leveldb::DB::~DB(), ie., you need to call delete once for every DB* you create. This was causing the txindexing progress to have fallen behind to its last random checkpoint the next time bitcoind was started.
TxIndex::~TxIndex() {} | ||
TxIndex::~TxIndex() | ||
{ | ||
m_db.reset(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed? The pointer should be reset when the TxIndex is destructed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, it should, (shouldn't it???) (clearly both C++ and myself are too old for me to be writing it), so it would seem that, at least in some corner case, ~TxIndex isn't getting called at all. So I will close this for now and reopen it if I figure out what's going on further investigation. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder, but wouldn't really know, if the default destructor call isn't somehow ordered, or being reordered, such that it doesn't finish before the process exits, and the explicit call here maybe allows it to finish, but I'd still have to test it more and maybe figure out a way to write an integration test that verifies all the leveldbs are properly closed (or not). Will investigate further
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Particularly since leveldb is rather poorly designed imo to put some very important closing the database code in the destructor, since, after all, the compiler (and programmer, and OS), don't necessarily know or care that any of those objects still in memory that are about to be wiped out by the OS when the process exits, actually do important things that need to happen before the process does exit! And thus, quite often, do not bother actually destructing any of those objects, since nobody cares, (but this is exactly why they should care), about memory that leaks when the process is about to exit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It certainly seems like the patch works, but, I can neither explain why nor prove that it does definitively, so, not sure if that quite reaches the merge bar, in light of it being such a minor change to such a minor subsystem that isn't functionally broken, rather just a little bit inconvenient having to wait a while, while it reindexes the same blocks ;) but it's been bugging me (no pun intended) for years..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, was just able to verify that this patch does not in fact mysteriously fix the issue, so it remains mysterious at the moment.
What happens is, if you interrupt bitcoind in the middle of a sync with txindex on, the next time you run it, the txindexer has suddenly seemed to have fallen many blocks behind where it was when you interrupted the first process, for whatever reason that is yet to be determined, but the chainstate is always properly synced and closed so there must be a hint in comparing how the two are used differently.
So, when the txindex thread restarts it wastes a bunch of time indexing blocks it has already indexed, which you can tell because no new data is written to the leveldb until it catches back up since the keys and values in the txindex don't change. Will file a bug report if i can't find an existing one.
I haven't seen any crashes or anything worse than wasting some time and disk i/o, so it's not a real big deal in the grand scheme of the txindexer's overall time, just slows down syncing when you do restart it until the indexer catches back up, so if you're frequently stopping and starting the sync like because you have a magnetic spinning disk and it's your only disk, it can get a little irritating. But then that's always irritating 😂 thanks for the info
Whoa i dunno what just happened. Was just adding a comment but i inadvertently hit "close conversation" not as if i know what that button does but everything disappeared. Good thing i always write everything in a text editor because I'm so used to Android killing my apps and taking my drafts with them i dont like any ux that just deletes what you were working on because you hit one button. People accidentally hit buttons all the time. Anyway back on subject: Ah yes, was just able to verify that this patch does not in fact mysteriously fix the issue, so it remains mysterious at the moment. What happens is, if you interrupt bitcoind in the middle of a sync with txindex on, the next time you run it, the txindexer has suddenly seemed to have fallen many blocks behind where it was when you interrupted the first process, for whatever reason that is yet to be determined, but the chainstate is always properly synced and closed so there must be a hint in comparing how the two are used differently. So, when the txindex thread restarts it wastes a bunch of time indexing blocks it has already indexed, which you can tell because no new data is written to the leveldb until it catches back up since the keys and values in the txindex don't change. Will file a bug report if i can't find an existing one. I haven't seen any crashes or anything worse than wasting some time and disk i/o, so it's not a real big deal in the grand scheme of the txindexer's overall time, just slows down syncing when you do restart it until the indexer catches back up, so if you're frequently stopping and starting the sync like because you have a magnetic spinning disk and it's your only disk, it can get a little irritating. But then that's always irritating 😂 thanks for the info |
Txindex is a background thread, so it might fall back arbitrarily. Though it shouldn't lose progress from where it was in normal operation. |
@paulyc Oh, thank you, very nice catch. This is happening because the txindex (and the other indices) are stopped in |
Ah, @jimpo thanks for tracking that down as well. Must not have had enough coffee or something when I submitted this, but glad it helped find the issue, which clearly had to be a little more complex. ;) In the immortal words of Samwise Gamgee all's well as ends well, or at least it will be when a working PR is merged. Not sure if I'll have time to do it soon, but it's been there long enough, it can wait a little longer and I can test it properly, (why they do the tests...), unless someone else has time to fit it in somewhere but I wouldn't know |
…callback. 9dd58ca init: Stop indexes on shutdown after ChainStateFlushed callback. (Jim Posen) Pull request description: Replaces #17852. Currently, the latest index state may not be committed to disk on shutdown. The state is committed on `ChainStateFlushed` callbacks and the current init order unregisters the indexes as validation interfaces before the final `ChainStateFlushed` callback is called on them. Issue identified by paulyc. For review: an alternative or supplemental solution would be to call `Commit` at the end of `BaseIndex::Stop`. I don't see any harm in doing so and it makes the less prone to user error. However, the destructor would have to be modified to not call `Stop` because `Commit` calls a virtual method, so I figured it wasn't worth it. But I'm curious how others feel. ACKs for top commit: fjahr: tested ACK 9dd58ca paulyc: > Code review ACK [9dd58ca](9dd58ca), but failed to test because I can't reproduce the original problem. kallewoof: Tested ACK 9dd58ca promag: Code review ACK 9dd58ca, but failed to test because I can't reproduce the original problem. Tree-SHA512: 2918380b699833cb7eab07456d1667dbf8ebbe2d2b5988300a3cf5b6a6cfc818b6d9086e1936ffe7881f67e409306c4b91d61a08a169cfd0a301383479d4f3cb
…lushed callback. 9dd58ca init: Stop indexes on shutdown after ChainStateFlushed callback. (Jim Posen) Pull request description: Replaces bitcoin#17852. Currently, the latest index state may not be committed to disk on shutdown. The state is committed on `ChainStateFlushed` callbacks and the current init order unregisters the indexes as validation interfaces before the final `ChainStateFlushed` callback is called on them. Issue identified by paulyc. For review: an alternative or supplemental solution would be to call `Commit` at the end of `BaseIndex::Stop`. I don't see any harm in doing so and it makes the less prone to user error. However, the destructor would have to be modified to not call `Stop` because `Commit` calls a virtual method, so I figured it wasn't worth it. But I'm curious how others feel. ACKs for top commit: fjahr: tested ACK 9dd58ca paulyc: > Code review ACK [9dd58ca](bitcoin@9dd58ca), but failed to test because I can't reproduce the original problem. kallewoof: Tested ACK 9dd58ca promag: Code review ACK 9dd58ca, but failed to test because I can't reproduce the original problem. Tree-SHA512: 2918380b699833cb7eab07456d1667dbf8ebbe2d2b5988300a3cf5b6a6cfc818b6d9086e1936ffe7881f67e409306c4b91d61a08a169cfd0a301383479d4f3cb
…lushed callback. 9dd58ca init: Stop indexes on shutdown after ChainStateFlushed callback. (Jim Posen) Pull request description: Replaces bitcoin#17852. Currently, the latest index state may not be committed to disk on shutdown. The state is committed on `ChainStateFlushed` callbacks and the current init order unregisters the indexes as validation interfaces before the final `ChainStateFlushed` callback is called on them. Issue identified by paulyc. For review: an alternative or supplemental solution would be to call `Commit` at the end of `BaseIndex::Stop`. I don't see any harm in doing so and it makes the less prone to user error. However, the destructor would have to be modified to not call `Stop` because `Commit` calls a virtual method, so I figured it wasn't worth it. But I'm curious how others feel. ACKs for top commit: fjahr: tested ACK 9dd58ca paulyc: > Code review ACK [9dd58ca](bitcoin@9dd58ca), but failed to test because I can't reproduce the original problem. kallewoof: Tested ACK 9dd58ca promag: Code review ACK 9dd58ca, but failed to test because I can't reproduce the original problem. Tree-SHA512: 2918380b699833cb7eab07456d1667dbf8ebbe2d2b5988300a3cf5b6a6cfc818b6d9086e1936ffe7881f67e409306c4b91d61a08a169cfd0a301383479d4f3cb
Fix a very longstanding bug wherein the txindex LevelDB was not being closed properly when stopping bitcoind,
which is done by leveldb::DB::~DB(), ie., you need to call delete once for every DB* you create.
This was causing the txindexing progress to have fallen behind to its last random checkpoint the next time
bitcoind was started.