-
Notifications
You must be signed in to change notification settings - Fork 37.7k
test: silence TSAN false positive in coinstatsindex_initial_sync #26188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: silence TSAN false positive in coinstatsindex_initial_sync #26188
Conversation
A call to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK. This may fix the bug, but it could be better.
Hmm, I would think the opposite Stopping the scheduler seems more indirect and more error prone because it make additional assumptions that validationinterface uses the scheduler internally and that it is safe to stop the scheduler at all because nothing else depends on it. Also agree with Marco's suggestion that it would be good to stick to consistent approach across tests, which also favors |
eaec74f
to
40ef8e0
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@@ -79,6 +79,10 @@ BOOST_FIXTURE_TEST_CASE(coinstatsindex_initial_sync, TestChain100Setup) | |||
// Shutdown sequence (c.f. Shutdown() in init.cpp) | |||
coin_stats_index.Stop(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated: I think the two Stop()
, here and in fa176e2 can be removed, as they are done by the destructor anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like that all calls to Stop()
are done just before the objects are destroyed (except the call from the destructor itself). Thus Stop()
can be removed and its (short) body moved inside BaseIndex::~BaseIndex()
. If this is the case then this deserves a separate PR. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The calls to Stop need to happen before the BaseIndex destructor runs because they are responsible for joining the sync thread, which can call virtual methods. By the time the BaseIndex destructor runs, whatever subclass inherited from BaseIndex is partially destroyed and the virtual method calls won't work.
The call to Stop() in the destructor was never valid and I have a commit c499c87 in #24230 which removes it and asserts that it was called previously.
40ef8e0
to
3ae1fe4
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/test/coinstatsindex_tests.cpp
Outdated
// Make sure coin_stats_index is not used by the scheduler thread after it | ||
// has been destroyed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "test: fix race in shutdown code in coinstatsindex_initial_sync" (3ae1fe4)
I think it would be better to move the SyncWithValidationInterfaceQueue()
call up one line before the Stop()
call, and not an introduce an artificial pause between stopping the index and destroying the index. There should be no need for an extra pause between stopping and destroying. Adding the pause in the wrong place in the test could potentially mask real bugs, so better to add it in the right place one line up.
I also think the code comment "Make sure coin_stats_index is not used by the scheduler thread" and PR description "fix race in shutdown code" are a little misleading, because this is not really fixing a race condition. This is working around TSAN false positive errors that happen because TSAN doesn't understand our BaseIndex::BlockUntilSyncedToCurrentChain
and BaseIndex::BlockConnected
functions.
TSAN alerts about a race condition because it doesn't see the test thread which the destroys the object explicitly waiting for the scheduler thread which calls BlockConnected
. But this is because, as an optimization, our BlockUntilSyncedToCurrentChain
function skips calling SyncWithValidationInterfaceQueue
when m_best_block_index
matches the chain tip. This is a valid optimization as long as the BaseIndex::BlockConnected
implementation does not do any reads or writes to *this
after it updates m_best_block_index
. This was the case before f08c9fb from #21726 did add a real race bug by making AllowPrune()
and GetName()
calls after updating m_best_block_index
. I think a fix for this could look like:
--- a/src/index/base.cpp
+++ b/src/index/base.cpp
@@ -414,10 +414,18 @@ IndexSummary BaseIndex::GetSummary() const
void BaseIndex::SetBestBlockIndex(const CBlockIndex* block) {
assert(!node::fPruneMode || AllowPrune());
- m_best_block_index = block;
if (AllowPrune() && block) {
node::PruneLockInfo prune_lock;
prune_lock.height_first = block->nHeight;
WITH_LOCK(::cs_main, m_chainstate->m_blockman.UpdatePruneLock(GetName(), prune_lock));
}
+ // For thread safety and compatibility with m_best_block_index optimization
+ // in the BlockUntilSyncedToCurrentChain(), update m_best_block_index as
+ // the last step in this function, and avoid updating state or accessing
+ // *this after m_best_block_index is set. If state is updated after
+ // m_best_block_index, external threads won't be able to rely on
+ // BlockUntilSyncedToCurrentChain() to be in sync with that state. And if
+ // *this is accessed, external threads won't be able to use
+ // BlockUntilSyncedToCurrentChain() to safely destroy the index object.
+ m_best_block_index = block;
}
For this commit, I think a more accurate comment explaining the SyncWithValidationInterfaceQueue();
call would say something like "To avoid false positive TSAN errors, explicitly signal the test thread from the validationinterface thread so TSAN does not think there is a race between index shutdown code and BlockConnected notification code. The BlockUntilSyncedToCurrentChain() call above is sufficient to avoid these races in reality, but TSAN can't detect this due to an optimization in that function."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is working around TSAN false positive errors
No, this is a real issue (use-after-free), see steps to reproduce without tsan (for example with valgrind): #25365 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is working around TSAN false positive errors
No, this is a real issue (use-after-free), see steps to reproduce without tsan (for example with valgrind): #25365 (comment)
Yes I did see that comment, but it was not reproducing an existing race condition, it was adding a new race condition by calling this->GetName()
after the m_best_block_index = block
assignment. If the m_best_block_index = block;
assignment was done last, there would be no race.
My comment above may be a little muddled because I noticed the real race condition caused by f08c9fb from #21726 in the middle of writing it. I think we should fix the real race condition in non-test code making a change to SetBestBlockIndex
like the one I suggested above. After we do that, the the current change 3ae1fe4 in this PR will just be silencing TSAN errors, not avoiding a real race condition anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was not reproducing an existing race condition, it was adding a new race condition by calling this->GetName() after the m_best_block_index = block assignment
Ok, but see also #25365 (comment). I could reproduce use-after-free in Valgrind even without adding a call to GetName()
, just a sleep suffices.
I got the impression that when BaseIndex::SetBestBlockIndex()
executes this
is already freed, but I do not remember if I checked to access a member before the m_best_block_index
assignment.
Moving the assignment to the bottom may fix it. Anyway, this whole story looks overly complicated to me. It shouldn't require multiple people discussing whether this is a bug or not and how to fix it exactly! I opened #26210 to explore an alternative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, but see also #25365 (comment). I could reproduce use-after-free in Valgrind even without adding a call to
GetName()
, just a sleep suffices.
Right. Your reproduction puts the sleep right before the UpdatePruneLock(GetName(), ...)
call, so it does not need to add a new call to GetName()
and does reveal an existing race condition in the test. Marco's reproduction puts that sleep after that and adds another GetName()
call, so it is really adding a new race condition.
I got the impression that when
BaseIndex::SetBestBlockIndex()
executesthis
is already freed, but I do not remember if I checked to access a member before them_best_block_index
assignment.
I don't think this is possible because BlockUntilSyncedToCurrentChain
would not have returned before the m_best_block_index
assignment was done.
Moving the assignment to the bottom may fix it. Anyway, this whole story looks overly complicated to me. It shouldn't require multiple people discussing whether this is a bug or not and how to fix it exactly! I opened #26210 to explore an alternative.
Yes I agree moving the assignment to the bottom is not the best long term solution for races generally. I just think moving the assignment is the most direct fix for the race condition introduced in #21726. Also I think it makes code more logical and seems like a reliability improvement. I opened a new PR #26215 which moves the assignment, and I think it would be the way to address the real race condition, and turn the TSAN true positives into false positives. After that, this PR would be a good way to fix remaining false positives (again with the suggestion that I think it would be better for test shutdown sequence to be 1-flush 2-stop 3-destroy not 1-stop 2-flush 3-destroy so test behavior is closer to bitcoind shutdown behavior).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I this is a false positive, I wonder what the steps are to reproduce. After all tsan should be deterministic, assuming manual sleeps are added at the right places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I this is a false positive, I wonder what the steps are to reproduce. After all tsan should be deterministic, assuming manual sleeps are added at the right places?
If the goal is to reproduce TSAN false positive errors that remain after #26215 fixes the race condition introduced by #21726, maybe a good way would be to put a 2 second sleep at the end of SetBestBlockIndex
after it assigns m_best_block_index
and 1 second sleep in the test before it calls BlockUntilSyncedToCurrentChain
. This way the TSAN should see the scheduler thread reading the vtable ptr, and the test thread writing it, and no apparent synchronization between the two threads because BlockUntilSyncedToCurrentChain
will return early without ever calling SyncWithValidationInterfaceQueue
Since commit f08c9fb from PR bitcoin#21726, the `BlockUntilSyncedToCurrentChain` behavior has been less reliable and also introduced a race condition in the coinstatsindex_initial_sync unit test. It seems better for `BlockUntilSyncedToCurrentChain` to actually wait for the last connected block to be fully processed, than to be able to return before prune locks are set, so this switches the order of `m_best_block_index = block;` and `UpdatePruneLock` statements in `SetBestBlockIndex` to make it more reliable. Also since commit f08c9fb, there has been a race condition in the `coinstatsindex_initial_sync` test. Before that commit, the atomic index best block pointer `m_best_block_index` was updated as the last step of `BaseIndex::BlockConnected`, so `BlockUntilSyncedToCurrentChain` could safely be used in tests to wait for the last `BlockConnected` notification to be finished before stopping and destroying the index. But after that commit, calling `BlockUntilSyncedToCurrentChain` is no longer sufficient, and there is a race between the test shutdown code which destroys the index object and the new code introduced in that commit calling `AllowPrune()` and `GetName()` on the index object. Reproducibility instructions for this are in bitcoin#25365 (comment) This commit fixes the `coinstatsindex_initial_sync` race condition, even though it will require an additional change to silence TSAN false positives, bitcoin#26188, after it is fixed. So this partially addresses but does not resolve the bug reporting TSAN errors bitcoin#25365. There is no known race condition outside of test code currently, because the bitcoind `Shutdown` function calls `FlushBackgroundCallbacks` not `BlockUntilSyncedToCurrentChain` to safely shut down. Co-authored-by: Vasil Dimov <vd@FreeBSD.org> Co-authored-by: MacroFake <falke.marco@gmail.com>
3ae1fe4
to
d2395c8
Compare
I think this should be good to go, unrelated to other PRs, as a minimal change that fixes the problem, to be included in 24.x. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK c456660. But it would be better if #26215 were merged before this PR, because unlike this PR, #26215 is a direct fix for the race condition introduced in #21726. After #26215, this PR would be easier to understand as a followup that addresses TSAN false positive errors and makes test shutdown sequences consistent.
I don't think it would be good to merge this PR before #26215, because it would only be fixing the race condition indirectly instead of directly, and it would be adding misleading comments about the purpose of the SyncWithValidationInterfaceQueue call following the BlockUntilSyncedToCurrentChain call. The new SyncWithValidationInterfaceQueue call shouldn't be necessary for anything other than making TSAN happy.
src/test/coinstatsindex_tests.cpp
Outdated
// Make sure coin_stats_index is not used by the scheduler thread after it | ||
// has been destroyed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I this is a false positive, I wonder what the steps are to reproduce. After all tsan should be deterministic, assuming manual sleeps are added at the right places?
If the goal is to reproduce TSAN false positive errors that remain after #26215 fixes the race condition introduced by #21726, maybe a good way would be to put a 2 second sleep at the end of SetBestBlockIndex
after it assigns m_best_block_index
and 1 second sleep in the test before it calls BlockUntilSyncedToCurrentChain
. This way the TSAN should see the scheduler thread reading the vtable ptr, and the test thread writing it, and no apparent synchronization between the two threads because BlockUntilSyncedToCurrentChain
will return early without ever calling SyncWithValidationInterfaceQueue
Since commit f08c9fb from PR bitcoin#21726, index `BlockUntilSyncedToCurrentChain` behavior has been less reliable, and there has also been a race condition in the `coinstatsindex_initial_sync` unit test. It seems better for `BlockUntilSyncedToCurrentChain` to actually wait for the last connected block to be fully processed, than to be able to return before prune locks are set, so this switches the order of `m_best_block_index = block;` and `UpdatePruneLock` statements in `SetBestBlockIndex` to make it more reliable. Also since commit f08c9fb, there has been a race condition in the `coinstatsindex_initial_sync` test. Before that commit, the atomic index best block pointer `m_best_block_index` was updated as the last step of `BaseIndex::BlockConnected`, so `BlockUntilSyncedToCurrentChain` could safely be used in tests to wait for the last `BlockConnected` notification to be finished before stopping and destroying the index. But after that commit, calling `BlockUntilSyncedToCurrentChain` is no longer sufficient, and there is a race between the test shutdown code which destroys the index object and the new code introduced in that commit calling `AllowPrune()` and `GetName()` on the index object. Reproducibility instructions for this are in bitcoin#25365 (comment) This commit fixes the `coinstatsindex_initial_sync` race condition, even though it will require an additional change to silence TSAN false positives, bitcoin#26188, after it is fixed. So this partially addresses but does not resolve the bug reporting TSAN errors bitcoin#25365. There is no known race condition outside of test code currently, because the bitcoind `Shutdown` function calls `FlushBackgroundCallbacks` not `BlockUntilSyncedToCurrentChain` to safely shut down. Co-authored-by: Vasil Dimov <vd@FreeBSD.org> Co-authored-by: MacroFake <falke.marco@gmail.com>
…edToCurrentChain reliability 8891949 index: Improve BaseIndex::BlockUntilSyncedToCurrentChain reliability (Ryan Ofsky) Pull request description: Since commit f08c9fb from PR bitcoin/bitcoin#21726, index `BlockUntilSyncedToCurrentChain` behavior has been less reliable, and there has also been a race condition in the `coinstatsindex_initial_sync` unit test. It seems better for `BlockUntilSyncedToCurrentChain` to actually wait for the last connected block to be fully processed, than to be able to return before prune locks are set, so this switches the order of `m_best_block_index = block;` and `UpdatePruneLock` statements in `SetBestBlockIndex` to make it more reliable. Also since commit f08c9fb, there has been a race condition in the `coinstatsindex_initial_sync` test. Before that commit, the atomic index best block pointer `m_best_block_index` was updated as the last step of `BaseIndex::BlockConnected`, so `BlockUntilSyncedToCurrentChain` could safely be used in tests to wait for the last `BlockConnected` notification to be finished before stopping and destroying the index. But after that commit, calling `BlockUntilSyncedToCurrentChain` is no longer sufficient, and there is a race between the test shutdown code which destroys the index object and the new code introduced in that commit calling `AllowPrune()` and `GetName()` on the index object. Reproducibility instructions for this are in bitcoin/bitcoin#25365 (comment) This commit fixes the `coinstatsindex_initial_sync` race condition, even though it will require an additional change to silence TSAN false positives, bitcoin/bitcoin#26188, after it is fixed. So this partially addresses but does not resolve the bug reporting TSAN errors bitcoin/bitcoin#25365. There is no known race condition outside of test code currently, because the bitcoind `Shutdown` function calls `FlushBackgroundCallbacks` not `BlockUntilSyncedToCurrentChain` to safely shut down. Co-authored-by: vasild Co-authored-by: MarcoFalke ACKs for top commit: mzumsande: re-ACK 8891949 Tree-SHA512: 52e29e3772a0c92873c54e5ffb31dd66a909b68a2031b7585713cd1d976811289c98bd9bb41679a8689062f03be4f97bb8368696e789caa4607c2fd8b1fe289b
#26215 has been merged. @vasild can you rebase/follow up with @ryanofsky's recent comments here. |
c456660
to
26bef29
Compare
Note - I cannot reproduce this with the steps from #26188 (comment). |
Update title/OP? |
Done, sorry. |
I can't reproduce this either |
…entChain reliability 8891949 index: Improve BaseIndex::BlockUntilSyncedToCurrentChain reliability (Ryan Ofsky) Pull request description: Since commit f08c9fb from PR bitcoin#21726, index `BlockUntilSyncedToCurrentChain` behavior has been less reliable, and there has also been a race condition in the `coinstatsindex_initial_sync` unit test. It seems better for `BlockUntilSyncedToCurrentChain` to actually wait for the last connected block to be fully processed, than to be able to return before prune locks are set, so this switches the order of `m_best_block_index = block;` and `UpdatePruneLock` statements in `SetBestBlockIndex` to make it more reliable. Also since commit f08c9fb, there has been a race condition in the `coinstatsindex_initial_sync` test. Before that commit, the atomic index best block pointer `m_best_block_index` was updated as the last step of `BaseIndex::BlockConnected`, so `BlockUntilSyncedToCurrentChain` could safely be used in tests to wait for the last `BlockConnected` notification to be finished before stopping and destroying the index. But after that commit, calling `BlockUntilSyncedToCurrentChain` is no longer sufficient, and there is a race between the test shutdown code which destroys the index object and the new code introduced in that commit calling `AllowPrune()` and `GetName()` on the index object. Reproducibility instructions for this are in bitcoin#25365 (comment) This commit fixes the `coinstatsindex_initial_sync` race condition, even though it will require an additional change to silence TSAN false positives, bitcoin#26188, after it is fixed. So this partially addresses but does not resolve the bug reporting TSAN errors bitcoin#25365. There is no known race condition outside of test code currently, because the bitcoind `Shutdown` function calls `FlushBackgroundCallbacks` not `BlockUntilSyncedToCurrentChain` to safely shut down. Co-authored-by: vasild Co-authored-by: MarcoFalke ACKs for top commit: mzumsande: re-ACK 8891949 Tree-SHA512: 52e29e3772a0c92873c54e5ffb31dd66a909b68a2031b7585713cd1d976811289c98bd9bb41679a8689062f03be4f97bb8368696e789caa4607c2fd8b1fe289b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 26bef29. I do think this change improves the tests, even if we can't reproduce the original TSAN errors anymore. The PR fixes a misleading comment in the txindex_initial_sync
test, and makes shutdown order in that test more closely match bitcoind shutdown order. It also makes the coinstatsindex_initial_sync
test consistent with the txindex_initial_sync
test.
If we are sure there are no TSAN false positive errors here, a potentially simpler alternative to this PR is to remove the SyncWithValidationInterfaceQueue
calls and comments from both tests. It's possible TSAN is taking atomic reads and writes into account and can infer that the test thread is waiting for the notification thread because it is checking the atomic m_best_block_index variable. That could explain why sleeps suggested in #26188 (comment) don't cause errors, even if the sleeps do cause BlockUntilSyncedToCurrentChain
to return early without calling SyncWithValidationInterfaceQueue
.
Since commit f08c9fb from PR bitcoin#21726, index `BlockUntilSyncedToCurrentChain` behavior has been less reliable, and there has also been a race condition in the `coinstatsindex_initial_sync` unit test. It seems better for `BlockUntilSyncedToCurrentChain` to actually wait for the last connected block to be fully processed, than to be able to return before prune locks are set, so this switches the order of `m_best_block_index = block;` and `UpdatePruneLock` statements in `SetBestBlockIndex` to make it more reliable. Also since commit f08c9fb, there has been a race condition in the `coinstatsindex_initial_sync` test. Before that commit, the atomic index best block pointer `m_best_block_index` was updated as the last step of `BaseIndex::BlockConnected`, so `BlockUntilSyncedToCurrentChain` could safely be used in tests to wait for the last `BlockConnected` notification to be finished before stopping and destroying the index. But after that commit, calling `BlockUntilSyncedToCurrentChain` is no longer sufficient, and there is a race between the test shutdown code which destroys the index object and the new code introduced in that commit calling `AllowPrune()` and `GetName()` on the index object. Reproducibility instructions for this are in bitcoin#25365 (comment) This commit fixes the `coinstatsindex_initial_sync` race condition, even though it will require an additional change to silence TSAN false positives, bitcoin#26188, after it is fixed. So this partially addresses but does not resolve the bug reporting TSAN errors bitcoin#25365. There is no known race condition outside of test code currently, because the bitcoind `Shutdown` function calls `FlushBackgroundCallbacks` not `BlockUntilSyncedToCurrentChain` to safely shut down. Co-authored-by: Vasil Dimov <vd@FreeBSD.org> Co-authored-by: MacroFake <falke.marco@gmail.com> Github-Pull: bitcoin#26215 Rebased-From: 8891949
…x_tests So that the call order is the same as in coinstatsindex_tests.
26bef29
to
861cb3f
Compare
I am also ok with this, since it is a test-only and a false positive, to leave it "unfixed" until it happens again in CI, if ever. @MarcoFalke, what do you think? |
review ACK 861cb3f I think anything is fine here. Either keep both or delete both. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since commit f08c9fb from PR bitcoin#21726, index `BlockUntilSyncedToCurrentChain` behavior has been less reliable, and there has also been a race condition in the `coinstatsindex_initial_sync` unit test. It seems better for `BlockUntilSyncedToCurrentChain` to actually wait for the last connected block to be fully processed, than to be able to return before prune locks are set, so this switches the order of `m_best_block_index = block;` and `UpdatePruneLock` statements in `SetBestBlockIndex` to make it more reliable. Also since commit f08c9fb, there has been a race condition in the `coinstatsindex_initial_sync` test. Before that commit, the atomic index best block pointer `m_best_block_index` was updated as the last step of `BaseIndex::BlockConnected`, so `BlockUntilSyncedToCurrentChain` could safely be used in tests to wait for the last `BlockConnected` notification to be finished before stopping and destroying the index. But after that commit, calling `BlockUntilSyncedToCurrentChain` is no longer sufficient, and there is a race between the test shutdown code which destroys the index object and the new code introduced in that commit calling `AllowPrune()` and `GetName()` on the index object. Reproducibility instructions for this are in bitcoin#25365 (comment) This commit fixes the `coinstatsindex_initial_sync` race condition, even though it will require an additional change to silence TSAN false positives, bitcoin#26188, after it is fixed. So this partially addresses but does not resolve the bug reporting TSAN errors bitcoin#25365. There is no known race condition outside of test code currently, because the bitcoind `Shutdown` function calls `FlushBackgroundCallbacks` not `BlockUntilSyncedToCurrentChain` to safely shut down. Co-authored-by: Vasil Dimov <vd@FreeBSD.org> Co-authored-by: MacroFake <falke.marco@gmail.com>
…dex_initial_sync 861cb3f test: move SyncWithValidationInterfaceQueue() before Stop() in txindex_tests (Vasil Dimov) 6526dc3 test: silence TSAN false positive in coinstatsindex_initial_sync (Vasil Dimov) Pull request description: Silence false positives from TSAN about unsynchronized calls to `BaseIndex::~BaseIndex()` and `BaseIndex::SetBestBlockIndex()`. They are synchronized, but beyond the comprehension of TSAN - by `SyncWithValidationInterfaceQueue()`, called from `BaseIndex::BlockUntilSyncedToCurrentChain()`. Fixes bitcoin#25365 ACKs for top commit: MarcoFalke: review ACK 861cb3f ryanofsky: Code review ACK 861cb3f. Just comment change since last review. Tree-SHA512: 8c30fdf2fd11d54e9adfa68a67185ab820bd7bd9f7f3ad6456e7e6d219fa9cf6d34b41e98e723eae86cb0c1baef7f3fc57b1b011a13dc3fe3d78334b9b5596de
Since commit f08c9fb0c6a799e3cb75ca5f763a746471625beb from PR bitcoin/bitcoin#21726, index `BlockUntilSyncedToCurrentChain` behavior has been less reliable, and there has also been a race condition in the `coinstatsindex_initial_sync` unit test. It seems better for `BlockUntilSyncedToCurrentChain` to actually wait for the last connected block to be fully processed, than to be able to return before prune locks are set, so this switches the order of `m_best_block_index = block;` and `UpdatePruneLock` statements in `SetBestBlockIndex` to make it more reliable. Also since commit f08c9fb0c6a799e3cb75ca5f763a746471625beb, there has been a race condition in the `coinstatsindex_initial_sync` test. Before that commit, the atomic index best block pointer `m_best_block_index` was updated as the last step of `BaseIndex::BlockConnected`, so `BlockUntilSyncedToCurrentChain` could safely be used in tests to wait for the last `BlockConnected` notification to be finished before stopping and destroying the index. But after that commit, calling `BlockUntilSyncedToCurrentChain` is no longer sufficient, and there is a race between the test shutdown code which destroys the index object and the new code introduced in that commit calling `AllowPrune()` and `GetName()` on the index object. Reproducibility instructions for this are in bitcoin/bitcoin#25365 (comment) This commit fixes the `coinstatsindex_initial_sync` race condition, even though it will require an additional change to silence TSAN false positives, bitcoin/bitcoin#26188, after it is fixed. So this partially addresses but does not resolve the bug reporting TSAN errors bitcoin/bitcoin#25365. There is no known race condition outside of test code currently, because the bitcoind `Shutdown` function calls `FlushBackgroundCallbacks` not `BlockUntilSyncedToCurrentChain` to safely shut down. Co-authored-by: Vasil Dimov <vd@FreeBSD.org> Co-authored-by: MacroFake <falke.marco@gmail.com>
* doc: add info about status code 404 for some rest endpoints * refactor: remove duplicate code from BlockAssembler * build: Fix `capnp` package build for Android * build: Specify native binaries explicitly when building `capnp` package From `configure --help`: --with-external-capnp use the system capnp binary (or the one specified with $CAPNP) instead of compiling a new one (useful for cross-compiling) * net: remove useless call to IsReachable() from CConnman::Bind() `CConnman::Bind()` is called without `BF_EXPLICIT` only when passed either `0.0.0.0` or `::`. For those addresses `IsReachable()` is always true (regardless of the `-onlynet=` setting!), meaning that the `if` condition never evaluates to true. `IsReachable()` is always true for the "any" IPv4 and IPv6 addresses because `CNetAddr::GetNetwork()` returns `NET_UNROUTABLE` instead of `NET_IPV4` or `NET_IPV6` and the network `NET_UNROUTABLE` is always considered reachable. It follows that `BF_EXPLICIT` is unnecessary, remove it too. * sync: simplify MaybeCheckNotHeld() definitions by using a template Reduce 4 of the `MaybeCheckNotHeld()` definitions to 2 by using a template. This also makes the function usable for other [BasicLockable](https://en.cppreference.com/w/cpp/named_req/BasicLockable) types. * sync: remove unused template parameter from ::UniqueLock The template parameter `typename Base = typename Mutex::UniqueLock` is not used, so remove it. Use internally defined type `Base` to avoid repetitions of `Mutex::UniqueLock`. * scripted-diff: Rename time symbols -BEGIN VERIFY SCRIPT- ren() { sed -i "s:\<$1\>:$2:g" $(git grep -l "\<$1\>" ':(exclude)src/versionbits.cpp') ; } ren nStart time_start ren nTimeStart time_start ren nTimeReadFromDiskTotal time_read_from_disk_total ren nTimeConnectTotal time_connect_total ren nTimeFlush time_flush ren nTimeChainState time_chainstate ren nTimePostConnect time_post_connect ren nTimeCheck time_check ren nTimeForks time_forks ren nTimeConnect time_connect ren nTimeVerify time_verify ren nTimeUndo time_undo ren nTimeIndex time_index ren nTimeTotal time_total ren nTime1 time_1 ren nTime2 time_2 ren nTime3 time_3 ren nTime4 time_4 ren nTime5 time_5 ren nTime6 time_6 ren nBlocksTotal num_blocks_total # Newline after semicolon perl -0777 -pi -e 's/; time_connect_total/;\n time_connect_total/g' src/validation.cpp perl -0777 -pi -e 's/; time_/;\n time_/g' src/validation.cpp -END VERIFY SCRIPT- * Use steady clock for bench logging * doc: add historical 0.20.2 release notes * doc: add historical 0.21.2 release notes * build: split ARM crc & crypto extension checks We currently perform the same check twice, to put the same set of flags in two different variables. Split the checks so we test for crc and crypto extensions independently. If we don't want to split, we should just delete the second AX_CHECK_COMPILE_FLAG check, and set ARM_CRC_CXXFLAGS & ARM_CRC_CXXFLAGS at the same time. * refactor: Do not discard `try_lock()` return value Microsoft's C++ Standard Library uses the `[[nodiscard]]` attribute for `try_lock()`. See: https://github.com/microsoft/STL/blob/main/stl/inc/mutex This change allows to drop the current suppression for the warning C4838 and helps to prevent the upcoming warning C4858. See: microsoft/STL@539c26c * signet/miner: reduce default interblock interval limit to 30min Also allow the operator to change it, if desired, without having to edit the code. * Squashed 'src/leveldb/' changes from 22f1e4a02f..e2f10b4e47 e2f10b4e47 Merge bitcoin-core/leveldb-subtree#34: win32: fix -Wmissing-field-initializers warnings 12c52b392d win32: fix -Wmissing-field-initializers warnings git-subtree-dir: src/leveldb git-subtree-split: e2f10b4e47bc950a81bc96d1c6db3a8048216642 * ci: Bump vcpkg to the latest version `2022.09.27` Dependency changes (2022.06.16.1 - 2022.09.27): - boost 1.79.0#0 -> 1.80.0#0 - sqlite3 3.37.2#1 -> 3.39.2#0 - zeromq 4.3.4#5 -> 4.3.4#6 * contrib: Fix capture_output in getcoins.py Our required Python version 3.6.12 does not support `capture_output` as a subprocess.run argument; this was added in python 3.7. We can emulate it by setting stdout and stderr to subprocess.PIPE * fuzz: Limit outpoints.size in txorphan target to avoid OOM * refactor: move Boost datetime usage to wallet This means we don't need datetime in a --disable-wallet build, and it isn't included in the kernel. * wallet: Use correct effective value when checking target * test: Check external coin effective value is used in CoinSelection * test: Use proper Boost macros instead of assertions * ci: Run `bench_bitcoin.exe --sanity-check` in "Win64 native" task Also a better name used for the script as it follows GNU's `make check`. * doc: bump bips.md up-to-date version to v24.0 This is a trivial follow-up to bitcoin#26124. * refactor: Drop `owns_lock()` call Co-authored-by: Vasil Dimov <vd@FreeBSD.org> * refactor: move DEFAULT_TXINDEX from validation to txindex * refactor: move DEFAULT_COINSTATSINDEX from validation to coinstatsindex * refactor: move DEFAULT_BLOCKFILTERINDEX from val to blockfilterindex * kernel: remove util/bytevectorhash.cpp * ci: Move `git config` commands into script where they are used * ci: Use same `merge_script` implementation for Windows as for all * ci: Remove unused package Address feedback from https://github.com/bitcoin/bitcoin/pull/24561/files#r985719812 * ci: Allow PIP_PACKAGES on centos This was added in 7fc5e86 but I can't see a reason why this should be forbidden. * test: Remove unused fCheckpointsEnabled from miner_tests The earliest checkpoint is at height 11111, so this can't possibly have any impact on this test. * build, msvc: Enable C4834 warning https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/c4834 * kernel: move RunCommandParseJSON to its own file Because libbitcoinkernel does not include this new object, this has the side-effect of eliminating the unnecessary boost::process dependency. * ci: Workaround Windows filesystem executable bit loss * fuzz: add util/mempool/h.cpp Moving the mempool code (Boost) out of util.h, results in a ~10% speedup (for me) when compiling the fuzz tests. * fuzz: pass max fee into ConsumeTxMemPoolEntry * refactor: Make 64-bit shift explicit Also this change enables MSVC warning C4334 for all codebase. https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-level-3-c4334 * refactor: move run_command from util to common Quoting ryanofsky: "util can be the library for things included in the kernel which the kernel can depend on, and common can be the library for other code that needs to be shared internally, but should not be part of the kernel or shared externally." * Remove clang-format from lint task clang-format could be used in scripted diffs, but remained largely unused. * test: Pass mempool reference to AssemblerForTest * test: Use dedicated mempool in TestPrioritisedMining No need for a shared mempool. Also remove unused chainparams parameter. * test: Use dedicated mempool in TestPackageSelection No need for a shared mempool. Also remove unused chainparams parameter. * test: Use dedicated mempool in TestBasicMining No need for a shared mempool. Also remove unused chainparams parameter. Can be reviewed with --ignore-all-space * refactor: mempool: add MemPoolLimits::NoLimits() There are quite a few places in the codebase that require us to construct a CTxMemPool without limits on ancestors and descendants. This helper function allows us to get rid of all that duplication. * refactor: mempool: use CTxMempool::Limits Simplifies function signatures by removing repetition of all the ancestor/descendant limits, and increases readability by being more verbose by naming the limits, while still reducing the LoC. * test: use NoLimits() in MempoolIndexingTest The (100, 1000000, 1000, 1000000) limits are arbitrarily high and don't restrict anything, they are just meant to calculate ancestors properly. Using NoLimits() makes this intent more clear and simplifies the code. * docs: improve docs where MemPoolLimits is used * Remove unused CDataStream::rdbuf method It is unused and seems unlikely to be ever used. * index: Improve BaseIndex::BlockUntilSyncedToCurrentChain reliability Since commit f08c9fb from PR bitcoin#21726, index `BlockUntilSyncedToCurrentChain` behavior has been less reliable, and there has also been a race condition in the `coinstatsindex_initial_sync` unit test. It seems better for `BlockUntilSyncedToCurrentChain` to actually wait for the last connected block to be fully processed, than to be able to return before prune locks are set, so this switches the order of `m_best_block_index = block;` and `UpdatePruneLock` statements in `SetBestBlockIndex` to make it more reliable. Also since commit f08c9fb, there has been a race condition in the `coinstatsindex_initial_sync` test. Before that commit, the atomic index best block pointer `m_best_block_index` was updated as the last step of `BaseIndex::BlockConnected`, so `BlockUntilSyncedToCurrentChain` could safely be used in tests to wait for the last `BlockConnected` notification to be finished before stopping and destroying the index. But after that commit, calling `BlockUntilSyncedToCurrentChain` is no longer sufficient, and there is a race between the test shutdown code which destroys the index object and the new code introduced in that commit calling `AllowPrune()` and `GetName()` on the index object. Reproducibility instructions for this are in bitcoin#25365 (comment) This commit fixes the `coinstatsindex_initial_sync` race condition, even though it will require an additional change to silence TSAN false positives, bitcoin#26188, after it is fixed. So this partially addresses but does not resolve the bug reporting TSAN errors bitcoin#25365. There is no known race condition outside of test code currently, because the bitcoind `Shutdown` function calls `FlushBackgroundCallbacks` not `BlockUntilSyncedToCurrentChain` to safely shut down. Co-authored-by: Vasil Dimov <vd@FreeBSD.org> Co-authored-by: MacroFake <falke.marco@gmail.com> * test: Prevent UB in `minisketch_tests.cpp` * test: Remove confusing DUMMY_P2WPKH_SCRIPT * docs: fix m_children to be a member of CTxMemPoolEntry * wallet: have prune error take precedence over assumedvalid From Russ Yanofsky: "Agree with all of Marco's points here and think this should be updated If havePrune and hasAssumedValidChain are both true, better to show havePrune error message. Assumed-valid error message is vague and not very actionable. Would suggest "Error loading wallet. Wallet requires blocks to be downloaded, and software does not currently support loading wallets while blocks are being downloaded out of order though assumeutxo snapshots. Wallet should be able to load successfully after node sync reaches height {block_height}" Co-authored-by: MacroFake <MarcoFalke@gmail.com> Co-authored-by: Russ Yanofsky <russ@yanofsky.org> * Fix comment typos in scriptpubkeyman.cpp, wallet.cpp, wallet.h Fix comment typos: sigature -> signature ponter -> pointer it's key -> its key * sync: avoid confusing name overlap (Mutex) Use `MutexType` instead of `Mutex` for the template parameter of `UniqueLock` because there is already a class named `Mutex` and the naming overlap is confusing. `MutexType` is used elsewhere in `sync.h`. * sync: remove DebugLock alias template Use `UniqueLock` directly. Type deduction works just fine from the first argument to the constructor of `UniqueLock`, so there is no need to repeat ```cpp UniqueLock<typename std::remove_reference<typename std::remove_pointer<decltype(cs)>::type>::type> ``` five times in the `LOCK` macros. Just `UniqueLock` suffices. * sync: rename AnnotatedMixin::UniqueLock to AnnotatedMixin::unique_lock This avoids confusion with the global `UniqueLock` and the snake case is consistent with `UniqueLock::reverse_lock. * iwyu: Add zmq source files * Update .cirrus.yml Co-authored-by: brunoerg <brunoely.gc@gmail.com> Co-authored-by: James O'Beirne <james.obeirne@pm.me> Co-authored-by: Hennadii Stepanov <32963518+hebasto@users.noreply.github.com> Co-authored-by: Vasil Dimov <vd@FreeBSD.org> Co-authored-by: MacroFake <falke.marco@gmail.com> Co-authored-by: fanquake <fanquake@gmail.com> Co-authored-by: Anthony Towns <aj@erisian.com.au> Co-authored-by: willcl-ark <will8clark@gmail.com> Co-authored-by: Aurèle Oulès <aurele@oules.com> Co-authored-by: Sebastian Falbesoner <sebastian.falbesoner@gmail.com> Co-authored-by: glozow <gloriajzhao@gmail.com> Co-authored-by: Cory Fields <cory-nospam-@coryfields.com> Co-authored-by: stickies-v <stickies-v@protonmail.com> Co-authored-by: Ryan Ofsky <ryan@ofsky.org> Co-authored-by: MacroFake <MarcoFalke@gmail.com> Co-authored-by: Russ Yanofsky <russ@yanofsky.org> Co-authored-by: Dimitris Tsapakidis <dimitris@tsapakidis.com> Co-authored-by: alex v <alex@encrypt-s.com>
Silence false positives from TSAN about unsynchronized calls to
BaseIndex::~BaseIndex()
andBaseIndex::SetBestBlockIndex()
. They are synchronized, but beyond the comprehension of TSAN - bySyncWithValidationInterfaceQueue()
, called fromBaseIndex::BlockUntilSyncedToCurrentChain()
.Fixes #25365