test: silence TSAN false positive in coinstatsindex_initial_sync #26188

vasild · 2022-09-27T16:09:25Z

Silence false positives from TSAN about unsynchronized calls to BaseIndex::~BaseIndex() and BaseIndex::SetBestBlockIndex(). They are synchronized, but beyond the comprehension of TSAN - by SyncWithValidationInterfaceQueue(), called from BaseIndex::BlockUntilSyncedToCurrentChain().

Fixes #25365

vasild · 2022-09-27T16:12:45Z

A call to SyncWithValidationInterfaceQueue() also fixes it. I am +0.1 on stopping the scheduler thread and flushing the queue - looks more robust to me.

maflcko

Concept ACK. This may fix the bug, but it could be better.

src/test/coinstatsindex_tests.cpp

ryanofsky · 2022-09-28T19:03:09Z

A call to SyncWithValidationInterfaceQueue() also fixes it. I am +0.1 on stopping the scheduler thread and flushing the queue - looks more robust to me.

Hmm, I would think the opposite SyncWithValidationInterfaceQueue seems like the more robust approach to me. It forces all events that may have been posted before the index is stopped to be processed before the index destructor is called.

Stopping the scheduler seems more indirect and more error prone because it make additional assumptions that validationinterface uses the scheduler internally and that it is safe to stop the scheduler at all because nothing else depends on it.

Also agree with Marco's suggestion that it would be good to stick to consistent approach across tests, which also favors SyncWithValidationInterfaceQueue

vasild · 2022-09-29T08:22:48Z

eaec74f3b9...40ef8e064d: @ryanofsky, @MarcoFalke fair enough! Switched to SyncWithValidationInterfaceQueue() :-)

maflcko

lgtm

src/test/coinstatsindex_tests.cpp

maflcko · 2022-09-29T09:19:44Z

src/test/coinstatsindex_tests.cpp

@@ -79,6 +79,10 @@ BOOST_FIXTURE_TEST_CASE(coinstatsindex_initial_sync, TestChain100Setup)
    // Shutdown sequence (c.f. Shutdown() in init.cpp)
    coin_stats_index.Stop();


Unrelated: I think the two Stop(), here and in fa176e2 can be removed, as they are done by the destructor anyway.

It looks like that all calls to Stop() are done just before the objects are destroyed (except the call from the destructor itself). Thus Stop() can be removed and its (short) body moved inside BaseIndex::~BaseIndex(). If this is the case then this deserves a separate PR. Am I missing something?

The calls to Stop need to happen before the BaseIndex destructor runs because they are responsible for joining the sync thread, which can call virtual methods. By the time the BaseIndex destructor runs, whatever subclass inherited from BaseIndex is partially destroyed and the virtual method calls won't work.

The call to Stop() in the destructor was never valid and I have a commit c499c87 in #24230 which removes it and asserts that it was called previously.

vasild · 2022-09-29T09:53:15Z

40ef8e064d...3ae1fe4498: take suggestion to remove a comment

ryanofsky

Reviewed 3ae1fe4, but I think f08c9fb from #21726 added a real race that this workaround doesn't really address, and I made some suggestions below.

ryanofsky · 2022-09-29T18:27:52Z

src/test/coinstatsindex_tests.cpp

+    // Make sure coin_stats_index is not used by the scheduler thread after it
+    // has been destroyed.


In commit "test: fix race in shutdown code in coinstatsindex_initial_sync" (3ae1fe4)

I think it would be better to move the SyncWithValidationInterfaceQueue() call up one line before the Stop() call, and not an introduce an artificial pause between stopping the index and destroying the index. There should be no need for an extra pause between stopping and destroying. Adding the pause in the wrong place in the test could potentially mask real bugs, so better to add it in the right place one line up.

I also think the code comment "Make sure coin_stats_index is not used by the scheduler thread" and PR description "fix race in shutdown code" are a little misleading, because this is not really fixing a race condition. This is working around TSAN false positive errors that happen because TSAN doesn't understand our BaseIndex::BlockUntilSyncedToCurrentChain and BaseIndex::BlockConnected functions.

TSAN alerts about a race condition because it doesn't see the test thread which the destroys the object explicitly waiting for the scheduler thread which calls BlockConnected. But this is because, as an optimization, our BlockUntilSyncedToCurrentChain function skips calling SyncWithValidationInterfaceQueue when m_best_block_index matches the chain tip. This is a valid optimization as long as the BaseIndex::BlockConnected implementation does not do any reads or writes to *this after it updates m_best_block_index. This was the case before f08c9fb from #21726 did add a real race bug by making AllowPrune() and GetName() calls after updating m_best_block_index. I think a fix for this could look like:

--- a/src/index/base.cpp +++ b/src/index/base.cpp @@ -414,10 +414,18 @@ IndexSummary BaseIndex::GetSummary() const void BaseIndex::SetBestBlockIndex(const CBlockIndex* block) { assert(!node::fPruneMode || AllowPrune()); - m_best_block_index = block; if (AllowPrune() && block) { node::PruneLockInfo prune_lock; prune_lock.height_first = block->nHeight; WITH_LOCK(::cs_main, m_chainstate->m_blockman.UpdatePruneLock(GetName(), prune_lock)); } + // For thread safety and compatibility with m_best_block_index optimization + // in the BlockUntilSyncedToCurrentChain(), update m_best_block_index as + // the last step in this function, and avoid updating state or accessing + // *this after m_best_block_index is set. If state is updated after + // m_best_block_index, external threads won't be able to rely on + // BlockUntilSyncedToCurrentChain() to be in sync with that state. And if + // *this is accessed, external threads won't be able to use + // BlockUntilSyncedToCurrentChain() to safely destroy the index object. + m_best_block_index = block; }

For this commit, I think a more accurate comment explaining the SyncWithValidationInterfaceQueue(); call would say something like "To avoid false positive TSAN errors, explicitly signal the test thread from the validationinterface thread so TSAN does not think there is a race between index shutdown code and BlockConnected notification code. The BlockUntilSyncedToCurrentChain() call above is sufficient to avoid these races in reality, but TSAN can't detect this due to an optimization in that function."

This is working around TSAN false positive errors

No, this is a real issue (use-after-free), see steps to reproduce without tsan (for example with valgrind): #25365 (comment)

This is working around TSAN false positive errors

No, this is a real issue (use-after-free), see steps to reproduce without tsan (for example with valgrind): #25365 (comment)

Yes I did see that comment, but it was not reproducing an existing race condition, it was adding a new race condition by calling this->GetName() after the m_best_block_index = block assignment. If the m_best_block_index = block; assignment was done last, there would be no race.

My comment above may be a little muddled because I noticed the real race condition caused by f08c9fb from #21726 in the middle of writing it. I think we should fix the real race condition in non-test code making a change to SetBestBlockIndex like the one I suggested above. After we do that, the the current change 3ae1fe4 in this PR will just be silencing TSAN errors, not avoiding a real race condition anymore.

it was not reproducing an existing race condition, it was adding a new race condition by calling this->GetName() after the m_best_block_index = block assignment

Ok, but see also #25365 (comment). I could reproduce use-after-free in Valgrind even without adding a call to GetName(), just a sleep suffices.

I got the impression that when BaseIndex::SetBestBlockIndex() executes this is already freed, but I do not remember if I checked to access a member before the m_best_block_index assignment.

Moving the assignment to the bottom may fix it. Anyway, this whole story looks overly complicated to me. It shouldn't require multiple people discussing whether this is a bug or not and how to fix it exactly! I opened #26210 to explore an alternative.

Ok, but see also #25365 (comment). I could reproduce use-after-free in Valgrind even without adding a call to GetName(), just a sleep suffices.

Right. Your reproduction puts the sleep right before the UpdatePruneLock(GetName(), ...) call, so it does not need to add a new call to GetName() and does reveal an existing race condition in the test. Marco's reproduction puts that sleep after that and adds another GetName() call, so it is really adding a new race condition.

I got the impression that when BaseIndex::SetBestBlockIndex() executes this is already freed, but I do not remember if I checked to access a member before the m_best_block_index assignment.

I don't think this is possible because BlockUntilSyncedToCurrentChain would not have returned before the m_best_block_index assignment was done.

Moving the assignment to the bottom may fix it. Anyway, this whole story looks overly complicated to me. It shouldn't require multiple people discussing whether this is a bug or not and how to fix it exactly! I opened #26210 to explore an alternative.

Yes I agree moving the assignment to the bottom is not the best long term solution for races generally. I just think moving the assignment is the most direct fix for the race condition introduced in #21726. Also I think it makes code more logical and seems like a reliability improvement. I opened a new PR #26215 which moves the assignment, and I think it would be the way to address the real race condition, and turn the TSAN true positives into false positives. After that, this PR would be a good way to fix remaining false positives (again with the suggestion that I think it would be better for test shutdown sequence to be 1-flush 2-stop 3-destroy not 1-stop 2-flush 3-destroy so test behavior is closer to bitcoind shutdown behavior).

I this is a false positive, I wonder what the steps are to reproduce. After all tsan should be deterministic, assuming manual sleeps are added at the right places?

I this is a false positive, I wonder what the steps are to reproduce. After all tsan should be deterministic, assuming manual sleeps are added at the right places?

If the goal is to reproduce TSAN false positive errors that remain after #26215 fixes the race condition introduced by #21726, maybe a good way would be to put a 2 second sleep at the end of SetBestBlockIndex after it assigns m_best_block_index and 1 second sleep in the test before it calls BlockUntilSyncedToCurrentChain. This way the TSAN should see the scheduler thread reading the vtable ptr, and the test thread writing it, and no apparent synchronization between the two threads because BlockUntilSyncedToCurrentChain will return early without ever calling SyncWithValidationInterfaceQueue

Since commit f08c9fb from PR bitcoin#21726, the `BlockUntilSyncedToCurrentChain` behavior has been less reliable and also introduced a race condition in the coinstatsindex_initial_sync unit test. It seems better for `BlockUntilSyncedToCurrentChain` to actually wait for the last connected block to be fully processed, than to be able to return before prune locks are set, so this switches the order of `m_best_block_index = block;` and `UpdatePruneLock` statements in `SetBestBlockIndex` to make it more reliable. Also since commit f08c9fb, there has been a race condition in the `coinstatsindex_initial_sync` test. Before that commit, the atomic index best block pointer `m_best_block_index` was updated as the last step of `BaseIndex::BlockConnected`, so `BlockUntilSyncedToCurrentChain` could safely be used in tests to wait for the last `BlockConnected` notification to be finished before stopping and destroying the index. But after that commit, calling `BlockUntilSyncedToCurrentChain` is no longer sufficient, and there is a race between the test shutdown code which destroys the index object and the new code introduced in that commit calling `AllowPrune()` and `GetName()` on the index object. Reproducibility instructions for this are in bitcoin#25365 (comment) This commit fixes the `coinstatsindex_initial_sync` race condition, even though it will require an additional change to silence TSAN false positives, bitcoin#26188, after it is fixed. So this partially addresses but does not resolve the bug reporting TSAN errors bitcoin#25365. There is no known race condition outside of test code currently, because the bitcoind `Shutdown` function calls `FlushBackgroundCallbacks` not `BlockUntilSyncedToCurrentChain` to safely shut down. Co-authored-by: Vasil Dimov <vd@FreeBSD.org> Co-authored-by: MacroFake <falke.marco@gmail.com>

vasild · 2022-10-04T10:40:43Z

3ae1fe4498...d2395c83c4: rebase + move SyncWithValidationInterfaceQueue() before coin_stats_index.Stop() as suggested above.

I think this should be good to go, unrelated to other PRs, as a minimal change that fixes the problem, to be included in 24.x.

src/test/coinstatsindex_tests.cpp

vasild · 2022-10-04T11:25:37Z

d2395c83c4..c4566600be: use same shutdown order also in txindex_tests.

ryanofsky

Code review ACK c456660. But it would be better if #26215 were merged before this PR, because unlike this PR, #26215 is a direct fix for the race condition introduced in #21726. After #26215, this PR would be easier to understand as a followup that addresses TSAN false positive errors and makes test shutdown sequences consistent.

I don't think it would be good to merge this PR before #26215, because it would only be fixing the race condition indirectly instead of directly, and it would be adding misleading comments about the purpose of the SyncWithValidationInterfaceQueue call following the BlockUntilSyncedToCurrentChain call. The new SyncWithValidationInterfaceQueue call shouldn't be necessary for anything other than making TSAN happy.

ryanofsky · 2022-10-04T13:58:55Z

src/test/coinstatsindex_tests.cpp

+    // Make sure coin_stats_index is not used by the scheduler thread after it
+    // has been destroyed.


I this is a false positive, I wonder what the steps are to reproduce. After all tsan should be deterministic, assuming manual sleeps are added at the right places?

If the goal is to reproduce TSAN false positive errors that remain after #26215 fixes the race condition introduced by #21726, maybe a good way would be to put a 2 second sleep at the end of SetBestBlockIndex after it assigns m_best_block_index and 1 second sleep in the test before it calls BlockUntilSyncedToCurrentChain. This way the TSAN should see the scheduler thread reading the vtable ptr, and the test thread writing it, and no apparent synchronization between the two threads because BlockUntilSyncedToCurrentChain will return early without ever calling SyncWithValidationInterfaceQueue

src/test/coinstatsindex_tests.cpp

Since commit f08c9fb from PR bitcoin#21726, index `BlockUntilSyncedToCurrentChain` behavior has been less reliable, and there has also been a race condition in the `coinstatsindex_initial_sync` unit test. It seems better for `BlockUntilSyncedToCurrentChain` to actually wait for the last connected block to be fully processed, than to be able to return before prune locks are set, so this switches the order of `m_best_block_index = block;` and `UpdatePruneLock` statements in `SetBestBlockIndex` to make it more reliable. Also since commit f08c9fb, there has been a race condition in the `coinstatsindex_initial_sync` test. Before that commit, the atomic index best block pointer `m_best_block_index` was updated as the last step of `BaseIndex::BlockConnected`, so `BlockUntilSyncedToCurrentChain` could safely be used in tests to wait for the last `BlockConnected` notification to be finished before stopping and destroying the index. But after that commit, calling `BlockUntilSyncedToCurrentChain` is no longer sufficient, and there is a race between the test shutdown code which destroys the index object and the new code introduced in that commit calling `AllowPrune()` and `GetName()` on the index object. Reproducibility instructions for this are in bitcoin#25365 (comment) This commit fixes the `coinstatsindex_initial_sync` race condition, even though it will require an additional change to silence TSAN false positives, bitcoin#26188, after it is fixed. So this partially addresses but does not resolve the bug reporting TSAN errors bitcoin#25365. There is no known race condition outside of test code currently, because the bitcoind `Shutdown` function calls `FlushBackgroundCallbacks` not `BlockUntilSyncedToCurrentChain` to safely shut down. Co-authored-by: Vasil Dimov <vd@FreeBSD.org> Co-authored-by: MacroFake <falke.marco@gmail.com>

…edToCurrentChain reliability 8891949 index: Improve BaseIndex::BlockUntilSyncedToCurrentChain reliability (Ryan Ofsky) Pull request description: Since commit f08c9fb from PR bitcoin/bitcoin#21726, index `BlockUntilSyncedToCurrentChain` behavior has been less reliable, and there has also been a race condition in the `coinstatsindex_initial_sync` unit test. It seems better for `BlockUntilSyncedToCurrentChain` to actually wait for the last connected block to be fully processed, than to be able to return before prune locks are set, so this switches the order of `m_best_block_index = block;` and `UpdatePruneLock` statements in `SetBestBlockIndex` to make it more reliable. Also since commit f08c9fb, there has been a race condition in the `coinstatsindex_initial_sync` test. Before that commit, the atomic index best block pointer `m_best_block_index` was updated as the last step of `BaseIndex::BlockConnected`, so `BlockUntilSyncedToCurrentChain` could safely be used in tests to wait for the last `BlockConnected` notification to be finished before stopping and destroying the index. But after that commit, calling `BlockUntilSyncedToCurrentChain` is no longer sufficient, and there is a race between the test shutdown code which destroys the index object and the new code introduced in that commit calling `AllowPrune()` and `GetName()` on the index object. Reproducibility instructions for this are in bitcoin/bitcoin#25365 (comment) This commit fixes the `coinstatsindex_initial_sync` race condition, even though it will require an additional change to silence TSAN false positives, bitcoin/bitcoin#26188, after it is fixed. So this partially addresses but does not resolve the bug reporting TSAN errors bitcoin/bitcoin#25365. There is no known race condition outside of test code currently, because the bitcoind `Shutdown` function calls `FlushBackgroundCallbacks` not `BlockUntilSyncedToCurrentChain` to safely shut down. Co-authored-by: vasild Co-authored-by: MarcoFalke ACKs for top commit: mzumsande: re-ACK 8891949 Tree-SHA512: 52e29e3772a0c92873c54e5ffb31dd66a909b68a2031b7585713cd1d976811289c98bd9bb41679a8689062f03be4f97bb8368696e789caa4607c2fd8b1fe289b

fanquake · 2022-10-10T06:28:49Z

#26215 has been merged. @vasild can you rebase/follow up with @ryanofsky's recent comments here.

vasild · 2022-10-10T13:31:26Z

c4566600be...26bef2957f: rebase and adjust after #26215

Note - I cannot reproduce this with the steps from #26188 (comment).

maflcko · 2022-10-10T13:44:31Z

Update title/OP?

vasild · 2022-10-10T14:45:37Z

Update title/OP?

Done, sorry.

maflcko · 2022-10-10T15:45:48Z

I can't reproduce this either

…entChain reliability 8891949 index: Improve BaseIndex::BlockUntilSyncedToCurrentChain reliability (Ryan Ofsky) Pull request description: Since commit f08c9fb from PR bitcoin#21726, index `BlockUntilSyncedToCurrentChain` behavior has been less reliable, and there has also been a race condition in the `coinstatsindex_initial_sync` unit test. It seems better for `BlockUntilSyncedToCurrentChain` to actually wait for the last connected block to be fully processed, than to be able to return before prune locks are set, so this switches the order of `m_best_block_index = block;` and `UpdatePruneLock` statements in `SetBestBlockIndex` to make it more reliable. Also since commit f08c9fb, there has been a race condition in the `coinstatsindex_initial_sync` test. Before that commit, the atomic index best block pointer `m_best_block_index` was updated as the last step of `BaseIndex::BlockConnected`, so `BlockUntilSyncedToCurrentChain` could safely be used in tests to wait for the last `BlockConnected` notification to be finished before stopping and destroying the index. But after that commit, calling `BlockUntilSyncedToCurrentChain` is no longer sufficient, and there is a race between the test shutdown code which destroys the index object and the new code introduced in that commit calling `AllowPrune()` and `GetName()` on the index object. Reproducibility instructions for this are in bitcoin#25365 (comment) This commit fixes the `coinstatsindex_initial_sync` race condition, even though it will require an additional change to silence TSAN false positives, bitcoin#26188, after it is fixed. So this partially addresses but does not resolve the bug reporting TSAN errors bitcoin#25365. There is no known race condition outside of test code currently, because the bitcoind `Shutdown` function calls `FlushBackgroundCallbacks` not `BlockUntilSyncedToCurrentChain` to safely shut down. Co-authored-by: vasild Co-authored-by: MarcoFalke ACKs for top commit: mzumsande: re-ACK 8891949 Tree-SHA512: 52e29e3772a0c92873c54e5ffb31dd66a909b68a2031b7585713cd1d976811289c98bd9bb41679a8689062f03be4f97bb8368696e789caa4607c2fd8b1fe289b

ryanofsky

Code review ACK 26bef29. I do think this change improves the tests, even if we can't reproduce the original TSAN errors anymore. The PR fixes a misleading comment in the txindex_initial_sync test, and makes shutdown order in that test more closely match bitcoind shutdown order. It also makes the coinstatsindex_initial_sync test consistent with the txindex_initial_sync test.

If we are sure there are no TSAN false positive errors here, a potentially simpler alternative to this PR is to remove the SyncWithValidationInterfaceQueue calls and comments from both tests. It's possible TSAN is taking atomic reads and writes into account and can infer that the test thread is waiting for the notification thread because it is checking the atomic m_best_block_index variable. That could explain why sleeps suggested in #26188 (comment) don't cause errors, even if the sleeps do cause BlockUntilSyncedToCurrentChain to return early without calling SyncWithValidationInterfaceQueue.

src/test/coinstatsindex_tests.cpp

Since commit f08c9fb from PR bitcoin#21726, index `BlockUntilSyncedToCurrentChain` behavior has been less reliable, and there has also been a race condition in the `coinstatsindex_initial_sync` unit test. It seems better for `BlockUntilSyncedToCurrentChain` to actually wait for the last connected block to be fully processed, than to be able to return before prune locks are set, so this switches the order of `m_best_block_index = block;` and `UpdatePruneLock` statements in `SetBestBlockIndex` to make it more reliable. Also since commit f08c9fb, there has been a race condition in the `coinstatsindex_initial_sync` test. Before that commit, the atomic index best block pointer `m_best_block_index` was updated as the last step of `BaseIndex::BlockConnected`, so `BlockUntilSyncedToCurrentChain` could safely be used in tests to wait for the last `BlockConnected` notification to be finished before stopping and destroying the index. But after that commit, calling `BlockUntilSyncedToCurrentChain` is no longer sufficient, and there is a race between the test shutdown code which destroys the index object and the new code introduced in that commit calling `AllowPrune()` and `GetName()` on the index object. Reproducibility instructions for this are in bitcoin#25365 (comment) This commit fixes the `coinstatsindex_initial_sync` race condition, even though it will require an additional change to silence TSAN false positives, bitcoin#26188, after it is fixed. So this partially addresses but does not resolve the bug reporting TSAN errors bitcoin#25365. There is no known race condition outside of test code currently, because the bitcoind `Shutdown` function calls `FlushBackgroundCallbacks` not `BlockUntilSyncedToCurrentChain` to safely shut down. Co-authored-by: Vasil Dimov <vd@FreeBSD.org> Co-authored-by: MacroFake <falke.marco@gmail.com> Github-Pull: bitcoin#26215 Rebased-From: 8891949

Fixes bitcoin#25365

…x_tests So that the call order is the same as in coinstatsindex_tests.

vasild · 2022-10-11T07:52:33Z

26bef2957f...861cb3fadc: reword the comments

... remove the SyncWithValidationInterfaceQueue calls and comments from both tests ...

I am also ok with this, since it is a test-only and a false positive, to leave it "unfixed" until it happens again in CI, if ever. @MarcoFalke, what do you think?

maflcko · 2022-10-11T08:00:57Z

review ACK 861cb3f

I think anything is fine here. Either keep both or delete both.

ryanofsky

Code review ACK 861cb3f. Just comment change since last review.

Looks like this is ready to merge and issue #25365 is resolved.

Since commit f08c9fb from PR bitcoin#21726, index `BlockUntilSyncedToCurrentChain` behavior has been less reliable, and there has also been a race condition in the `coinstatsindex_initial_sync` unit test. It seems better for `BlockUntilSyncedToCurrentChain` to actually wait for the last connected block to be fully processed, than to be able to return before prune locks are set, so this switches the order of `m_best_block_index = block;` and `UpdatePruneLock` statements in `SetBestBlockIndex` to make it more reliable. Also since commit f08c9fb, there has been a race condition in the `coinstatsindex_initial_sync` test. Before that commit, the atomic index best block pointer `m_best_block_index` was updated as the last step of `BaseIndex::BlockConnected`, so `BlockUntilSyncedToCurrentChain` could safely be used in tests to wait for the last `BlockConnected` notification to be finished before stopping and destroying the index. But after that commit, calling `BlockUntilSyncedToCurrentChain` is no longer sufficient, and there is a race between the test shutdown code which destroys the index object and the new code introduced in that commit calling `AllowPrune()` and `GetName()` on the index object. Reproducibility instructions for this are in bitcoin#25365 (comment) This commit fixes the `coinstatsindex_initial_sync` race condition, even though it will require an additional change to silence TSAN false positives, bitcoin#26188, after it is fixed. So this partially addresses but does not resolve the bug reporting TSAN errors bitcoin#25365. There is no known race condition outside of test code currently, because the bitcoind `Shutdown` function calls `FlushBackgroundCallbacks` not `BlockUntilSyncedToCurrentChain` to safely shut down. Co-authored-by: Vasil Dimov <vd@FreeBSD.org> Co-authored-by: MacroFake <falke.marco@gmail.com>

…dex_initial_sync 861cb3f test: move SyncWithValidationInterfaceQueue() before Stop() in txindex_tests (Vasil Dimov) 6526dc3 test: silence TSAN false positive in coinstatsindex_initial_sync (Vasil Dimov) Pull request description: Silence false positives from TSAN about unsynchronized calls to `BaseIndex::~BaseIndex()` and `BaseIndex::SetBestBlockIndex()`. They are synchronized, but beyond the comprehension of TSAN - by `SyncWithValidationInterfaceQueue()`, called from `BaseIndex::BlockUntilSyncedToCurrentChain()`. Fixes bitcoin#25365 ACKs for top commit: MarcoFalke: review ACK 861cb3f ryanofsky: Code review ACK 861cb3f. Just comment change since last review. Tree-SHA512: 8c30fdf2fd11d54e9adfa68a67185ab820bd7bd9f7f3ad6456e7e6d219fa9cf6d34b41e98e723eae86cb0c1baef7f3fc57b1b011a13dc3fe3d78334b9b5596de

Since commit f08c9fb0c6a799e3cb75ca5f763a746471625beb from PR bitcoin/bitcoin#21726, index `BlockUntilSyncedToCurrentChain` behavior has been less reliable, and there has also been a race condition in the `coinstatsindex_initial_sync` unit test. It seems better for `BlockUntilSyncedToCurrentChain` to actually wait for the last connected block to be fully processed, than to be able to return before prune locks are set, so this switches the order of `m_best_block_index = block;` and `UpdatePruneLock` statements in `SetBestBlockIndex` to make it more reliable. Also since commit f08c9fb0c6a799e3cb75ca5f763a746471625beb, there has been a race condition in the `coinstatsindex_initial_sync` test. Before that commit, the atomic index best block pointer `m_best_block_index` was updated as the last step of `BaseIndex::BlockConnected`, so `BlockUntilSyncedToCurrentChain` could safely be used in tests to wait for the last `BlockConnected` notification to be finished before stopping and destroying the index. But after that commit, calling `BlockUntilSyncedToCurrentChain` is no longer sufficient, and there is a race between the test shutdown code which destroys the index object and the new code introduced in that commit calling `AllowPrune()` and `GetName()` on the index object. Reproducibility instructions for this are in bitcoin/bitcoin#25365 (comment) This commit fixes the `coinstatsindex_initial_sync` race condition, even though it will require an additional change to silence TSAN false positives, bitcoin/bitcoin#26188, after it is fixed. So this partially addresses but does not resolve the bug reporting TSAN errors bitcoin/bitcoin#25365. There is no known race condition outside of test code currently, because the bitcoind `Shutdown` function calls `FlushBackgroundCallbacks` not `BlockUntilSyncedToCurrentChain` to safely shut down. Co-authored-by: Vasil Dimov <vd@FreeBSD.org> Co-authored-by: MacroFake <falke.marco@gmail.com>

* doc: add info about status code 404 for some rest endpoints * refactor: remove duplicate code from BlockAssembler * build: Fix `capnp` package build for Android * build: Specify native binaries explicitly when building `capnp` package From `configure --help`: --with-external-capnp use the system capnp binary (or the one specified with $CAPNP) instead of compiling a new one (useful for cross-compiling) * net: remove useless call to IsReachable() from CConnman::Bind() `CConnman::Bind()` is called without `BF_EXPLICIT` only when passed either `0.0.0.0` or `::`. For those addresses `IsReachable()` is always true (regardless of the `-onlynet=` setting!), meaning that the `if` condition never evaluates to true. `IsReachable()` is always true for the "any" IPv4 and IPv6 addresses because `CNetAddr::GetNetwork()` returns `NET_UNROUTABLE` instead of `NET_IPV4` or `NET_IPV6` and the network `NET_UNROUTABLE` is always considered reachable. It follows that `BF_EXPLICIT` is unnecessary, remove it too. * sync: simplify MaybeCheckNotHeld() definitions by using a template Reduce 4 of the `MaybeCheckNotHeld()` definitions to 2 by using a template. This also makes the function usable for other [BasicLockable](https://en.cppreference.com/w/cpp/named_req/BasicLockable) types. * sync: remove unused template parameter from ::UniqueLock The template parameter `typename Base = typename Mutex::UniqueLock` is not used, so remove it. Use internally defined type `Base` to avoid repetitions of `Mutex::UniqueLock`. * scripted-diff: Rename time symbols -BEGIN VERIFY SCRIPT- ren() { sed -i "s:\<$1\>:$2:g" $(git grep -l "\<$1\>" ':(exclude)src/versionbits.cpp') ; } ren nStart time_start ren nTimeStart time_start ren nTimeReadFromDiskTotal time_read_from_disk_total ren nTimeConnectTotal time_connect_total ren nTimeFlush time_flush ren nTimeChainState time_chainstate ren nTimePostConnect time_post_connect ren nTimeCheck time_check ren nTimeForks time_forks ren nTimeConnect time_connect ren nTimeVerify time_verify ren nTimeUndo time_undo ren nTimeIndex time_index ren nTimeTotal time_total ren nTime1 time_1 ren nTime2 time_2 ren nTime3 time_3 ren nTime4 time_4 ren nTime5 time_5 ren nTime6 time_6 ren nBlocksTotal num_blocks_total # Newline after semicolon perl -0777 -pi -e 's/; time_connect_total/;\n time_connect_total/g' src/validation.cpp perl -0777 -pi -e 's/; time_/;\n time_/g' src/validation.cpp -END VERIFY SCRIPT- * Use steady clock for bench logging * doc: add historical 0.20.2 release notes * doc: add historical 0.21.2 release notes * build: split ARM crc & crypto extension checks We currently perform the same check twice, to put the same set of flags in two different variables. Split the checks so we test for crc and crypto extensions independently. If we don't want to split, we should just delete the second AX_CHECK_COMPILE_FLAG check, and set ARM_CRC_CXXFLAGS & ARM_CRC_CXXFLAGS at the same time. * refactor: Do not discard `try_lock()` return value Microsoft's C++ Standard Library uses the `[[nodiscard]]` attribute for `try_lock()`. See: https://github.com/microsoft/STL/blob/main/stl/inc/mutex This change allows to drop the current suppression for the warning C4838 and helps to prevent the upcoming warning C4858. See: microsoft/STL@539c26c * signet/miner: reduce default interblock interval limit to 30min Also allow the operator to change it, if desired, without having to edit the code. * Squashed 'src/leveldb/' changes from 22f1e4a02f..e2f10b4e47 e2f10b4e47 Merge bitcoin-core/leveldb-subtree#34: win32: fix -Wmissing-field-initializers warnings 12c52b392d win32: fix -Wmissing-field-initializers warnings git-subtree-dir: src/leveldb git-subtree-split: e2f10b4e47bc950a81bc96d1c6db3a8048216642 * ci: Bump vcpkg to the latest version `2022.09.27` Dependency changes (2022.06.16.1 - 2022.09.27): - boost 1.79.0#0 -> 1.80.0#0 - sqlite3 3.37.2#1 -> 3.39.2#0 - zeromq 4.3.4#5 -> 4.3.4#6 * contrib: Fix capture_output in getcoins.py Our required Python version 3.6.12 does not support `capture_output` as a subprocess.run argument; this was added in python 3.7. We can emulate it by setting stdout and stderr to subprocess.PIPE * fuzz: Limit outpoints.size in txorphan target to avoid OOM * refactor: move Boost datetime usage to wallet This means we don't need datetime in a --disable-wallet build, and it isn't included in the kernel. * wallet: Use correct effective value when checking target * test: Check external coin effective value is used in CoinSelection * test: Use proper Boost macros instead of assertions * ci: Run `bench_bitcoin.exe --sanity-check` in "Win64 native" task Also a better name used for the script as it follows GNU's `make check`. * doc: bump bips.md up-to-date version to v24.0 This is a trivial follow-up to bitcoin#26124. * refactor: Drop `owns_lock()` call Co-authored-by: Vasil Dimov <vd@FreeBSD.org> * refactor: move DEFAULT_TXINDEX from validation to txindex * refactor: move DEFAULT_COINSTATSINDEX from validation to coinstatsindex * refactor: move DEFAULT_BLOCKFILTERINDEX from val to blockfilterindex * kernel: remove util/bytevectorhash.cpp * ci: Move `git config` commands into script where they are used * ci: Use same `merge_script` implementation for Windows as for all * ci: Remove unused package Address feedback from https://github.com/bitcoin/bitcoin/pull/24561/files#r985719812 * ci: Allow PIP_PACKAGES on centos This was added in 7fc5e86 but I can't see a reason why this should be forbidden. * test: Remove unused fCheckpointsEnabled from miner_tests The earliest checkpoint is at height 11111, so this can't possibly have any impact on this test. * build, msvc: Enable C4834 warning https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/c4834 * kernel: move RunCommandParseJSON to its own file Because libbitcoinkernel does not include this new object, this has the side-effect of eliminating the unnecessary boost::process dependency. * ci: Workaround Windows filesystem executable bit loss * fuzz: add util/mempool/h.cpp Moving the mempool code (Boost) out of util.h, results in a ~10% speedup (for me) when compiling the fuzz tests. * fuzz: pass max fee into ConsumeTxMemPoolEntry * refactor: Make 64-bit shift explicit Also this change enables MSVC warning C4334 for all codebase. https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-level-3-c4334 * refactor: move run_command from util to common Quoting ryanofsky: "util can be the library for things included in the kernel which the kernel can depend on, and common can be the library for other code that needs to be shared internally, but should not be part of the kernel or shared externally." * Remove clang-format from lint task clang-format could be used in scripted diffs, but remained largely unused. * test: Pass mempool reference to AssemblerForTest * test: Use dedicated mempool in TestPrioritisedMining No need for a shared mempool. Also remove unused chainparams parameter. * test: Use dedicated mempool in TestPackageSelection No need for a shared mempool. Also remove unused chainparams parameter. * test: Use dedicated mempool in TestBasicMining No need for a shared mempool. Also remove unused chainparams parameter. Can be reviewed with --ignore-all-space * refactor: mempool: add MemPoolLimits::NoLimits() There are quite a few places in the codebase that require us to construct a CTxMemPool without limits on ancestors and descendants. This helper function allows us to get rid of all that duplication. * refactor: mempool: use CTxMempool::Limits Simplifies function signatures by removing repetition of all the ancestor/descendant limits, and increases readability by being more verbose by naming the limits, while still reducing the LoC. * test: use NoLimits() in MempoolIndexingTest The (100, 1000000, 1000, 1000000) limits are arbitrarily high and don't restrict anything, they are just meant to calculate ancestors properly. Using NoLimits() makes this intent more clear and simplifies the code. * docs: improve docs where MemPoolLimits is used * Remove unused CDataStream::rdbuf method It is unused and seems unlikely to be ever used. * index: Improve BaseIndex::BlockUntilSyncedToCurrentChain reliability Since commit f08c9fb from PR bitcoin#21726, index `BlockUntilSyncedToCurrentChain` behavior has been less reliable, and there has also been a race condition in the `coinstatsindex_initial_sync` unit test. It seems better for `BlockUntilSyncedToCurrentChain` to actually wait for the last connected block to be fully processed, than to be able to return before prune locks are set, so this switches the order of `m_best_block_index = block;` and `UpdatePruneLock` statements in `SetBestBlockIndex` to make it more reliable. Also since commit f08c9fb, there has been a race condition in the `coinstatsindex_initial_sync` test. Before that commit, the atomic index best block pointer `m_best_block_index` was updated as the last step of `BaseIndex::BlockConnected`, so `BlockUntilSyncedToCurrentChain` could safely be used in tests to wait for the last `BlockConnected` notification to be finished before stopping and destroying the index. But after that commit, calling `BlockUntilSyncedToCurrentChain` is no longer sufficient, and there is a race between the test shutdown code which destroys the index object and the new code introduced in that commit calling `AllowPrune()` and `GetName()` on the index object. Reproducibility instructions for this are in bitcoin#25365 (comment) This commit fixes the `coinstatsindex_initial_sync` race condition, even though it will require an additional change to silence TSAN false positives, bitcoin#26188, after it is fixed. So this partially addresses but does not resolve the bug reporting TSAN errors bitcoin#25365. There is no known race condition outside of test code currently, because the bitcoind `Shutdown` function calls `FlushBackgroundCallbacks` not `BlockUntilSyncedToCurrentChain` to safely shut down. Co-authored-by: Vasil Dimov <vd@FreeBSD.org> Co-authored-by: MacroFake <falke.marco@gmail.com> * test: Prevent UB in `minisketch_tests.cpp` * test: Remove confusing DUMMY_P2WPKH_SCRIPT * docs: fix m_children to be a member of CTxMemPoolEntry * wallet: have prune error take precedence over assumedvalid From Russ Yanofsky: "Agree with all of Marco's points here and think this should be updated If havePrune and hasAssumedValidChain are both true, better to show havePrune error message. Assumed-valid error message is vague and not very actionable. Would suggest "Error loading wallet. Wallet requires blocks to be downloaded, and software does not currently support loading wallets while blocks are being downloaded out of order though assumeutxo snapshots. Wallet should be able to load successfully after node sync reaches height {block_height}" Co-authored-by: MacroFake <MarcoFalke@gmail.com> Co-authored-by: Russ Yanofsky <russ@yanofsky.org> * Fix comment typos in scriptpubkeyman.cpp, wallet.cpp, wallet.h Fix comment typos: sigature -> signature ponter -> pointer it's key -> its key * sync: avoid confusing name overlap (Mutex) Use `MutexType` instead of `Mutex` for the template parameter of `UniqueLock` because there is already a class named `Mutex` and the naming overlap is confusing. `MutexType` is used elsewhere in `sync.h`. * sync: remove DebugLock alias template Use `UniqueLock` directly. Type deduction works just fine from the first argument to the constructor of `UniqueLock`, so there is no need to repeat ```cpp UniqueLock<typename std::remove_reference<typename std::remove_pointer<decltype(cs)>::type>::type> ``` five times in the `LOCK` macros. Just `UniqueLock` suffices. * sync: rename AnnotatedMixin::UniqueLock to AnnotatedMixin::unique_lock This avoids confusion with the global `UniqueLock` and the snake case is consistent with `UniqueLock::reverse_lock. * iwyu: Add zmq source files * Update .cirrus.yml Co-authored-by: brunoerg <brunoely.gc@gmail.com> Co-authored-by: James O'Beirne <james.obeirne@pm.me> Co-authored-by: Hennadii Stepanov <32963518+hebasto@users.noreply.github.com> Co-authored-by: Vasil Dimov <vd@FreeBSD.org> Co-authored-by: MacroFake <falke.marco@gmail.com> Co-authored-by: fanquake <fanquake@gmail.com> Co-authored-by: Anthony Towns <aj@erisian.com.au> Co-authored-by: willcl-ark <will8clark@gmail.com> Co-authored-by: Aurèle Oulès <aurele@oules.com> Co-authored-by: Sebastian Falbesoner <sebastian.falbesoner@gmail.com> Co-authored-by: glozow <gloriajzhao@gmail.com> Co-authored-by: Cory Fields <cory-nospam-@coryfields.com> Co-authored-by: stickies-v <stickies-v@protonmail.com> Co-authored-by: Ryan Ofsky <ryan@ofsky.org> Co-authored-by: MacroFake <MarcoFalke@gmail.com> Co-authored-by: Russ Yanofsky <russ@yanofsky.org> Co-authored-by: Dimitris Tsapakidis <dimitris@tsapakidis.com> Co-authored-by: alex v <alex@encrypt-s.com>

DrahtBot added the Tests label Sep 27, 2022

vasild mentioned this pull request Sep 27, 2022

ThreadSanitizer: data race on vptr (ctor/dtor vs virtual call) in BaseIndex #25365

Closed

fanquake added this to the 24.0 milestone Sep 27, 2022

glozow requested a review from maflcko September 28, 2022 15:27

maflcko reviewed Sep 28, 2022

View reviewed changes

src/test/coinstatsindex_tests.cpp Outdated Show resolved Hide resolved

vasild force-pushed the fix_coinstatsindex_initial_sync branch from eaec74f to 40ef8e0 Compare September 29, 2022 08:19

maflcko approved these changes Sep 29, 2022

View reviewed changes

fanquake added the Needs backport (24.x) label Sep 29, 2022

vasild force-pushed the fix_coinstatsindex_initial_sync branch from 40ef8e0 to 3ae1fe4 Compare September 29, 2022 09:52

ryanofsky reviewed Sep 29, 2022

View reviewed changes

vasild mentioned this pull request Sep 30, 2022

Remove RegisterValidationInterface() #26210

Closed

ryanofsky mentioned this pull request Sep 30, 2022

index: Improve BaseIndex::BlockUntilSyncedToCurrentChain reliability #26215

Merged

vasild force-pushed the fix_coinstatsindex_initial_sync branch from 3ae1fe4 to d2395c8 Compare October 4, 2022 10:40

maflcko reviewed Oct 4, 2022

View reviewed changes

src/test/coinstatsindex_tests.cpp Show resolved Hide resolved

ryanofsky approved these changes Oct 4, 2022

View reviewed changes

vasild force-pushed the fix_coinstatsindex_initial_sync branch from c456660 to 26bef29 Compare October 10, 2022 13:29

vasild changed the title ~~test: fix race in shutdown code in coinstatsindex_initial_sync~~ test: silence TSAN false positive in coinstatsindex_initial_sync Oct 10, 2022

maflcko removed the Needs backport (24.x) label Oct 10, 2022

maflcko removed this from the 24.0 milestone Oct 10, 2022

ryanofsky approved these changes Oct 10, 2022

View reviewed changes

src/test/coinstatsindex_tests.cpp Outdated Show resolved Hide resolved

vasild added 2 commits October 11, 2022 09:45

test: silence TSAN false positive in coinstatsindex_initial_sync

6526dc3

Fixes bitcoin#25365

test: move SyncWithValidationInterfaceQueue() before Stop() in txinde…

861cb3f

…x_tests So that the call order is the same as in coinstatsindex_tests.

vasild force-pushed the fix_coinstatsindex_initial_sync branch from 26bef29 to 861cb3f Compare October 11, 2022 07:48

ryanofsky approved these changes Oct 12, 2022

View reviewed changes

fanquake merged commit 422efca into bitcoin:master Oct 13, 2022

vasild deleted the fix_coinstatsindex_initial_sync branch October 13, 2022 08:53

maflcko mentioned this pull request Mar 24, 2023

ci: Use TSan new runtime (llvm-16, take 3) #27298

Merged

fanquake mentioned this pull request Mar 28, 2023

index: ThreadSanitizer: data race on vptr #27355

Closed

bitcoin locked and limited conversation to collaborators Oct 13, 2023

		@@ -79,6 +79,10 @@ BOOST_FIXTURE_TEST_CASE(coinstatsindex_initial_sync, TestChain100Setup)
		// Shutdown sequence (c.f. Shutdown() in init.cpp)
		coin_stats_index.Stop();

		// Make sure coin_stats_index is not used by the scheduler thread after it
		// has been destroyed.

test: silence TSAN false positive in coinstatsindex_initial_sync #26188

test: silence TSAN false positive in coinstatsindex_initial_sync #26188

Uh oh!

Conversation

vasild commented Sep 27, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vasild commented Sep 27, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

maflcko left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ryanofsky commented Sep 28, 2022

Uh oh!

vasild commented Sep 29, 2022

Uh oh!

maflcko left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ryanofsky Sep 29, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vasild commented Sep 29, 2022

Uh oh!

ryanofsky left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ryanofsky Sep 29, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ryanofsky Sep 30, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vasild commented Oct 4, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

vasild commented Oct 4, 2022

Uh oh!

ryanofsky left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

fanquake commented Oct 10, 2022

Uh oh!

vasild commented Oct 10, 2022

Uh oh!

maflcko commented Oct 10, 2022

Uh oh!

vasild commented Oct 10, 2022

Uh oh!

maflcko commented Oct 10, 2022

Uh oh!

ryanofsky left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

vasild commented Oct 11, 2022

vasild commented Sep 27, 2022 •

edited

Loading

vasild commented Sep 27, 2022 •

edited

Loading

ryanofsky Sep 29, 2022 •

edited

Loading

ryanofsky Sep 29, 2022 •

edited

Loading

ryanofsky Sep 30, 2022 •

edited

Loading

vasild commented Oct 4, 2022 •

edited

Loading