-
Notifications
You must be signed in to change notification settings - Fork 37.8k
wallet: UnregisterValidationInterface before SyncWithValidationInterfaceQueue #18280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wallet: UnregisterValidationInterface before SyncWithValidationInterfaceQueue #18280
Conversation
8e8695d
to
12374e6
Compare
c1ee308
to
afe8d39
Compare
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
About to look into this but this PR description seems like it is going to take a lot of digging to understand. Is it possible to summarize in a paragraph what this change is doing, what problem it fixes, and how it compares to the alternative? Thank you! 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to help other reviewers, this seems to make three changes:
-
Instead of RegisterValidationInterface being non-blocking and beginning to send notifications right away, even if they preceded the RegisterValidationInterface call, it will now block waiting for any previously queued notifications to be drained, then attach the notification handler, then finally return.
-
Instead of UnregisterValidationInterface being non-blocking and turning off notifications right away, it waits for any notifications that were queued prior to the UnregisterValidationInterface to be handled and blocks. Then it disconnects and returns after they were handled.
-
Stops holding cs_main while sending the loadwallet notification.
Change 3 seems like a good change just because it reduces unnecessary locking, but it's not clear what motivated the change. Maybe it would fix the deadlock @fanquake alluded to in #16307 (comment), but it's not possible to to tell without a backtrace.
Change 1 at first glance seems like it more likely lead to missing notifications on startup, but maybe it's intended to prevent the wallet redundantly processing mempool transactions blocks that have just been rescanned? Unclear
Change 2 seems like it could fix missing notifications on shutdown, but I'm not sure where that's been reported. It also seems like it could slow down shutdown unnecessarily.
Note: I have a change which handles attaching notifications in a more deliberate nonracy way in #15719. Not sure if it overlaps with bugs that are being seen now, though. And it's nice to have more minimal bugfixes anyway.
src/wallet/wallet.cpp
Outdated
@@ -3975,14 +3985,6 @@ std::shared_ptr<CWallet> CWallet::CreateWalletFromFile(interfaces::Chain& chain, | |||
// Register with the validation interface. It's ok to do this after rescan since we're still holding locked_chain. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is no longer true, and seems like it means there can now be missing notifications. I wonder if a different approach would be to send the loadwallet notification before locking the chain & scanning instead of after?
Sorry @ryanofsky, should have tagged wip. Will address your comments and address each required change once this is good. Thanks for reviewing and your comments - will also revisit #15719. |
afe8d39
to
477a4b7
Compare
477a4b7
to
bba30cb
Compare
Updated OP. Reduced the scope to just fix #16307. Will submit other PRs with relevant changes from previous commits. |
56db247
to
2a42fc1
Compare
Updated to include dc62f3c which ensures that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 2a42fc1. I left one comment below that would be nice to address. Also dc62f3c could use a better commit description. It's unclear when SyncWithValidationInterfaceQueue would hang now. Is there a shutdown race where wallets are outliving the scheduler? Or is this just a test workaround for cases where scheduler is not running?
src/scheduler.cpp
Outdated
void SingleThreadedSchedulerClient::MaybeScheduleProcessQueue() { | ||
if (!m_pscheduler->AreThreadsServicingQueue()) { | ||
EmptyQueue(); | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Handle no threads serving in MaybeScheduleProcessQueue" (dc62f3c)
I think it would make more sense to add this logic at the bottom of AddToProcessQueue
(and avoid calling MaybeScheduleProcessQueue
in no thread case) than having it at the top of MaybeScheduleProcessQueue
and returning early.
Few reasons:
- Having this logic doesn't seem useful in ProcessQueue, the other place where MaybeScheduleProcessQueue is called
- Adding this code here creates recursion ProcessQueue -> MaybeScheduleProcessQueue -> EmptyQueue -> ProcessQueue
- If code is added here the name
MaybeScheduleProcessQueue
is now misleading because this is running callbacks immediately, not scheduling them for later
2a42fc1
to
c9ca6c4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK c9ca6c4. Just suggested scheduler changes since last review
@@ -196,6 +196,9 @@ void SingleThreadedSchedulerClient::AddToProcessQueue(std::function<void ()> fun | |||
m_callbacks_pending.emplace_back(std::move(func)); | |||
} | |||
MaybeScheduleProcessQueue(); | |||
if (!m_pscheduler->AreThreadsServicingQueue()) { | |||
EmptyQueue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Handle no threads serving in MaybeScheduleProcessQueue" (201eebf)
Commit message subject is out of date since MaybeScheduleProcessQueue
is no longer changing.
Also I'm surprised by "SyncWithValidationInterfaceQueue is called after
scheduler thread exists, which happen at shutdown" since I would hope wallets are getting unloaded before the scheduler stops, but maybe this is not the case currently. Probably beyond the scope of this PR to address, though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since I would hope wallets are getting unloaded before the scheduler stops
Not currently the case.
Probably beyond the scope of this PR to address, though
Agree, I've tried and it caused some other failures.
Commit message subject is out of date since MaybeScheduleProcessQueue is no longer changing.
Ops need to update.
src/scheduler.cpp
Outdated
@@ -196,6 +196,9 @@ void SingleThreadedSchedulerClient::AddToProcessQueue(std::function<void ()> fun | |||
m_callbacks_pending.emplace_back(std::move(func)); | |||
} | |||
MaybeScheduleProcessQueue(); | |||
if (!m_pscheduler->AreThreadsServicingQueue()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Handle no threads serving in MaybeScheduleProcessQueue" (201eebf)
If we can make this code:
if (m_pscheduler->AreThreadsServicingQueue()) [
MaybeScheduleProcessQueue();
} else {
EmptyQueue();
}
it would be more obvious what's going on here.
Otherwise if the unconditional MaybeScheduleProcessQueue
call is needed it would be good have a comment saying what it's needed for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be inside EmptyQueue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be inside
EmptyQueue
I wouldn't change emptyqueue because it's called other places which don't need this logic, and I can't see how the behavior would fit in there (how you could coherently describe it or change the emptyqueue name to reflect it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed. However if the scheduler thread is stopped after AreThreadsServicingQueue
then the function won't be called right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test wallet_multiwallet fails due to this code path, see my changes in EmptyQueue, you should schedule process queue when event queue is not empty, the logic is reversed there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes much more sense compare to current version
void SingleThreadedSchedulerClient::EmptyQueue() {
if (!m_pscheduler->AreThreadsServicingQueue())
return;
auto pendingCallbacks = [this]() -> bool {
LOCK(m_cs_callbacks_pending);
return !m_callbacks_pending.empty();
};
bool should_continue = pendingCallbacks();
while (should_continue) {
ProcessQueue();
should_continue = pendingCallbacks();
}
}
- You don't need to do stupid things, like here one,
if (!m_pscheduler->AreThreadsServicingQueue())
- You don't need to check against is there pending callbacks it's done there, so if you don't have pending ones you don't need to enter eventually endless waiting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iv'e been trying to reproduce appveyor error wihtout any luck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- You don't need to do stupid things, like here one,
if (!m_pscheduler->AreThreadsServicingQueue())
- You don't need to check against is there pending callbacks it's done there, so if you don't have pending ones you don't need to enter eventually endless waiting.
@bvbfan, I'm confused by what you are saying here. You seem to be saying you don't need two things, but your code sample is doing both those things. Are you saying that for some reason it is better to do (1) in EmptyQueue
instead of AddToProcessQueue
? If so, what is the reason you think this? To me it seems better to do (1) in AddToProcessQueue
instead of EmptyQueue
, because AddToProcessQueue
has to be aware of thread state while EmptyQueue
does not.
I'm even more confused what you are saying about (2) because you're saying we don't need a check for pending callbacks but your code is actually adding an extra check (before the first loop iteration).
More importantly, I can't figure out what this suggestion has to do with wallet_multiwallet.py. If you can clarify, it would be much appreciated. I've also been trying to reproduce the failure there and look for possible causes and haven't been able to figure it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused by what you are saying here. You seem to be saying you don't need two things, but your code sample is doing both those things.
I mean you don't need to do these things outside else, i was not clear. These 2 things should be done here in EmptyQueue
More importantly, I can't figure out what this suggestion has to do with wallet_multiwallet.py
if (m_pscheduler->AreThreadsServicingQueue()) [
MaybeScheduleProcessQueue();
} else {
EmptyQueue();
}
This code is wrong, you try to process / flush events but it does not do that. Let analyse current code
void SingleThreadedSchedulerClient::EmptyQueue() {
assert(!m_pscheduler->AreThreadsServicingQueue()); // <- assert (you can avoid that)
bool should_continue = true;
while (should_continue) {
ProcessQueue(); // <--- here is the bigger mistake, it tries to process events before it's checked that has pending callbacks, you can enter this and wait to infinity (since app is shutting down) it should be checked for pendings before process queue
LOCK(m_cs_callbacks_pending);
should_continue = !m_callbacks_pending.empty();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ProcessQueue(); // <--- here is the bigger mistake, it tries to process events before it's checked that has pending callbacks, you can enter this and wait to infinity (since app is shutting down) it should be checked for pendings before process queue
Explain this more? ProcessQueue
doesn't wait for callbacks if there aren't any, it just runs a callbacks if there is one.
Iv'e been trying to reproduce appveyor error wihtout any luck.
I finally reproduced the problem cutting down the test and running in a loop with --usecli. Looking at the stack trace I see unloadwallet
RPC thread waiting for scheduler thread MaybeCompactWalletDB
call, which is stuck in ReleaseWallet
calling SyncWithValidationInterfaceQueue
, which is stuck because SyncWithValidationInterfaceQueue
will always hang if called from the scheduler thread. This is basically the same deadlock that was in fanquake's backtrace from #16307 (comment), which suggests that 2a42fc1 and 2097e35 aren't going to work as fixes for this problem in their current form. Maybe more can be done to get them working but an alternate approach like 06d2b53 (branch, comment) might be easier at this point
This is necessary if SyncWithValidationInterfaceQueue is called after scheduler thread exists, which happen at shutdown.
c9ca6c4
to
2097e35
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 2097e35. Just suggested change clarifying AddToProcessQueue / MaybeScheduleProcessQueue logic since last review
@@ -195,7 +195,11 @@ void SingleThreadedSchedulerClient::AddToProcessQueue(std::function<void ()> fun | |||
LOCK(m_cs_callbacks_pending); | |||
m_callbacks_pending.emplace_back(std::move(func)); | |||
} | |||
MaybeScheduleProcessQueue(); | |||
if (m_pscheduler->AreThreadsServicingQueue()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Handle no threads serving in MaybeScheduleProcessQueue" (03cd591)
Was experimenting and wrote a small test for this change:
// Ensure scheduled tasks run even when there are no scheduler threads
BOOST_AUTO_TEST_CASE(singlethreadedscheduler_nothreads)
{
CScheduler scheduler;
SingleThreadedSchedulerClient client(&scheduler);
bool task_ran = false;
client.AddToProcessQueue([&] { task_ran = true; });
BOOST_CHECK(task_ran);
}
Could add this to scheduler_tests.cpp. It fails without this commit and passes with it.
Replaced by #18338. |
From boost signal documentation at https://www.boost.org/doc/libs/1_72_0/doc/html/signals2/thread-safety.html:
This means that
UnregisterValidationInterface
doesn't prevent more calls to that interface.This PR fixes the assumption that no validation calls happen after
UnregisterValidationInterface
in the following codebitcoin/src/wallet/wallet.cpp
Lines 114 to 115 in 5d92ac2
The actual bug is that
delete wallet
races with the validation notifications.The fix consists in synchronizing with the validation queue (which happens in
BlockUntilSyncedToCurrentChain
) afterUnregisterValidationInterface
.Alternative to #18279. Fixes #16307.