-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Ensure wallet and chain tip are in sync #18279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/wallet/wallet.cpp
Outdated
bool tipIsSame = false; | ||
do { | ||
uint256 last_block_hash = WITH_LOCK(cs_wallet, return m_last_block_processed); | ||
tipIsSame = chain().waitForNotificationsIfTipIsNotSame(last_block_hash); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this whole scope happen under the cs_wallet lock?
Also, shouldn't m_chain_notifications_handler.reset()
happen inside here under the wallet lock?
Otherwise a new block could come in and put new notifications in the handler while we reset it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this whole scope happen under the cs_wallet lock?
No, we leave mutex for process thread to finish.
Also, shouldn't m_chain_notifications_handler.reset() happen inside here under the wallet lock?
No.
Otherwise a new block could come in and put new notifications in the handler while we reset it.
I'm aware of that, we can have a function that takes a mutex and reset notifications
I think this is the wrong approach. |
That's at least ensure sync between wallet and chain tips otherwise all calls to |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
@promag the problem is not the queued connection / disconnection, actually
signal is received before disconnect is called by erasing the caller. Fix unit test and rebase. |
Let's clarify a bit more.
Since
no crashes anymore. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've hit into #16307 issue while doing refactoring on this part of the code. Current refactoring direction is to move towards the wallet and chain being independent tips, not more synchronized so another way to solve this would be to reset handler correctly while holding wallet lock and then test handler still exists in BlockConnected/BlockDisconnected/..., flushing correctly and committing right locator means worst-case we redo a rescan at wallet loading.
Doing it race-condition free likely need a handler lock though (or `NotificationsHandlerImpl locking wallet...)
i don't think that's right approach to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #18279 (comment)
That's at least ensure sync between wallet and chain tips otherwise all calls to
BlockUntilSyncedToCurrentChain
are just pointless, they don't do what is supposed to do. Your patch looks fine but not what we want when calls a function likeBlockUntilSyncedToCurrentChain
I think there is a misunderstanding here (maybe my own), but my understanding is that the point of BlockUntilSyncedToCurrentChain is just to ensure that if you make two RPC calls in a row the second RPC call is always aware of the result of first call. Ensuring that notification in the queue from the time of the call are processed is sufficient to do this, without trying to guarantee that the wallet tip and node tips are equal.
Checking tips for equality is just an optimization to avoid waiting for the queue, and guaranteeing that the tips are equal isn't actually possible (despite the attempt here) because BlockUntilSyncedToCurrentChain isn't called with any locks held.
src/interfaces/chain.cpp
Outdated
@@ -160,8 +160,8 @@ class NotificationsHandlerImpl : public Handler, CValidationInterface | |||
void disconnect() override | |||
{ | |||
if (m_notifications) { | |||
m_notifications = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change seems clearly correct. Good catch! Maybe m_notifications
should be an atomic pointer too since m_notifications
is get and set from different threads, though probably the synchronization done by boost signals & the scheduler prevents bugs from it not being atomic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe m_notifications should be an atomic pointer too since m_notifications is get and set from different threads, though probably the synchronization done by boost signals & the scheduler prevents bugs from it not being atomic
You're right
src/interfaces/chain.cpp
Outdated
@@ -346,13 +346,14 @@ class ChainImpl : public Chain | |||
{ | |||
return MakeUnique<NotificationsHandlerImpl>(*this, notifications); | |||
} | |||
void waitForNotificationsIfTipChanged(const uint256& old_tip) override | |||
bool waitForNotificationsIfTipIsNotSame(const uint256& tip) override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the name change, and I can see how the having the bool return value should make races and the deadlock reported #16307 (comment) less likely, though I suspect there is also probably a more direct and reliable fix we can find
src/wallet/wallet.cpp
Outdated
wallet->m_chain_notifications_handler.reset(); | ||
WITH_LOCK(wallet->cs_wallet, wallet->Flush()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Acquiring the cs_wallet lock here is unexpected. If this is really neccessary, I think the lock should be acquired closer to where it's used inside the Flush implementation, probably with a comment if the purpose of the lock isn't more obvious there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a misunderstanding here (maybe my own), but my understanding is that the point of BlockUntilSyncedToCurrentChain is just to ensure that if you make two RPC calls in a row the second RPC call is always aware of the result of first call. Ensuring that notification in the queue from the time of the call are processed is sufficient to do this, without trying to guarantee that the wallet tip and node tips are equal.
Yes, but mostly after BlockUntilSyncedToCurrentChain lock is acquired and there is no guarantee that notification is faster than this held, i.e. you can leave BlockUntilSyncedToCurrentChain and don't do what you want, you gain mutex before notification thread and you're not sync anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here after BlockUntilSyncedToCurrentChain is ensuring it's on current tip, the lock tries to guarantee that notification thread is finished before we flush and release memory.
@promag, @ryanofsky i have another, looking better to me, solution ensuring there is no background tasks, but i've not push since it's not well tested
Flushing callback right after disconnecting signals should effectively ensure we don't have pending and it will not need atomic notification pointer or so. |
re: #18279 (comment)
This fix is very similar to c372219 from #18280, and probably we should review and merge #18280 when it's ready and close this PR. In my opinion though, waiting for notifications is a fragile fix that will slow down wallet unloading and node shutdown without good reason. A better fix would be to switch to shared pointers instead of raw pointers and fix the use-after-delete bug more directly: 06d2b53 (branch, comment) |
Also sync should be in |
Of course no need to close this PR if you want to make other improvements, but I hope you can review #18280 when it's ready as a fix for the original issue. Also would encourage closing this PR and opening a new one if the new version will differ substantially, so the review comment history will be intelligible. I would also caution that while BlockUntilSyncedToCurrentChain can definitely be cleaned up, its purpose is ensuring consistent results between RPC calls and it shouldn't be changed to block longer than actually necessary to fix real bugs.
This is only true when unregistering wallets, as far as I know. Not things like indexes. Also if we fix shared_ptr reference counting with 06d2b53 (branch, comment) it should not even be necessary for wallets. No reason conceptually wallets should have to wait for events they don't care about just to fix a pointer lifetime issue |
No, that's not correct to me,
doing in |
Agree with this. From #18280:
But just calling Obviously there are various possible fixes, like #18280 or 06d2b53. But this approach is clearly not good. |
Fixes #16307
Signed-off-by: Anthony Fieroni bvbfan@abv.bg