Skip to content

Conversation

ryanofsky
Copy link
Contributor

@ryanofsky ryanofsky commented Mar 14, 2024

This PR updates the CWallet::chainStateFlushed method to write the m_last_block_processed locator to the database as the wallet's best block instead of the chain tip locator.

Saving the last block processed locator as the best block is less fragile than saving chainStateFlushed locator as best block.

Right now wallet code relies on blockConnected notifications being sent before chainStateFlushed notifications, and on the chainStateFlushed locator pointing to the last connected block. But this is a fragile assumption that can be easily broken by ActivateBestChain/ConnectTrace changes, and in fact is currently broken (since d96c59c from #25740) when the assumeutxo snapshot block is validated. In that case the background validation chainStateFlushed notification is sent before the blockConnected notification so writing the locator as the best block would record a best block that was never processed. This specific change does not cause a problem for the wallet, because the wallet ignores events from the background validation chainstate. But nothing prevents similar out-of-order chainStateFlushed notifications from being sent in the future, so it is safer for the wallet to not rely on the chain tip sent in chainStateFlushed notifications.

A good followup to this change would probably be to drop the ChainStateFlushed locator argument entirely so it is not misused in the future. Indexing code already stopped using this locator argument a long time ago for identifying the best block (in 4368384 from #14121), though it is currently using the locator argument to log diagnostic warnings.

@DrahtBot
Copy link
Contributor

DrahtBot commented Mar 14, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/29652.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK achow101

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30221 (wallet: Ensure best block matches wallet scan state by achow101)
  • #28710 (Remove the legacy wallet and BDB dependency by achow101)
  • #19463 (Prune locks by luke-jr)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@ryanofsky
Copy link
Contributor Author

I couldn't really come up with a test for this bug directly. I did find if I added an assert in chainStateFlushed that the locator block height matches the best block height, I could trigger it with test code below:

--- a/src/wallet/wallet.cpp
+++ b/src/wallet/wallet.cpp
@@ -634,6 +634,14 @@ void CWallet::chainStateFlushed(ChainstateRole role, const CBlockLocator& loc)
     if (m_attaching_chain || role == ChainstateRole::BACKGROUND) {
         return;
     }
+    if (!loc.IsNull()) {
+        LOCK(cs_wallet);
+        int loc_height{-1};
+        if (m_last_block_processed_height >= 0 && chain().findBlock(loc.vHave.front(), FoundBlock().height(loc_height))) {
+            WalletLogPrintf("chainStateFlushed processed height %i locator height %i\n", m_last_block_processed_height, loc_height);
+            assert(m_last_block_processed_height >= loc_height);
+        }
+    }
     WalletBatch batch(GetDatabase());
     batch.WriteBestBlock(loc);
 }
--- a/test/functional/feature_assumeutxo.py
+++ b/test/functional/feature_assumeutxo.py
@@ -52,6 +52,8 @@ COMPLETE_IDX = {'synced': True, 'best_block_height': FINAL_HEIGHT}
 
 
 class AssumeutxoTest(BitcoinTestFramework):
+    def add_options(self, parser):
+        self.add_wallet_options(parser)
 
     def set_test_params(self):
         """Use the pregenerated, deterministic chain up to height 199."""
@@ -215,6 +217,7 @@ class AssumeutxoTest(BitcoinTestFramework):
         self.test_invalid_chainstate_scenarios()
 
         self.log.info(f"Loading snapshot into second node from {dump_output['path']}")
+        n1.createwallet('w', load_on_startup=True)
         loaded = n1.loadtxoutset(dump_output['path'])
         assert_equal(loaded['coins_loaded'], SNAPSHOT_BASE_HEIGHT)
         assert_equal(loaded['base_height'], SNAPSHOT_BASE_HEIGHT)

However, this doesn't trigger in the assumeutxo case where the snapshot block ChainStateFlushed notification is reliably sent before the BlockConnected notification, because wallet just ignores both notifications since they are associated with the background chain. The test above does trigger the assert with error chainStateFlushed processed height 199 locator height 299 but I think that is happening because one of walletInstance->chainStateFlushed(ChainstateRole::NORMAL, chain->getTipLocator()); calls on wallet startup.

This is all to say i couldn't really find a practical bug that could be triggered here or a good test to add to this PR.

@ryanofsky
Copy link
Contributor Author

ryanofsky commented Mar 14, 2024

Updated e295db8 -> c5583a3 (pr/noloc.1 -> pr/noloc.2, compare) dropping unused chainStateFlushed argument
Updated c5583a3 -> 5600631 (pr/noloc.2 -> pr/noloc.3, compare) fixing missed chainStateFlushed call in previous push.

virtual CBlockLocator getTipLocator() = 0;

//! Return a locator that refers to a block in the active chain.
//! If specified block is not in the active chain, return locator for the latest ancestor that is in the chain.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: This comment about getActiveChainLocator returning an ancestor locator if block was not on the active chain was true when the method was first added, but stopped being true in commit ed47094 from #25717. So the description and name of this method where somewhat misleading after that point.

I noticed while debugging bitcoin#24230 that the ChainStateFlushed notification can be
sent with a locator pointing to a different block than the block in the last
BlockConnected notification. This is bad because it could cause the wallet to
record that it is synced to a later block than it ever processed, and make it
potentially fail to scan blocks for relevant transactions if it is unloaded and
reloaded.

This problem should probably not happen in practice, because normally the
locator in the ChainStateFlushed notification does point to the block sent in
the last BlockConnected notification. But because of the way ActivateBestChain
accumulates block connected notifications, and the fact that FlushStateToDisk
is called from many different code paths, this isn't guaranteed. (The new
assumeutxo logic also introduces a situation where this is never the case: when
the background chain reaches the snapshot height and validates the snapshot,
the background chain is flushed before block connected notifications are sent.)

In any case, it is better if the wallet writes a locator actually pointing to
the last block that it processed instead of writing the locator validation code
passes to ChainStateFlushed, so this commit switches to the right locator.

A good followup to this change would probably be to drop the ChainStateFlushed
locator argument entirely so it is not misused in the future. Indexing code
already stopped using this locator argument a long time ago for identifying the
best block (in 4368384 from bitcoin#14121), but it is
still using the locator argument to log diagnostic warnings.
@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/22673169748

WalletBatch batch(GetDatabase());
batch.WriteBestBlock(loc);
CBlockLocator loc;
WITH_LOCK(cs_wallet, chain().findBlock(m_last_block_processed, FoundBlock().locator(loc)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this shouldn't happen but.. what about logging an error if findBlock() returns false?

Also, could lock cs_wallet only to copy m_last_block_processed.

@@ -3290,7 +3294,7 @@ bool CWallet::AttachChain(const std::shared_ptr<CWallet>& walletInstance, interf
}
}
walletInstance->m_attaching_chain = false;
walletInstance->chainStateFlushed(ChainstateRole::NORMAL, chain.getTipLocator());
walletInstance->chainStateFlushed(ChainstateRole::NORMAL);
Copy link
Member

@furszy furszy Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing note: you could probably trigger the behavior by adding a sleep call here. When a reorg is processed in-between the ScanForWalletTransactions call and this chainStateFlushed call.

@achow101
Copy link
Member

Concept ACK

I'm now wondering whether the wallet should even be doing anything on ChainStateFlushed since that doesn't seem like it should have any bearing on what the wallet knows about. It seems like the better thing to do would be to update the best block record every time we finish processing a block in blockConnected, and just ignore chainStateFlushed. The best block is really only used during loading anyways to figure out how much we need to rescan. One thing that could happen if we did this is that on an unclean shutdown, the wallet could store a record that is ahead of the chainstate. On the next load, we would end up resetting to maybe a few blocks behind where the chainstate is at and therefore rescan a bit more than necessary, but that doesn't seem like it would really be a problem.

{
// Don't update the best block until the chain is attached so that in case of a shutdown,
// the rescan will be restarted at next startup.
if (m_attaching_chain || role == ChainstateRole::BACKGROUND) {
return;
}
WalletBatch batch(GetDatabase());
batch.WriteBestBlock(loc);
CBlockLocator loc;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be useful to check that chainStateFlushed was called after blockConnected like was done in

bitcoin/src/index/base.cpp

Lines 341 to 353 in 5560741

// This checks that ChainStateFlushed callbacks are received after BlockConnected. The check may fail
// immediately after the sync thread catches up and sets m_synced. Consider the case where
// there is a reorg and the blocks on the stale branch are in the ValidationInterface queue
// backlog even after the sync thread has caught up to the new chain tip. In this unlikely
// event, log a warning and let the queue clear.
const CBlockIndex* best_block_index = m_best_block_index.load();
if (best_block_index->GetAncestor(locator_tip_index->nHeight) != locator_tip_index) {
LogPrintf("%s: WARNING: Locator contains block (hash=%s) not on known best "
"chain (tip=%s); not writing index locator\n",
__func__, locator_tip_hash.ToString(),
best_block_index->GetBlockHash().ToString());
return;
}
. I think checking that the CBlockLocator argument points to the same block as m_last_block_processed will suffice.

@achow101
Copy link
Member

achow101 commented Jun 3, 2024

I'm now wondering whether the wallet should even be doing anything on ChainStateFlushed since that doesn't seem like it should have any bearing on what the wallet knows about. [...]

I've implemented this idea in #30221

@ryanofsky
Copy link
Contributor Author

I'm now wondering whether the wallet should even be doing anything on ChainStateFlushed since that doesn't seem like it should have any bearing on what the wallet knows about.

Notifications are sent in order, so it does have some meaning. It means wallet is in-between blockConnected events and synchronized up to some block, and that the node has written some data to disk recently, so the wallet may want write its data to disk, too.

I think the only problem with the ChainStateFlushed notification is that because of the fact that FlushStateToDisk is called all over validation code, and notifications are asynchronous, the ChainStateFlushed locator argument is unreliable and buggy. The argument is also unnecessary for wallets and indexes because they already know the last block that they are synced to and don't need a redundant pointer to a potentially incorrect block (due to a bug on assumeutxo code or other other fragile parts of validation code).

@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed.
Debug: https://github.com/bitcoin/bitcoin/runs/22674943172

Hints

Make sure to run all tests locally, according to the documentation.

The failure may happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

@ryanofsky ryanofsky marked this pull request as draft October 15, 2024 14:59
@ryanofsky
Copy link
Contributor Author

Marking as draft, because this has a silent merge conflict and I don't think I will be able to fix it right away. I still do think this PR is a good idea, but #30221 can be reviewed too if the alternate approach it implements is preferred (updating the wallet every block even it doesn't change).

@DrahtBot
Copy link
Contributor

DrahtBot commented May 7, 2025

🐙 This pull request conflicts with the target branch and needs rebase.

ryanofsky added a commit that referenced this pull request May 19, 2025
30a94b1 test, wallet: Remove concurrent writes test (Ava Chow)
b44b7c0 wallet: Write best block record on unload (Ava Chow)
876a258 wallet: Remove unnecessary database Close step on shutdown (Ava Chow)
98a1a52 wallet: Remove chainStateFlushed (Ava Chow)
7fd3e1c wallet, bench: Write a bestblock record in WalletMigration (Ava Chow)
6d3a8b1 wallet: Replace chainStateFlushed in loading with SetLastBlockProcessed (Ava Chow)
7bacabb wallet: Update best block record after block dis/connect (Ava Chow)

Pull request description:

  Implements the idea discussed in #29652 (comment)

  Currently, `m_last_block_processed` and `m_last_block_processed_height` are not guaranteed to match the block locator stored in the wallet, nor do either of those fields actually represent the last block that the wallet is synced up to. This is confusing and unintuitive.

  This PR changes those last block fields to be updated whenever the wallet makes a change to the db for new transaction state found in new blocks. Whenever a block is received that contains a transaction relevant to the wallet, the last block locator will now be written to disk. Furthermore, every block disconnection will now write an updated locator.

  To ensure that the locator is relatively recent and loading rescans are fairly quick in the event of unplanned shutdown, it is also now written every 144 blocks (~1 day). Additionally it is now written when the wallet is unloaded so that it is accurate when the wallet is loaded again.

  Lastly, the `chainstateFlushed` notification in the wallet is changed to be a no-op. The best block locator record is no longer written when `chainstateFlushed` is received from the node since it should already be mostly up to date.

ACKs for top commit:
  rkrux:
    ACK 30a94b1
  mzumsande:
    Code Review ACK 30a94b1
  ryanofsky:
    Code review ACK 30a94b1. Only changes since last review are using WriteBestBlock method more places and updating comments.

Tree-SHA512: 46117541f8aaf13dde57430e813b4bbbd5e146e2632769675803c8e65a82f149a7cc6026489a127d32684b90124bd2b7c28216dbcfa6a47447300e8f3814e029
@DrahtBot
Copy link
Contributor

DrahtBot commented Aug 4, 2025

⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?

  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since #30221 got merged, I think this one can be closed

@ryanofsky
Copy link
Contributor Author

re: #29652 (review)

Since #30221 got merged, I think this one can be closed

Thanks! #30221 avoids the issue of wallets potentially writing the incorrect locators on flush notifications by just ignoring them and flushing in a less coordinated way. So the main concern of this PR is no longer an issue and it makes sense to close.

I still think the changes here dropping the Chain::getTipLocator and Chain::getActiveChainLocator methods would be good to apply. The first method is unused and since #25717, the second method has a misleading name and description is redundant with Chain::findBlock. So it would be nice to see those changes in another PR.

@pablomartin4btc
Copy link
Member

I still think the changes here dropping the Chain::getTipLocator and Chain::getActiveChainLocator methods would be good to apply. The first method is unused and since #25717, the second method has a misleading name and description is redundant with Chain::findBlock. So it would be nice to see those changes in another PR.

@ryanofsky, @furszy, done in #33169.

achow101 added a commit that referenced this pull request Aug 14, 2025
…r and incaccurate getActiveChainLocator

2b00030 interfaces, chain, refactor: Remove inaccurate getActiveChainLocator (pablomartin4btc)
110a0f4 interfaces, chain, refactor: Remove unused getTipLocator (pablomartin4btc)

Pull request description:

  Remove `Chain::getTipLocator`, `Chain::GetLocator()`, and `Chain::getActiveChainLocator`:
  - `Chain::getTipLocator` is no longer used.
  - `Chain::GetLocator`, replaced its call by `GetLocator()`, which uses `LocatorEntries`, avoiding direct access to the chain itself (change suggested by l0rinc while reviewing this PR to maintain consistency with the overall refactoring).
  - `Chain::getActiveChainLocator`, whose name was misleading, has functionality redundant with Chain::findBlock.
    - Additionally, the comment for getActiveChainLocator became inaccurate following changes in commit ed47094 (from PR #25717).

  This is a [follow-up](#29652 (comment)) to #29652.

ACKs for top commit:
  achow101:
    ACK 2b00030
  furszy:
    ACK 2b00030
  stickies-v:
    ACK 2b00030
  w0xlt:
    ACK 2b00030

Tree-SHA512: b12ba6a15feeaeec692d69204a6e155e3af43edfac25597dabf14cacca1e4a2152574816e58dc544f39043c5721f5e707acf544f4541d3b9c0f8c0c40069215e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants