Skip to content

Conversation

achow101
Copy link
Member

Currently when we load a wallet, we just iterate through all of the records in the database and add them completely statelessly. However we have some records which do rely on other records being loaded before they are. To deal with this, we use CWalletScanState to hold things temporarily until all of the records have been read and then we load the stateful things.

However this can be slow, and with some future improvements, can cause some pretty drastic slowdowns to retain this pattern. So this PR changes the way we load records by choosing to load the records in a particular order. This lets us do things such as loading a descriptor record, then finding and loading that descriptor's cache and key records. In the future, this will also let us use IsMine when loading transactions as then IsMine will actually be working as we now always load keys and descriptors before transactions.

In order to get records of a specific type, this PR includes some refactors to how we do database cursors. Functionality is also added to retrieve a cursor that will give us records beginning with a specified prefix.

Lastly, one thing that iterating the entire database let us do was to find unknown records. However even if unknown records were found, we would not do anything with this information except output a number in a log line. With this PR, we would no longer be aware of any unknown records. This does not change functionality as we don't do anything with unknown records, and having unknown records is not an error. Now we would just not be aware that unknown records even exist.

@DrahtBot
Copy link
Contributor

DrahtBot commented Apr 19, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK MarcoFalke, furszy, ryanofsky
Concept ACK theStack, pk-b2, laanwj
Approach ACK w0xlt

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #27920 (wallet: bugfix, always use apostrophe for spkm descriptor ID by furszy)
  • #27865 (wallet: Track no-longer-spendable TXOs separately by achow101)
  • #27286 (wallet: Keep track of the wallet's own transaction outputs in memory by achow101)
  • #26836 (wallet: simplify addressbook migration and encapsulate access by furszy)
  • #26728 (wallet: Have the wallet store the key for automatically generated descriptors by achow101)
  • #26596 (wallet: Migrate legacy wallets to descriptor wallets without requiring BDB by achow101)
  • #25907 (wallet: rpc to add automatically generated descriptors by achow101)
  • #22341 (rpc: add getxpub by Sjors)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@theStack
Copy link
Contributor

theStack commented May 2, 2022

Concept ACK

Copy link
Contributor

@w0xlt w0xlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach ACK.
Getting rid of CWalletScanState and ReadKeyValue and moving the read logic to WalletBatch::LoadWallet makes the code clearer and easier to understand.

@pk-b2
Copy link

pk-b2 commented May 6, 2022

Concept ACK

std::unique_ptr<SQLiteCursor> cursor = std::make_unique<SQLiteCursor>(start_range, end_range);
if (!cursor) return nullptr;

const char* stmt_text = "SELECT key, value FROM main WHERE key >= ? AND key < ?";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

> explain query plan SELECT key, value FROM main WHERE key >= ? AND key < ?

Verified that this uses an index avoiding a table scan

SEARCH TABLE main USING INDEX sqlite_autoindex_main_1 (key>? AND key<?)`

@laanwj
Copy link
Member

laanwj commented May 31, 2022

Concept ACK on querying specifically what is needed from the database when loading. Maybe some day we don't even have to read all records into memory.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 973d910

Looks good, probably ready for merge with a third ACK.

@achow101 achow101 force-pushed the wallet-load-order branch from 973d910 to 7cb7740 Compare June 26, 2023 22:05
Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 7cb7740. Just more suggested error handling tweaks since last review

@DrahtBot DrahtBot requested a review from furszy June 26, 2023 22:35
Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reACK 7cb7740

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm ACK 7cb7740 🍖

Show signature

Signature:

untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
trusted comment: lgtm ACK 7cb774048c5f04b3e20d95e794c5a350ea4eff97 🍖
wuc4npVuTb3Yb6tNza4MK7G2Z6qzHiajO9EsxCW7ib4DxRzWDxs552TGpGVPJxZ5qafuYSW65Z4/WLbHadprCA==

result = std::max(result, locked_utxo_res.m_result);

// Load orderposnext record
// Note: There should only be one ORDERPOSNEXT record with nothing trailng the type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dc1ef26: trailing (typo)

struct LoadResult
{
DBErrors m_result{DBErrors::LOAD_OK};
std::string m_error{};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit in 20d1fc5: I think this is still wrong? Currently this only collects the first error (in cases where assigning the error is guarded by an empty check). Also, it is unused by the caller. Suggested fix:

diff --git a/src/wallet/walletdb.cpp b/src/wallet/walletdb.cpp
index 1fab6a7f9c..e466c1c477 100644
--- a/src/wallet/walletdb.cpp
+++ b/src/wallet/walletdb.cpp
@@ -754,7 +754,6 @@ static DBErrors LoadWalletFlags(CWallet* pwallet, DatabaseBatch& batch) EXCLUSIV
 struct LoadResult
 {
     DBErrors m_result{DBErrors::LOAD_OK};
-    std::string m_error{};
     int m_records{0};
 };
 
@@ -786,9 +785,10 @@ static LoadResult LoadRecords(CWallet* pwallet, DatabaseBatch& batch, const std:
         std::string type;
         ssKey >> type;
         assert(type == key);
-        DBErrors record_res = load_func(pwallet, ssKey, ssValue, result.m_error);
+        std::string log_error{};
+        DBErrors record_res = load_func(pwallet, ssKey, ssValue, log_error);
         if (record_res != DBErrors::LOAD_OK) {
-            pwallet->WalletLogPrintf("%s\n", result.m_error);
+            pwallet->WalletLogPrintf("%s\n", log_error);
         }
         result.m_result = std::max(result.m_result, record_res);
         ++result.m_records;

Also, the commit description seems to imply behavior changed where it didn't, no?

Suggested range diff:

 8:  20d1fc5061 !  8:  0bf8188513 walletdb: Refactor legacy wallet record loading into its own function
    @@ Commit message
     
         Exceptions are handled on a per-record type basis, rather than globally.
         For private keys in a legacy wallet and the encryption keys,
    -    a deserialization error will result in an early return rather than
    +    a deserialization error will result in a
         contined loading of the remaining keys, with the resulting wallet
    -    failing to load. For other record types, errors that were previously
    +    failing to load, unchanged from previous behavior. For other record types, errors that were previously
         handled by the global exception handler will result in loading failure
    +    , also skipping the continued loading of remaining keys,
         instead of a non-critical error.

For "other record types", could also unify the section with the section in the next commit, which should be the same?

    Exception handling for these records changes to a per-record type basis,
    rather than globally. This results in some records now failing with a
    critical error rather than a non-critical one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, could mention that for other record types, previous early returns from LoadWallet like UNEXPECTED_LEGACY_ENTRY would now be scheduled for later, and it continues to load more prefix-keys?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented the suggested changes.

Also, could mention that for other record types, previous early returns from LoadWallet like UNEXPECTED_LEGACY_ENTRY would now be scheduled for later, and it continues to load more prefix-keys?

I don't think that's necessarily true? I'm planning on revisiting the error handling in a followup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see the early return in LoadWallet in master here:

return DBErrors::UNEXPECTED_LEGACY_ENTRY;

In this pull you can see that it calls std::max and then continues with LoadWallet:

result = std::max(LoadLegacyWalletRecords(pwallet, *m_batch, last_client), result);
// Load descriptors
result = std::max(LoadDescriptorWalletRecords(pwallet, *m_batch, last_client), result);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant was that there are records that now early return but didn't previously, but I suppose that's the opposite of what you were talking about.

@maflcko
Copy link
Member

maflcko commented Jun 27, 2023

Also, it may be good to add tests for currently untested code: https://marcofalke.github.io/b-c-cov/total.coverage/src/wallet/wallet.cpp.gcov.html#2926

@ryanofsky
Copy link
Contributor

With 3 up-to-date acks, and all the review that's happened so far, I think this looks ready to merge now. But will wait to see if @achow101 wants to respond to newest comments.

I like Marco's error handling suggestions since they simplify code and just print all the errors. If it turns out there are this cases where this repeats too many errors, it seems like it would be easy to add a bit of code to LoadRecords that just compares new error messages to previous and adds [Repeated ## times] to the log.

It would also be great to to look at coverage link #24914 (comment) and list places where we should try to add missing test coverage.

@maflcko
Copy link
Member

maflcko commented Jun 27, 2023

If it turns out there are this cases where this repeats too many errors

I think that'd be the case on current master as well. My comment was that this pull may repeat the first error message, even though there may be a second one.

@ryanofsky
Copy link
Contributor

I think that'd be the case on current master as well. My comment was that this pull may repeat the first error message, even though there may be a second one.

Makes sense, I missed that. I thought the code was only trying to print one error message, and didn't notice it would print it multiple times. I like your fix of just printing all the messages instead of printing one, but either way to fix this seems fine.

@furszy
Copy link
Member

furszy commented Jun 27, 2023

Also, it may be good to add tests for currently untested code: https://marcofalke.github.io/b-c-cov/total.coverage/src/wallet/wallet.cpp.gcov.html#2926

I actually have a PR planned to remove most of those lines. The string errors there are duplicated from the internal walletdb errors. We can just bubble up the error instead of swallowing it.

Also, in the unit tests, we usually call CWallet::LoadWallet instead of CWallet::Create, thus why some of those paths are untested (like the UNKNOWN_DESCRIPTOR one, which is tested here).

That being said, totally agree on adding much more coverage to the area.

achow101 and others added 9 commits June 27, 2023 11:00
Instead of loading legacy wallet records as we come across them when
iterating the database, load them explicitly.

Exception handling for these records changes to a per-record type basis,
rather than globally. This results in some records now failing with a
critical error rather than a non-critical one.
Instead of loading descriptor wallet records as we come across them when
iterating the database, loading them explicitly.

Exception handling for these records changes to a per-record type basis,
rather than globally. This results in some records now failing with a
critical error rather than a non-critical one.
Instead of loading address book records as we come across them when
iterating the database, load them explicitly

Due to exception handling changes, deserialization errors are now
treated as critical.

The error message for noncritical errors has also been updated to
reflect that there's more data that could be missing than just address
book entries and tx data.
Instead of loading tx records as we come across them when iterating the
database, load them explicitly.
Instead of loading active spkm records as we come across them when
iterating the database, load them explicitly.

Due to exception handling changes, deserialization errors are now
treated as critical.
Instead of dealing with these records when iterating the entire
database, find and handle them explicitly.

Loading of OLD_KEY records is bumped up to a LOAD_FAIL error as we will
not be able to use these types of keys which can lead to users missing
funds.
Instead of loading decryption keys as we iterate the database, load them
explicitly.
Instead of iterating the database to load the wallet, we now load
particular kinds of records in an order that we want them to be loaded.
So it is no longer necessary to iterate the entire database to load the
wallet.
Co-Authored-By: Ryan Ofsky <ryan@ofsky.org>
@achow101 achow101 force-pushed the wallet-load-order branch from 7cb7740 to 3c83b1d Compare June 27, 2023 15:08
@maflcko
Copy link
Member

maflcko commented Jun 27, 2023

re-ACK 3c83b1d 🍤

Show signature

Signature:

untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
trusted comment: re-ACK 3c83b1d884b419adece95b335b6e956e7459a7ef 🍤
LjPNOXHR9a4Og46x/6TZNAoki0QrUodvQbWJAyOV1qwc7WRRNkI99Gn13HSmcRTEqR3pSz6zwG+lAATKwQF8CA==

@DrahtBot DrahtBot requested review from furszy and ryanofsky June 27, 2023 15:19
Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reACK 3c83b1d

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 3c83b1d. Just Marco's suggested error handling fixes since last review

@ryanofsky ryanofsky merged commit d9c7c2f into bitcoin:master Jun 27, 2023
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jun 30, 2023
Copy link

@Thompson1985 Thompson1985 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Veiwed

@bitcoin bitcoin locked and limited conversation to collaborators Oct 29, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.