Skip to content

Conversation

maflcko
Copy link
Member

@maflcko maflcko commented Jun 7, 2022

This was done in the context of #25284 , but I think it also makes sense standalone.

The basic idea is that serialization type should not be initialized when it is not needed. Same for the serialization version.

So do this here for DataStream. CDataStream remains in places where it is not yet possible.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jun 8, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK stickies-v, aureleoules
Concept ACK Empact

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #26715 (Introduce MockableDatabase for wallet unit tests by achow101)
  • #26705 (clang-tidy: Check headers and fixes them by hebasto)
  • #25297 (wallet: group independent db writes on single batched db transaction by furszy)
  • #24914 (wallet: Load database records in a particular order by achow101)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@fanquake
Copy link
Member

@sipa @laanwj @Empact you might be interested in reiviewing this given 25285 & 25331.

@Empact
Copy link
Contributor

Empact commented Oct 12, 2022

Concept ACK, also curious about FromMany, seems extraneous.

@maflcko
Copy link
Member Author

maflcko commented Nov 23, 2022

Thanks, removed unused method for now (will be re-added later when it is needed).

Copy link
Contributor

@aureleoules aureleoules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK fa04cd9

Copy link
Contributor

@aureleoules aureleoules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reACK fab2cec

The moved parts can be reviewed with "--color-moved=dimmed-zebra".
The one-char changes can be reviewed with "--word-diff-regex=.".
Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

CDataStream operates on bytes, but is agnostic to which kind of data is buffered. As such, the constructor takes a serialization type and version, to allow the consumer of the buffer to properly (de)serialize the data (e.g. as used in CAddress). It is not used by CDataStream directly, so it's more like metadata. This data seems unnecessary for a lot of the callsites, so this PR allows us to construct a DataStream without serialization info, while keeping a CDataStream with serialization info. All callsites that do not use serialization info should now be switched over to DataStream.

Out of curiosity: is there intent to phase out CDataStream entirely (through further refactoring)?

@maflcko
Copy link
Member Author

maflcko commented Jan 24, 2023

Yes, see OP. Though, the changes here are stand-alone.

Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach ACK fa0e664

  • The new DataStream class (and CDataStream subclass) looks quite straightforward when looking at the commit with --color-moved=dimmed_zebra.
  • I didn't manually verify the changed instances of CDataStream->DataStream (yet), as there are quite a few. I think it's reasonable enough to assume that if it compiles, it means the GetType() and GetVersion() functions are never called on that instance and it can safely be a DataStream. The main exception I see to that assumption is if the affected code is in a conditionally compiled block, such as e.g. in sighash_tests.cpp, in which case it may only fail with certain configurations.
    • Any other issues with that approach? I'm curious how other reviewers assessed this.
  • I think I've found some additional instances where we could be using a DataStream, as per the below diff:
git diff on fa0e664
diff --git a/src/qt/psbtoperationsdialog.cpp b/src/qt/psbtoperationsdialog.cpp
index 17a65be0f..c8373d3d9 100644
--- a/src/qt/psbtoperationsdialog.cpp
+++ b/src/qt/psbtoperationsdialog.cpp
@@ -129,14 +129,14 @@ void PSBTOperationsDialog::broadcastTransaction()
 }
 
 void PSBTOperationsDialog::copyToClipboard() {
-    CDataStream ssTx(SER_NETWORK, PROTOCOL_VERSION);
+    DataStream ssTx{};
     ssTx << m_transaction_data;
     GUIUtil::setClipboard(EncodeBase64(ssTx.str()).c_str());
     showStatus(tr("PSBT copied to clipboard."), StatusLevel::INFO);
 }
 
 void PSBTOperationsDialog::saveTransaction() {
-    CDataStream ssTx(SER_NETWORK, PROTOCOL_VERSION);
+    DataStream ssTx{};
     ssTx << m_transaction_data;
 
     QString selected_filter;
diff --git a/src/qt/recentrequeststablemodel.cpp b/src/qt/recentrequeststablemodel.cpp
index 85ade624c..bf55399d5 100644
--- a/src/qt/recentrequeststablemodel.cpp
+++ b/src/qt/recentrequeststablemodel.cpp
@@ -175,7 +175,7 @@ void RecentRequestsTableModel::addNewRequest(const SendCoinsRecipient &recipient
     newEntry.date = QDateTime::currentDateTime();
     newEntry.recipient = recipient;
 
-    CDataStream ss(SER_DISK, CLIENT_VERSION);
+    DataStream ss{};
     ss << newEntry;
 
     if (!walletModel->wallet().setAddressReceiveRequest(DecodeDestination(recipient.address.toStdString()), ToString(newEntry.id), ss.str()))
@@ -188,7 +188,7 @@ void RecentRequestsTableModel::addNewRequest(const SendCoinsRecipient &recipient
 void RecentRequestsTableModel::addNewRequest(const std::string &recipient)
 {
     std::vector<uint8_t> data(recipient.begin(), recipient.end());
-    CDataStream ss(data, SER_DISK, CLIENT_VERSION);
+    DataStream ss{data};
 
     RecentRequestEntry entry;
     ss >> entry;
diff --git a/src/qt/sendcoinsdialog.cpp b/src/qt/sendcoinsdialog.cpp
index 1604cad50..f9c3b0189 100644
--- a/src/qt/sendcoinsdialog.cpp
+++ b/src/qt/sendcoinsdialog.cpp
@@ -399,7 +399,7 @@ bool SendCoinsDialog::PrepareSendText(QString& question_string, QString& informa
 void SendCoinsDialog::presentPSBT(PartiallySignedTransaction& psbtx)
 {
     // Serialize the PSBT
-    CDataStream ssTx(SER_NETWORK, PROTOCOL_VERSION);
+    DataStream ssTx{};
     ssTx << psbtx;
     GUIUtil::setClipboard(EncodeBase64(ssTx.str()).c_str());
     QMessageBox msgBox;
diff --git a/src/qt/walletmodel.cpp b/src/qt/walletmodel.cpp
index cb8491e27..f592db066 100644
--- a/src/qt/walletmodel.cpp
+++ b/src/qt/walletmodel.cpp
@@ -261,7 +261,7 @@ void WalletModel::sendCoins(WalletModelTransaction& transaction)
         auto& newTx = transaction.getWtx();
         wallet().commitTransaction(newTx, /*value_map=*/{}, std::move(vOrderForm));
 
-        CDataStream ssTx(SER_NETWORK, PROTOCOL_VERSION);
+        DataStream ssTx{};
         ssTx << *newTx;
         transaction_array.append((const char*)ssTx.data(), ssTx.size());
     }
@@ -552,7 +552,7 @@ bool WalletModel::bumpFee(uint256 hash, uint256& new_hash)
             return false;
         }
         // Serialize the PSBT
-        CDataStream ssTx(SER_NETWORK, PROTOCOL_VERSION);
+        DataStream ssTx{};
         ssTx << psbtx;
         GUIUtil::setClipboard(EncodeBase64(ssTx.str()).c_str());
         Q_EMIT message(tr("PSBT copied"), "Copied to clipboard", CClientUIInterface::MSG_INFORMATION);
diff --git a/src/wallet/test/walletdb_tests.cpp b/src/wallet/test/walletdb_tests.cpp
index 21842fe78..6f75635bf 100644
--- a/src/wallet/test/walletdb_tests.cpp
+++ b/src/wallet/test/walletdb_tests.cpp
@@ -15,14 +15,14 @@ BOOST_FIXTURE_TEST_SUITE(walletdb_tests, BasicTestingSetup)
 BOOST_AUTO_TEST_CASE(walletdb_readkeyvalue)
 {
     /**
-     * When ReadKeyValue() reads from either a "key" or "wkey" it first reads the CDataStream steam into a
+     * When ReadKeyValue() reads from either a "key" or "wkey" it first reads the DataStream steam into a
      * CPrivKey or CWalletKey respectively and then reads a hash of the pubkey and privkey into a uint256.
      * Wallets from 0.8 or before do not store the pubkey/privkey hash, trying to read the hash from old
      * wallets throws an exception, for backwards compatibility this read is wrapped in a try block to
      * silently fail. The test here makes sure the type of exception thrown from CDataStream::read()
      * matches the type we expect, otherwise we need to update the "key"/"wkey" exception type caught.
      */
-    CDataStream ssValue(SER_DISK, CLIENT_VERSION);
+    DataStream ssValue{};
     uint256 dummy;
     BOOST_CHECK_THROW(ssValue >> dummy, std::ios_base::failure);
 }
diff --git a/src/wallet/wallet.cpp b/src/wallet/wallet.cpp
index c3756f89a..e47a69618 100644
--- a/src/wallet/wallet.cpp
+++ b/src/wallet/wallet.cpp
@@ -3821,8 +3821,8 @@ bool CWallet::MigrateToSQLite(bilingual_str& error)
     bool began = batch->TxnBegin();
     assert(began); // This is a critical error, the new db could not be written to. The original db exists as a backup, but we should not continue execution.
     for (const auto& [key, value] : records) {
-        CDataStream ss_key(key, SER_DISK, CLIENT_VERSION);
-        CDataStream ss_value(value, SER_DISK, CLIENT_VERSION);
+        DataStream ss_key{key};
+        DataStream ss_value{value};
         if (!batch->Write(ss_key, ss_value)) {
             batch->TxnAbort();
             m_database->Close();

Copy link
Contributor

@aureleoules aureleoules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reACK fa0e664

@maflcko
Copy link
Member Author

maflcko commented Jan 25, 2023

I think I've found some additional instances where we could be using a DataStream, as per the below diff:

Does it compile?

@maflcko
Copy link
Member Author

maflcko commented Jan 25, 2023

Force pushed to take the ones that compile, thanks

Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK faceeec

I didn't verify that in fa4bc90 all the changed CDataStream->DataStream types are correct because that would be a very laborous task. I did verify that if they are incorrect, they should fail at compile-time, and not at run-time. I think that's a reasonable enough review approach.


Does it compile?

Sorry, I compiled everything but I forgot to reset my ./configure so some incorrectly slipped through, as you figured out.

Tweaked my script a bit, found some more instances - diff below. The one in walletdb_tests.cpp was included in my previous diff too but not yet updated, I'm not sure if that failed for you or just missed it?

Going to stop looking for more instances now, can always add them later on.

git diff
diff --git a/src/rpc/blockchain.cpp b/src/rpc/blockchain.cpp
index 1b2543c77..9e0bc312b 100644
--- a/src/rpc/blockchain.cpp
+++ b/src/rpc/blockchain.cpp
@@ -565,7 +565,7 @@ static RPCHelpMan getblockheader()
 
     if (!fVerbose)
     {
-        CDataStream ssBlock(SER_NETWORK, PROTOCOL_VERSION);
+        DataStream ssBlock{};
         ssBlock << pblockindex->GetBlockHeader();
         std::string strHex = HexStr(ssBlock);
         return strHex;
diff --git a/src/test/blockencodings_tests.cpp b/src/test/blockencodings_tests.cpp
index e1dafc6ba..e23b7228e 100644
--- a/src/test/blockencodings_tests.cpp
+++ b/src/test/blockencodings_tests.cpp
@@ -310,7 +310,7 @@ BOOST_AUTO_TEST_CASE(TransactionsRequestSerializationTest) {
     req1.indexes[2] = 3;
     req1.indexes[3] = 4;
 
-    CDataStream stream(SER_NETWORK, PROTOCOL_VERSION);
+    DataStream stream{};
     stream << req1;
 
     BlockTransactionsRequest req2;
@@ -330,7 +330,7 @@ BOOST_AUTO_TEST_CASE(TransactionsRequestDeserializationMaxTest) {
     req0.blockhash = InsecureRand256();
     req0.indexes.resize(1);
     req0.indexes[0] = 0xffff;
-    CDataStream stream(SER_NETWORK, PROTOCOL_VERSION);
+    DataStream stream{};
     stream << req0;
 
     BlockTransactionsRequest req1;
@@ -350,7 +350,7 @@ BOOST_AUTO_TEST_CASE(TransactionsRequestDeserializationOverflowTest) {
     req0.indexes[0] = 0x7000;
     req0.indexes[1] = 0x10000 - 0x7000 - 2;
     req0.indexes[2] = 0;
-    CDataStream stream(SER_NETWORK, PROTOCOL_VERSION);
+    DataStream stream{};
     stream << req0.blockhash;
     WriteCompactSize(stream, req0.indexes.size());
     WriteCompactSize(stream, req0.indexes[0]);
diff --git a/src/wallet/test/walletdb_tests.cpp b/src/wallet/test/walletdb_tests.cpp
index 21842fe78..7282aced6 100644
--- a/src/wallet/test/walletdb_tests.cpp
+++ b/src/wallet/test/walletdb_tests.cpp
@@ -15,14 +15,14 @@ BOOST_FIXTURE_TEST_SUITE(walletdb_tests, BasicTestingSetup)
 BOOST_AUTO_TEST_CASE(walletdb_readkeyvalue)
 {
     /**
-     * When ReadKeyValue() reads from either a "key" or "wkey" it first reads the CDataStream steam into a
+     * When ReadKeyValue() reads from either a "key" or "wkey" it first reads the DataStream steam into a
      * CPrivKey or CWalletKey respectively and then reads a hash of the pubkey and privkey into a uint256.
      * Wallets from 0.8 or before do not store the pubkey/privkey hash, trying to read the hash from old
      * wallets throws an exception, for backwards compatibility this read is wrapped in a try block to
-     * silently fail. The test here makes sure the type of exception thrown from CDataStream::read()
+     * silently fail. The test here makes sure the type of exception thrown from DataStream::read()
      * matches the type we expect, otherwise we need to update the "key"/"wkey" exception type caught.
      */
-    CDataStream ssValue(SER_DISK, CLIENT_VERSION);
+    DataStream ssValue{};
     uint256 dummy;
     BOOST_CHECK_THROW(ssValue >> dummy, std::ios_base::failure);
 }
diff --git a/src/wallet/walletdb.cpp b/src/wallet/walletdb.cpp
index 5f3be8a6b..2cd35ae40 100644
--- a/src/wallet/walletdb.cpp
+++ b/src/wallet/walletdb.cpp
@@ -994,7 +994,7 @@ DBErrors WalletBatch::FindWalletTxHashes(std::vector<uint256>& tx_hashes)
         {
             // Read next record
             DataStream ssKey{};
-            CDataStream ssValue(SER_DISK, CLIENT_VERSION);
+            DataStream ssValue{};
             DatabaseCursor::Status status = cursor->Next(ssKey, ssValue);
             if (status == DatabaseCursor::Status::DONE) {
                 break;

FYI, this is my (rather messy) script I used to find more candidates. It's a pretty dumb approach and only replaces one location at a time, so any case where e.g. both .cpp and .h file need to be updated simultaneously, this will not find.

python script to find `DataStream` candidates
"""
Run this script from bitcoin root dir, observe datastream.log for results (summarized at the end)
"""

import subprocess
from pathlib import Path
import re
from typing import Optional
from pprint import pformat

BASE_PATH = Path().absolute()
LOG_FILE = Path(BASE_PATH / "datastream.log")

def make() -> int:
    """return 0 if make completed without errors"""
    status = subprocess.run("make -j 9", shell=True).returncode
    return status
    
def update_line(line: str) -> Optional[str]:
    """return line with suggested DataStream replacement, or None if no CDataStream was found"""
    match = re.search(r'CDataStream\s*(\w+)?\s*[\(\{](.*)[\}\)]', line)
    if match:
        name = match.group(1) or ""
        params = [p.strip() for p in (match.group(2) or "").split(",")]
        if len(params) in (1, 3):
            # 1 param: passing a span; 3 param: passing a span and 2 serialization parameters
            # -> we just want to capture the span
            data = params[0]
        else:
            # 2 param: passing serialization -> initialize DataStream without arguments
            # >3 param -> ignore
            data = ""
        new_line = line.replace(match.group(0), f"DataStream {name}{{{data}}}")
        return new_line
    else:
        return None
        
def clear_log():
    with open(LOG_FILE, 'w') as f:
        f.close()
        
def log(msg: str):
    print(msg)
    with open(LOG_FILE, 'a') as f:
        f.write(msg + '\n')

def main():
    success = {}
    failed = {}
    
    subprocess.run("git checkout .", shell=True)  # start from a clean slate in case changes from previous run persisted
    clear_log()
    
    for ext in ("cpp", "h"):
        for item in Path(BASE_PATH / "src").rglob(f"*.{ext}"):
            if item.is_dir():
                continue
            replaced = False
            log(f"scanning file {item}")
            with open(item, "r") as read_file:
                data = read_file.readlines()
                for i, original_line in enumerate(data):
                    location = f"{item}:{i}"
                    new_line = update_line(original_line)
                    if new_line:
                        data[i] = new_line
                        log(f"{location}: replaced {original_line}->{new_line}")
                        with open(item, "w") as write_file:
                            write_file.writelines(data)
                        if make() == 0:
                            success[location] = new_line
                            log(f"{location}: success | {new_line}")
                        else:
                            failed[location] = new_line
                            log(f"{location}: failed")
                        with open(item, "w") as write_file:
                            data[i] = original_line
                            write_file.writelines(data)    
    log("FINISHED")
    log("========")    
    log("success: \n" + pformat(success))
    
if __name__ == '__main__':
    main()```

@maflcko
Copy link
Member Author

maflcko commented Jan 26, 2023

The one in walletdb_tests.cpp was included in my previous diff too but not yet updated, I'm not sure if that failed for you or just missed it?

It doesn't matter, but personally I think it would be best if the unit test used the exact same type as the "real" ReadKeyValue function uses. Happy to change, though.

The last use was removed in the previous commit.
@maflcko
Copy link
Member Author

maflcko commented Jan 26, 2023

Force pushed to take more, thanks!

Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK fa035fe

I think it would be best if the unit test used the exact same type as the "real" ReadKeyValue function uses

Makes sense, and no need to rush with the migration.

Copy link
Contributor

@aureleoules aureleoules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fanquake added a commit to bitcoin-core/gui that referenced this pull request Jan 26, 2023
…version and use it where possible

fa035fe Remove unused CDataStream::SetType (MarcoFalke)
fa29e73 Use DataStream where possible (MarcoFalke)
fa9becf streams: Add DataStream without ser-type and ser-version (MarcoFalke)

Pull request description:

  This was done in the context of bitcoin/bitcoin#25284 , but I think it also makes sense standalone.

  The basic idea is that serialization type should not be initialized when it is not needed. Same for the serialization version.

  So do this here for `DataStream`. `CDataStream` remains in places where it is not yet possible.

ACKs for top commit:
  stickies-v:
    re-ACK [fa035fe](bitcoin/bitcoin@fa035fe)
  aureleoules:
    diff re-ACK fa035fe https://github.com/bitcoin/bitcoin/compare/fa0e6640bac8c6426af7c5744125c85c0f74b9e5..fa035fe2d61d0c98d1bfd0153a0c3b5eb9d40de4

Tree-SHA512: cb5e53d0df7c94319ffadc6ea1d887fc38516decaf43f0673396d79cc62d450a1a61173654a91b8c2b52d2cecea53fe4a500b8f6466596f35731471163fb051c
@fanquake
Copy link
Member

This has been merged.

@fanquake fanquake closed this Jan 26, 2023
@maflcko maflcko deleted the 2206-datastream-🎑 branch January 26, 2023 15:07
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jan 26, 2023
@bitcoin bitcoin locked and limited conversation to collaborators Jan 30, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants