Skip to content

Conversation

fjahr
Copy link
Contributor

@fjahr fjahr commented Mar 17, 2024

The function GetFirstStoredBlock() helps us find the first block for which we have data. So far this function only looked for a block with BLOCK_HAVE_DATA. However, this doesn't mean that we also have the undo data of that block, and undo data might be required for what a user would like to do with those blocks. One example of how this might happen is if some blocks were fetched using the getblockfrompeer RPC. Blocks fetched from a peer will have data but no undo data.

The first commit here allows GetFirstStoredBlock() to check for undo data as well by passing a parameter. This alone is useful for #29553 and I would use it there.

In the second commit I am applying the undo check to the RPCs that report pruneheight to the user. I find this much more intuitive because I think the user expects to be able to do all operations on blocks up until the pruneheight but that is not the case if undo data is missing. I personally ran into this once before and now again when testing for assumeutxo when I had used getblockfrompeer. The following commit adds test coverage for this change of behavior.

The last commit adds a note in the docs of getblockfrompeer that undo data will not be available.

@DrahtBot
Copy link
Contributor

DrahtBot commented Mar 17, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK furszy, stickies-v, achow101
Stale ACK TheCharlatan

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #29770 (index: Check all necessary block data is available before starting to sync by fjahr)
  • #29656 (chainparams: Change nChainTx type to uint64_t by fjahr)
  • #19463 (Prune locks by luke-jr)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@fjahr fjahr changed the title prune, rpc: Check undo data when finding for pruneheight prune, rpc: Check undo data when finding pruneheight Mar 17, 2024
@fjahr fjahr force-pushed the 2024-03-first-stored-undo branch from 892d6fc to 296243c Compare March 17, 2024 20:13
Copy link
Contributor

@TheCharlatan TheCharlatan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 296243c

Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

I think this approach works well, but I think I'd prefer passing a mask instead of a bool, which prevents combinatorial explosion of params if we want to add more filtering options in the future. I personally find that a bit easier to quickly understand too. What do you think about something like this (didn't add static asserts and doc updates etc yet):

git diff on 296243c
diff --git a/src/node/blockstorage.cpp b/src/node/blockstorage.cpp
index 74a233f43b..eaefa602d3 100644
--- a/src/node/blockstorage.cpp
+++ b/src/node/blockstorage.cpp
@@ -595,16 +595,12 @@ bool BlockManager::IsBlockPruned(const CBlockIndex& block)
     return m_have_pruned && !(block.nStatus & BLOCK_HAVE_DATA) && (block.nTx > 0);
 }
 
-const CBlockIndex* BlockManager::GetFirstStoredBlock(const CBlockIndex& upper_block, const CBlockIndex* lower_block, const bool check_undo)
+const CBlockIndex* BlockManager::GetFirstStoredBlock(const CBlockIndex& upper_block, const CBlockIndex* lower_block, uint32_t status_mask)
 {
     AssertLockHeld(::cs_main);
-    uint32_t check_status{BLOCK_HAVE_DATA};
-    if (check_undo) {
-        check_status |= BLOCK_HAVE_UNDO;
-    }
     const CBlockIndex* last_block = &upper_block;
-    assert((last_block->nStatus & check_status) == check_status); // 'upper_block' must have data
-    while (last_block->pprev && ((last_block->pprev->nStatus & check_status) == check_status)) {
+    assert((last_block->nStatus & status_mask) == status_mask); // 'upper_block' must have data
+    while (last_block->pprev && ((last_block->pprev->nStatus & status_mask) == status_mask)) {
         if (lower_block) {
             // Return if we reached the lower_block
             if (last_block == lower_block) return lower_block;
@@ -617,7 +613,7 @@ const CBlockIndex* BlockManager::GetFirstStoredBlock(const CBlockIndex& upper_bl
     assert(last_block != nullptr);
     // In the special case that all blocks up to the Genesis block are not
     // pruned, we return the Genesis block instead of block 1, even though
-    // it can not have any undo data, since it is properly handled as an
+    // it may be missing (e.g. undo) data, since it is properly handled as an
     // exception everywhere.
     if (last_block->nHeight == 1) {
         return last_block->pprev;
diff --git a/src/node/blockstorage.h b/src/node/blockstorage.h
index 33d78259e6..766a170946 100644
--- a/src/node/blockstorage.h
+++ b/src/node/blockstorage.h
@@ -338,7 +338,7 @@ public:
     //! Find the first stored ancestor of start_block immediately after the last
     //! pruned ancestor. Return value will never be null. Caller is responsible
     //! for ensuring that start_block has data.
-    const CBlockIndex* GetFirstStoredBlock(const CBlockIndex& start_block LIFETIMEBOUND, const CBlockIndex* lower_block=nullptr, const bool check_undo=false) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
+    const CBlockIndex* GetFirstStoredBlock(const CBlockIndex& start_block LIFETIMEBOUND, const CBlockIndex* lower_block=nullptr, uint32_t status_mask=BLOCK_HAVE_DATA) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
 
     /** True if any block files have ever been pruned. */
     bool m_have_pruned = false;
diff --git a/src/rpc/blockchain.cpp b/src/rpc/blockchain.cpp
index 3cd9421343..88ed7ed964 100644
--- a/src/rpc/blockchain.cpp
+++ b/src/rpc/blockchain.cpp
@@ -836,7 +836,8 @@ static RPCHelpMan pruneblockchain()
 
     PruneBlockFilesManual(active_chainstate, height);
     const CBlockIndex& block{*CHECK_NONFATAL(active_chain.Tip())};
-    return block.nStatus & BLOCK_HAVE_MASK ? active_chainstate.m_blockman.GetFirstStoredBlock(block, nullptr, true)->nHeight - 1 : block.nHeight;
+    const uint32_t has_undo{BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO};
+    return block.nStatus & BLOCK_HAVE_MASK ? active_chainstate.m_blockman.GetFirstStoredBlock(block, nullptr, has_undo)->nHeight - 1 : block.nHeight;
 },
     };
 }
@@ -1290,7 +1291,8 @@ RPCHelpMan getblockchaininfo()
     obj.pushKV("pruned", chainman.m_blockman.IsPruneMode());
     if (chainman.m_blockman.IsPruneMode()) {
         bool has_tip_data = tip.nStatus & BLOCK_HAVE_MASK;
-        obj.pushKV("pruneheight", has_tip_data ? chainman.m_blockman.GetFirstStoredBlock(tip, nullptr, true)->nHeight : tip.nHeight + 1);
+        const uint32_t has_undo{BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO};
+        obj.pushKV("pruneheight", has_tip_data ? chainman.m_blockman.GetFirstStoredBlock(tip, nullptr, has_undo)->nHeight : tip.nHeight + 1);
 
         const bool automatic_pruning{chainman.m_blockman.GetPruneTarget() != BlockManager::PRUNE_TARGET_MANUAL};
         obj.pushKV("automatic_pruning",  automatic_pruning);

Code LGTM 296243c, and I verified that the (updated) feature_pruning.py test fails on master as expected. I'm happy to proceed with your approach too if you don't like mine.

@fjahr fjahr force-pushed the 2024-03-first-stored-undo branch from 296243c to bc36604 Compare March 27, 2024 17:01
@fjahr
Copy link
Contributor Author

fjahr commented Mar 27, 2024

@stickies-v thanks for the feedback! I don't have strong feelings between the approaches but I agree yours is more flexible and maybe a bit more expressive, so I applied that. I also addressed the rest of the feedback and added some more detail to the docs.

@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/23163769501

@fjahr fjahr force-pushed the 2024-03-first-stored-undo branch from bc36604 to 23ac561 Compare March 28, 2024 13:50
Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q: doesn't this mean that if we generate a chain, prune half of it, then fetch the pruned blocks through getblockfrompeer, any fresh indexes sync will throw a different init error after this changes? (StartIndexBackgroundSync() should fail at CheckBlockDataAvailability() now).

If this is correct, it would be nice to introduce a test presenting the behavior change.

@fjahr
Copy link
Contributor Author

fjahr commented Mar 30, 2024

q: doesn't this mean that if we generate a chain, prune half of it, then fetch the pruned blocks through getblockfrompeer, any fresh indexes sync will throw a different init error after this changes? (StartIndexBackgroundSync() should fail at CheckBlockDataAvailability() now).

If this is correct, it would be nice to introduce a test presenting the behavior change.

Good question but no, CheckBlockDataAvailability() does not change the default mask (HAVE_DATA), so it shouldn't change behavior there for now. But your question made me realize that we should check for undo data for some indices because they need it. I have implemented that in a follow-up here: #29770 and I also implemented the test there that shows the behavior change you had in mind I think.

Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for incorporating the mask suggestion. Had a deeper look at the code, I think this logic maybe warrants a bit more cleaning up, because with the status_mask, using GetFirstStoredBlock is not very intuitive anymore and potentially a bit footgunny.

  • GetFirstStoredBlock: rename to GetFirstBlock since the Stored bit depends on the status_mask now. Consequently, make status_mask required to avoid footguns.
  • let callsites handle genesis block exceptions, because it depends on the status_mask. As a bonus, everything is a bit more explicit?
  • some other tidying up, e.g. make parameter naming consistent between header and implementation, improve documentation

Wdyt about this further iteration? I understand it's quite a bit of overhaul compared to your first proposal for what should be a relatively simple change. I'm just hesitant about introducing new boolean parameters when we can avoid it, and I think it's worth improving overall code robustness?

git diff on 23ac561
diff --git a/src/init.cpp b/src/init.cpp
index 1a4fce4678..a8a5ff0dfd 100644
--- a/src/init.cpp
+++ b/src/init.cpp
@@ -2032,7 +2032,7 @@ bool StartIndexBackgroundSync(NodeContext& node)
     if (indexes_start_block) {
         LOCK(::cs_main);
         const CBlockIndex* start_block = *indexes_start_block;
-        if (!start_block) start_block = chainman.ActiveChain().Genesis();
+        if (!start_block) start_block = chainman.ActiveChain()[1];
         if (!chainman.m_blockman.CheckBlockDataAvailability(*index_chain.Tip(), *Assert(start_block))) {
             return InitError(strprintf(Untranslated("%s best block of the index goes beyond pruned data. Please disable the index or reindex (which will download the whole blockchain again)"), older_index_name));
         }
diff --git a/src/node/blockstorage.cpp b/src/node/blockstorage.cpp
index eaefa602d3..8556504453 100644
--- a/src/node/blockstorage.cpp
+++ b/src/node/blockstorage.cpp
@@ -595,11 +595,12 @@ bool BlockManager::IsBlockPruned(const CBlockIndex& block)
     return m_have_pruned && !(block.nStatus & BLOCK_HAVE_DATA) && (block.nTx > 0);
 }
 
-const CBlockIndex* BlockManager::GetFirstStoredBlock(const CBlockIndex& upper_block, const CBlockIndex* lower_block, uint32_t status_mask)
+const CBlockIndex* BlockManager::GetFirstBlock(const CBlockIndex& upper_block, uint32_t status_mask,
+                                               const CBlockIndex* lower_block)
 {
     AssertLockHeld(::cs_main);
     const CBlockIndex* last_block = &upper_block;
-    assert((last_block->nStatus & status_mask) == status_mask); // 'upper_block' must have data
+    assert((last_block->nStatus & status_mask) == status_mask); // 'upper_block' must satisfy the status mask
     while (last_block->pprev && ((last_block->pprev->nStatus & status_mask) == status_mask)) {
         if (lower_block) {
             // Return if we reached the lower_block
@@ -610,21 +611,13 @@ const CBlockIndex* BlockManager::GetFirstStoredBlock(const CBlockIndex& upper_bl
         }
         last_block = last_block->pprev;
     }
-    assert(last_block != nullptr);
-    // In the special case that all blocks up to the Genesis block are not
-    // pruned, we return the Genesis block instead of block 1, even though
-    // it may be missing (e.g. undo) data, since it is properly handled as an
-    // exception everywhere.
-    if (last_block->nHeight == 1) {
-        return last_block->pprev;
-    }
-    return last_block;
+    return Assert(last_block);
 }
 
 bool BlockManager::CheckBlockDataAvailability(const CBlockIndex& upper_block, const CBlockIndex& lower_block)
 {
     if (!(upper_block.nStatus & BLOCK_HAVE_DATA)) return false;
-    return GetFirstStoredBlock(upper_block, &lower_block) == &lower_block;
+    return GetFirstBlock(upper_block, BLOCK_HAVE_DATA, &lower_block) == &lower_block;
 }
 
 // If we're using -prune with -reindex, then delete block files that will be ignored by the
diff --git a/src/node/blockstorage.h b/src/node/blockstorage.h
index 8f83b68558..8f309c31b6 100644
--- a/src/node/blockstorage.h
+++ b/src/node/blockstorage.h
@@ -335,13 +335,29 @@ public:
     //! (part of the same chain).
     bool CheckBlockDataAvailability(const CBlockIndex& upper_block LIFETIMEBOUND, const CBlockIndex& lower_block LIFETIMEBOUND) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
 
-    //! Find the first stored ancestor of start_block immediately after the last
-    //! ancestor that does not match the status mask. Return value will never be
-    //! null. Caller is responsible for ensuring that start_block has the status
-    //! mask data. If the whole chain matched the status mask the genesis block
-    //! will be returned regardless of it's status match because, for example,
-    //! it can not have undo data by nature.
-    const CBlockIndex* GetFirstStoredBlock(const CBlockIndex& start_block LIFETIMEBOUND, const CBlockIndex* lower_block=nullptr, uint32_t status_mask=BLOCK_HAVE_DATA) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
+    /**
+     * @brief Finds the furthest away ancestor of `upper_block` of which the nStatus matches the `status_mask`
+     * 
+     * Starts from `upper_block` and iterates backwards through its ancestors for as long as the block's
+     * nStatus keeps matching the `status_mask` mask, or until it encounters `lower_block`.
+     *
+     * @pre `upper_block` must match the `status_mask`.
+     * 
+     * @param upper_block The block from which to start the search, must meet the `status_mask` criteria.
+     * @param status_mask A bitmask representing the conditions to check against each block's nStatus.
+     * @param lower_block An optional pointer to the `CBlockIndex` that serves as the lower boundary of
+     *                    the search (inclusive). If `nullptr`, the search includes up to the genesis
+     *                    block.
+     * @return A (non-null) pointer to the `CBlockIndex` of the furthest away ancestor (including
+     *         `upper_block` itself) that matches the `status_mask`, up to and including
+     *         `lower_block` if it is part of the ancestry.
+     */
+    const CBlockIndex* GetFirstBlock(
+        const CBlockIndex& upper_block LIFETIMEBOUND, 
+        uint32_t status_mask,
+        const CBlockIndex* lower_block = nullptr
+    ) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
+
 
     /** True if any block files have ever been pruned. */
     bool m_have_pruned = false;
diff --git a/src/rpc/blockchain.cpp b/src/rpc/blockchain.cpp
index fa83d2e397..15d6f9e9b9 100644
--- a/src/rpc/blockchain.cpp
+++ b/src/rpc/blockchain.cpp
@@ -94,6 +94,18 @@ double GetDifficulty(const CBlockIndex& blockindex)
     return dDiff;
 }
 
+//! Return height of highest block that has been pruned, or -1 if no blocks have been pruned
+static int GetPruneHeight(const Chainstate& active_chainstate) EXCLUSIVE_LOCKS_REQUIRED(!cs_main) {
+    AssertLockHeld(cs_main);
+
+    const CBlockIndex& chain_tip{*CHECK_NONFATAL(active_chainstate.m_chain.Tip())};
+    if (!(chain_tip.nStatus & BLOCK_HAVE_MASK)) return chain_tip.nHeight;
+    const auto first_pruned{active_chainstate.m_blockman.GetFirstBlock(chain_tip, BLOCK_HAVE_MASK)->nHeight - 1};
+
+    // special case: genesis block is never pruned
+    return first_pruned == 0 ? -1 : first_pruned;
+}
+
 static int ComputeNextBlockAndDepth(const CBlockIndex& tip, const CBlockIndex& blockindex, const CBlockIndex*& next)
 {
     next = tip.GetAncestor(blockindex.nHeight + 1);
@@ -835,9 +847,7 @@ static RPCHelpMan pruneblockchain()
     }
 
     PruneBlockFilesManual(active_chainstate, height);
-    const CBlockIndex& block{*CHECK_NONFATAL(active_chain.Tip())};
-    const uint32_t has_undo{BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO};
-    return block.nStatus & BLOCK_HAVE_MASK ? active_chainstate.m_blockman.GetFirstStoredBlock(block, /*lower_block=*/nullptr, /*status_mask=*/has_undo)->nHeight - 1 : block.nHeight;
+    return GetPruneHeight(active_chainstate);
 },
     };
 }
@@ -1290,10 +1300,7 @@ RPCHelpMan getblockchaininfo()
     obj.pushKV("size_on_disk", chainman.m_blockman.CalculateCurrentUsage());
     obj.pushKV("pruned", chainman.m_blockman.IsPruneMode());
     if (chainman.m_blockman.IsPruneMode()) {
-        bool has_tip_data = tip.nStatus & BLOCK_HAVE_DATA;
-        const uint32_t has_undo{BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO};
-        obj.pushKV("pruneheight", has_tip_data ? chainman.m_blockman.GetFirstStoredBlock(tip, /*lower_block=*/nullptr, /*status_mask=*/has_undo)->nHeight : tip.nHeight + 1);
-
+        obj.pushKV("pruneheight", GetPruneHeight(active_chainstate) + 1);
         const bool automatic_pruning{chainman.m_blockman.GetPruneTarget() != BlockManager::PRUNE_TARGET_MANUAL};
         obj.pushKV("automatic_pruning",  automatic_pruning);
         if (automatic_pruning) {
diff --git a/src/test/blockmanager_tests.cpp b/src/test/blockmanager_tests.cpp
index d7ac0bf823..12f3257700 100644
--- a/src/test/blockmanager_tests.cpp
+++ b/src/test/blockmanager_tests.cpp
@@ -2,6 +2,7 @@
 // Distributed under the MIT software license, see the accompanying
 // file COPYING or http://www.opensource.org/licenses/mit-license.php.
 
+#include <chain.h>
 #include <chainparams.h>
 #include <clientversion.h>
 #include <node/blockstorage.h>
@@ -113,7 +114,7 @@ BOOST_FIXTURE_TEST_CASE(blockmanager_block_data_availability, TestChain100Setup)
     };
 
     // 1) Return genesis block when all blocks are available
-    BOOST_CHECK_EQUAL(blockman.GetFirstStoredBlock(tip), chainman->ActiveChain()[0]);
+    BOOST_CHECK_EQUAL(blockman.GetFirstBlock(tip, BLOCK_HAVE_DATA), chainman->ActiveChain()[0]);
     BOOST_CHECK(blockman.CheckBlockDataAvailability(tip, *chainman->ActiveChain()[0]));
 
     // 2) Check lower_block when all blocks are available
@@ -127,7 +128,7 @@ BOOST_FIXTURE_TEST_CASE(blockmanager_block_data_availability, TestChain100Setup)
     func_prune_blocks(last_pruned_block);
 
     // 3) The last block not pruned is in-between upper-block and the genesis block
-    BOOST_CHECK_EQUAL(blockman.GetFirstStoredBlock(tip), first_available_block);
+    BOOST_CHECK_EQUAL(blockman.GetFirstBlock(tip, BLOCK_HAVE_DATA), first_available_block);
     BOOST_CHECK(blockman.CheckBlockDataAvailability(tip, *first_available_block));
     BOOST_CHECK(!blockman.CheckBlockDataAvailability(tip, *last_pruned_block));
 }

@fjahr fjahr force-pushed the 2024-03-first-stored-undo branch from 23ac561 to c77e847 Compare April 28, 2024 19:28
@fjahr
Copy link
Contributor Author

fjahr commented Apr 28, 2024

@stickies-v Thank you, I mostly took your suggestions with some minor tweaks and made you co-author of the refactoring commit. I was a bit torn about letting the caller handle the Genesis block issue but since this was actually the previous behavior I guess it's better to not change it if you like it the way it is. I have adapted GetPruneHeight to return std::optional which I found a good fit and avoids the -1 return value. And I think the change in init.cpp isn't needed here (yet). But I think we will need it in #29770 later.

EDIT: I had forgotten that pruneblockchain returns -1 when nothing is pruned, but I still think handling it internally with std::optional is nicer.

@fjahr fjahr force-pushed the 2024-03-first-stored-undo branch 3 times, most recently from 928141d to edbb6a3 Compare April 28, 2024 22:44
Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think the last 3 commits should be squashed

return std::nullopt;
}
// The block before the block with all data is the last that was pruned
return first_block->pprev->nHeight;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetFirstBlock() does not provide any guarantees that pprev is not nullptr (nor should it), so I think we need to assert that here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is fully addressed in 0501ec1 - would update to return Assert(first_unpruned.pprev)->nHeight;?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this is still needed when we return above if !first_unpruned.pprev but I've added it since it won't hurt.

Comment on lines 1310 to 1318
const auto prune_height{GetPruneHeight(active_chainstate)};
obj.pushKV("pruneheight", prune_height ? prune_height.value() + 1 : 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: more consistent with the (suggested) approach in pruneblockchain, but then "plus one (only present if pruning is enabled)" as per the docs in this RPC

Suggested change
const auto prune_height{GetPruneHeight(active_chainstate)};
obj.pushKV("pruneheight", prune_height ? prune_height.value() + 1 : 0);
const auto prune_height{GetPruneHeight(active_chainstate).value_or(-1)};
obj.pushKV("pruneheight", prune_height + 1);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to keep this one as is because I find it more intuitive.

@DrahtBot DrahtBot mentioned this pull request Apr 29, 2024
@fjahr fjahr force-pushed the 2024-03-first-stored-undo branch from edbb6a3 to 697d0a2 Compare April 29, 2024 16:25
@fjahr
Copy link
Contributor Author

fjahr commented Apr 29, 2024

Adressed feedback from @stickies-v , thanks again!

nit: I think the last 3 commits should be squashed

The doc change is only tangentially related so I will keep it separate. But I squashed the 2nd and 3rd.

@@ -429,6 +429,7 @@ static RPCHelpMan getblockfrompeer()
"getblockfrompeer",
"Attempt to fetch block from a given peer.\n\n"
"We must have the header for this block, e.g. using submitheader.\n"
"The block will not have any undo data which can limit the usage of the block data in a context where the undo data is needed.\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q: is the "undo data" term well known outside of core? It seems to be an implementation detail to me. Maybe, could write this differently: "The block data will be stored alone; there will be no available information associated with the outputs this block spent. This limits the usage.. etc".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, not sure how much it is known but it is referenced in other places already, like the getblock RPC help. And I'm not sure if a more generic text helps more people because it's a quite technical reason and I don't know how to explain it in a way that works for everyone and is still concise. With the term undo data people at least have something they can search on Bitcoin SE etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how to explain it in a way that works for everyone and is still concise

No problem on leaving it as is if there is other RPC command using this term.
What I wrote seems generally useful to me; it means that the node might not be able to access the spent outputs script and value. Maybe adding few scenarios where the undo data is needed could clarify it usage.

Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 8789dc8.
Left few nits.

@@ -429,6 +429,7 @@ static RPCHelpMan getblockfrompeer()
"getblockfrompeer",
"Attempt to fetch block from a given peer.\n\n"
"We must have the header for this block, e.g. using submitheader.\n"
"The block will not have any undo data which can limit the usage of the block data in a context where the undo data is needed.\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how to explain it in a way that works for everyone and is still concise

No problem on leaving it as is if there is other RPC command using this term.
What I wrote seems generally useful to me; it means that the node might not be able to access the spent outputs script and value. Maybe adding few scenarios where the undo data is needed could clarify it usage.

Comment on lines +1317 to +1318
const auto prune_height{GetPruneHeight(chainman.m_blockman, active_chainstate.m_chain)};
obj.pushKV("pruneheight", prune_height ? prune_height.value() + 1 : 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

little topic:
maybe changing the RPC docs to say that "pruneheight" returns "the first block unpruned, all previous blocks were pruned" is more useful than saying "height of the last block pruned, plus one" as is now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, taken as you suggested in #29770

Comment on lines +801 to +802
const auto& first_unpruned{*Assert(blockman.GetFirstBlock(*chain_tip, /*status_mask=*/BLOCK_HAVE_MASK, first_block))};
if (&first_unpruned == first_block) {
Copy link
Member

@furszy furszy Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can remove the back and forth.

Suggested change
const auto& first_unpruned{*Assert(blockman.GetFirstBlock(*chain_tip, /*status_mask=*/BLOCK_HAVE_MASK, first_block))};
if (&first_unpruned == first_block) {
const CBlockIndex* first_unpruned{Assert(blockman.GetFirstBlock(*chain_tip, /*status_mask=*/BLOCK_HAVE_MASK, first_block))};
if (first_unpruned == first_block) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in #29770

if (!first_block || !chain_tip) return std::nullopt;

// If the chain tip is pruned, everything is pruned.
if (!((chain_tip->nStatus & BLOCK_HAVE_MASK) == BLOCK_HAVE_MASK)) return chain_tip->nHeight;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be simpler:

Suggested change
if (!((chain_tip->nStatus & BLOCK_HAVE_MASK) == BLOCK_HAVE_MASK)) return chain_tip->nHeight;
if ((chain_tip->nStatus & BLOCK_HAVE_MASK) != BLOCK_HAVE_MASK) return chain_tip->nHeight;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in #29770

Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 8789dc8

@@ -57,4 +58,7 @@ UniValue CreateUTXOSnapshot(
const fs::path& path,
const fs::path& tmppath);

//! Return height of highest block that has been pruned, or std::nullopt if no blocks have been pruned
std::optional<int> GetPruneHeight(const node::BlockManager& blockman, const CChain& chain) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: missing some includes / fwd decls for the newly added line:

git diff on 8789dc8
diff --git a/src/rpc/blockchain.h b/src/rpc/blockchain.h
index f6a7fe236c..23909c334e 100644
--- a/src/rpc/blockchain.h
+++ b/src/rpc/blockchain.h
@@ -9,15 +9,18 @@
 #include <core_io.h>
 #include <streams.h>
 #include <sync.h>
+#include <threadsafety.h>
 #include <util/fs.h>
 #include <validation.h>
 
 #include <any>
+#include <optional>
 #include <stdint.h>
 #include <vector>
 
 class CBlock;
 class CBlockIndex;
+class CChain;
 class Chainstate;
 class UniValue;
 namespace node {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #29770

@@ -74,4 +76,36 @@ BOOST_AUTO_TEST_CASE(get_difficulty_for_very_high_target)
TestDifficulty(0x12345678, 5913134931067755359633408.0);
}

//! Prune chain from height down to genesis block and check that
//! GetPruneHeight returns the correct value
static void CheckGetPruneHeight(node::BlockManager& blockman, CChain& chain, int height) EXCLUSIVE_LOCKS_REQUIRED(::cs_main)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
static void CheckGetPruneHeight(node::BlockManager& blockman, CChain& chain, int height) EXCLUSIVE_LOCKS_REQUIRED(::cs_main)
static void CheckGetPruneHeight(const node::BlockManager& blockman, const CChain& chain, int height) EXCLUSIVE_LOCKS_REQUIRED(::cs_main)

and related:

git diff on 8789dc8
diff --git a/src/test/blockchain_tests.cpp b/src/test/blockchain_tests.cpp
index 2a63d09795..7a5d175c8c 100644
--- a/src/test/blockchain_tests.cpp
+++ b/src/test/blockchain_tests.cpp
@@ -96,8 +96,8 @@ static void CheckGetPruneHeight(node::BlockManager& blockman, CChain& chain, int
 BOOST_FIXTURE_TEST_CASE(get_prune_height, TestChain100Setup)
 {
     LOCK(::cs_main);
-    auto& chain = m_node.chainman->ActiveChain();
-    auto& blockman = m_node.chainman->m_blockman;
+    const auto& chain = m_node.chainman->ActiveChain();
+    const auto& blockman = m_node.chainman->m_blockman;
 
     // Fresh chain of 100 blocks without any pruned blocks, so std::nullopt should be returned
     BOOST_CHECK(!GetPruneHeight(blockman, chain).has_value());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #29770

@achow101
Copy link
Member

ACK 8789dc8

@achow101 achow101 merged commit f4849f6 into bitcoin:master Jul 10, 2024
@fjahr
Copy link
Contributor Author

fjahr commented Jul 10, 2024

With this merged, I will take #29770 out of draft mode shortly and also address the left-over comments here. Thanks @stickies-v and @furszy !

fjahr added a commit to fjahr/bitcoin that referenced this pull request Jul 12, 2024
fjahr added a commit to fjahr/bitcoin that referenced this pull request Jul 12, 2024
fjahr added a commit to fjahr/bitcoin that referenced this pull request Jul 15, 2024
fjahr added a commit to fjahr/bitcoin that referenced this pull request Jul 17, 2024
fjahr added a commit to fjahr/bitcoin that referenced this pull request Mar 29, 2025
fjahr added a commit to fjahr/bitcoin that referenced this pull request Mar 30, 2025
fjahr added a commit to fjahr/bitcoin that referenced this pull request Apr 3, 2025
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Apr 14, 2025
Summary:
> GetFirstStoredBlock is generalized to check for any data status with a
> status mask that needs to be passed as a parameter. To reflect this the
> function is also renamed to GetFirstBlock.
>
> Co-authored-by: stickies-v <stickies-v@protonmail.com>

In our codebase the status mask is a private member of `BlockStatus`, but we can achieve the same thing by passing a test function that takes a `BlockStatus` and returns a `bool`.

```
Core  ->  Bitcoin ABC
GetFirstBlock(upper_block, BLOCK_HAVE_DATA, &lower_block)
    -> GetFirstBlock(upper_block, [](BlockStatus status) {return status.hasData();}, &lower_block)
GetFirstBlock(upper_block, BLOCK_HAVE_MASK, &lower_block)
    -> GetFirstBlock(upper_block, [](BlockStatus status) {return status.hasData() && status.hasUndo();}, &lower_block)
```

This is a partial backport of [[bitcoin/bitcoin#29668 | core#29668]]
bitcoin/bitcoin@96b4fac

Test Plan: `ninja check-functional-extended`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Subscribers: Fabien

Differential Revision: https://reviews.bitcoinabc.org/D17924
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Apr 14, 2025
Summary:
From the PR description:
> The function GetFirstStoredBlock() helps us find the first block for which we have data. So far this function only looked for a block with BLOCK_HAVE_DATA. However, this doesn't mean that we also have the undo data of that block, and undo data might be required for what a user would like to do with those blocks. One example of how this might happen is if some blocks were fetched using the getblockfrompeer RPC. Blocks fetched from a peer will have data but no undo data.
>
> In the second commit I am applying the undo check to the RPCs that report pruneheight to the user. I find this much more intuitive because I think the user expects to be able to do all operations on blocks up until the pruneheight but that is not the case if undo data is missing. I personally ran into this once before and now again when testing for assumeutxo when I had used getblockfrompeer. The following commit adds test coverage for this change of behavior.

This is a partial backport of [[bitcoin/bitcoin#29668 | core#29668]]  and [[bitcoin/bitcoin#30429 | core#30429]]
bitcoin/bitcoin@4a19750
bitcoin/bitcoin@8789dc8
bitcoin/bitcoin@fa62707

Depends on D17924

Test Plan: `ninja all check-extended`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Differential Revision: https://reviews.bitcoinabc.org/D17925
fjahr added a commit to fjahr/bitcoin that referenced this pull request May 31, 2025
fjahr added a commit to fjahr/bitcoin that referenced this pull request Jun 4, 2025
fjahr added a commit to fjahr/bitcoin that referenced this pull request Jun 15, 2025
fjahr added a commit to fjahr/bitcoin that referenced this pull request Jun 17, 2025
fjahr added a commit to fjahr/bitcoin that referenced this pull request Jun 18, 2025
@bitcoin bitcoin locked and limited conversation to collaborators Jul 12, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants