Skip to content

Conversation

ryanofsky
Copy link
Contributor

@ryanofsky ryanofsky commented Jul 7, 2023

This change drops the last kernel dependency on shutdown.cpp. It also adds new hooks for libbitcoinkernel applications to be able to interrupt kernel operations when the chain tip changes.

This change is mostly a refactoring, but does slightly change -stopatheight behavior (see release note and commit message)

(EDIT: This change is a refactoring and doesn't change behavior. The code change appeared to affect -stopatheight behavior, but doesn't really).

@DrahtBot
Copy link
Contributor

DrahtBot commented Jul 7, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK TheCharlatan, furszy, hebasto, MarcoFalke
Concept ACK theuni, stickies-v, mzumsande

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #28051 (Get rid of shutdown.cpp/shutdown.h, use SignalInterrupt directly by ryanofsky)
  • #26762 (bugfix: Make CCheckQueue RAII-styled by hebasto)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@ryanofsky
Copy link
Contributor Author

Updated 828af9a -> c2bcc33 (pr/stopafter.1 -> pr/stopafter.2, compare) just updating comments and adding release note about -stopatheight behavior

@theuni
Copy link
Member

theuni commented Jul 7, 2023

Very nice. Concept ACK and quick (very shallow) codereview ACK.

I think the -stopatheight change is reasonable. If it turns out anyone/anything was relying on the way this worked before, we could always add functionality for that use-case (maybe -stopatblock?)

However, the help string should be updated from:

Stop running after reaching the given height in the main chain

It sounds like it's not completely accurate now, and less so after this change :)

@TheCharlatan
Copy link
Contributor

TheCharlatan commented Jul 7, 2023

Strong Concept ACK :)

I'm glad you changed your mind on returning things from the notifications, putting the infrastructure in place to allow the code to now bubble a -stopatheight interrupt is a clear advantage to me.

I was about to open a similar PR, albeit with a different approach for handling -stopafterblockimport (patch). Instead of creating a new notification for it, I thought slightly refactoring the function to instead return when the block import is done and then allow the user of the kernel (in this case the init code) to decide what to do is easier to reason with. This has the added benefit that users of the kernel can now just call BlockImport without having to rely on options and notification handlers to skip loading the mempool. Can you comment on this approach?

Copy link
Contributor Author

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated c2bcc33 -> 09938a4 (pr/stopafter.2 -> pr/stopafter.3, compare) updating -stopatheight documentation to be a little more precise.

I was also going to add a new commit 03dfa85 dropping shutdown.h and shutdown.cpp entirely, but it was bigger than I thought it would be so will save it for another PR. [EDIT: This is now done in #28051]

re: #28048 (comment)

different approach for handling -stopafterblockimport (patch). Instead of creating a new notification for it, I thought slightly refactoring the function to instead return when the block import is done and then allow the user of the kernel (in this case the init code) to decide what to do is easier to reason with. This has the added benefit that users of the kernel can now just call BlockImport without having to rely on options and notification handlers to skip loading the mempool. Can you comment on this approach?

I think the approach looks good, and thought of doing something similar, but I didn't because I thought it would make this change bigger and more complicated. I think you should post that patch as a separate PR, because @furszy is actually making similar changes in #27607 and should be able to give good review and feedback. Having those changes done separately would let me simplify this PR and probably make for a better kernel API as you say.

@hebasto
Copy link
Member

hebasto commented Jul 7, 2023

Concept ACK.

Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah.
The first commit on #27607 extracts the LoadMempool call out of the ImportBlocks method, second and third one are about renaming the function (and some other fields) and documenting it. Then the rest are improving the indexes initialization process so we can, at the end, erase the indexes global flag.

But the -stopafterblockimport changes are new and conceptually look great in a first glance @TheCharlatan. Happy to review it if you push it into a standalone PR.

Maybe the simplest path to not conflict much could be to provide ThreadImport with a "return_after_import" bool arg (true when -stopafterblockimport is set), so the function returns early after importing blocks, so you don't need to make the mempool and indexes flag changes.

@TheCharlatan
Copy link
Contributor

Re #28048 (review)

Maybe the simplest path to not conflict much could be to provide ThreadImport with a "return_after_import" bool arg (true when -stopafterblockimport is set)

Thanks @furszy, opened #28053.

Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

Comment on lines 3375 to 3377
// Ignoring return value for now, this could be changed to bubble up
// kernel::Interrupted value to the caller so the caller could
// distinguish between completed and interrupted operations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see in which scenario blockTip() would return kernel::Interrupted here, but the docs seem to indicate that it could. I'd suggest to either update the docs to reflect that we never expect this to interrupt, or add a logging statement similar to what's done in ThreadImport?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #28048 (comment)

I don't see in which scenario blockTip() would return kernel::Interrupted here, but the docs seem to indicate that it could. I'd suggest to either update the docs to reflect that we never expect this to interrupt, or add a logging statement similar to what's done in ThreadImport?

I'm happy to update the comment, but it would help to have a specific suggestion, because to me this is only saying that a kernel::Interrupted value might be returned here (as a fact of the [[nodiscard]] virtual InterruptResult blockTip return type). I don't think there's a suggestion here that bitcoind will return kernel::Interrupted{} . But other libbitcoinkernel applications or tests might could handle the blockTip notification in the future and return an interrupted value, and that case could be handled here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I was confusing the kernel/node boundaries a bit. You're right that here in the kernel logic that is validation.cpp, the documentation and behaviour are what we'd expect.

My concern was around the node implementation of blockTip() now potentially calling StartShutdown() and returning a kernel:Interrupted without leaving any trace. It's not something we'd expect to happen in InvalidateBlock() (and it would be behaviour change). I think I was wrong in suggesting we catch/log/document that here in validation.cpp though, as it's node specific behaviour.

Perhaps it would be beneficial to move this logging to KernelNotifications::blocksImported() and add a similar logging statement in KernelNotifications::blockTip?

...
Although having typed all of this out... Is InvalidateBlock() even meant to be/stay in kernel?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #28048 (comment)

I'm not too concerned about blockTip notification shutting down during an invalidateblock call, because it seems like a corner case to worry about someone using -stopafterheight and invalidblock at the same time. If someone is doing this, I think there is only a small change of behavior not worth documenting where the -stopatheight option might take effect a little earlier and the node would shut down a little sooner.

If distinguishing invalidateblock tip changes from other tip changes is a concern for other reaons in the future, it could be addressed by adding an enum parameter to the blockTip function that indicates the source or the tip change.

The point of the "this could be changed to bubble up kernel::Interrupted" comment is not to worry about the stopatheight+invalidateblock corner case, but just suggest in general it should probably become a pattern to do something like if (IsInterrupted(result)) return std::get<Interrrupted>(result) and bubble up interrupted values to callers.

On logging, I think it is usually better to log from kernel code rather than from node hooks, because if the log statements are actually useful for our code there is a good chance they will be useful for other code as well.

On InvalidateBlock staying in the kernel, declaring blocks invalid seems like a useful feature to offer to kernel applications, so I don't think there would be much benefit to removing it.

Copy link
Contributor Author

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated 09938a4 -> 3e6b687 (pr/stopafter.3 -> pr/stopafter.4, compare) just fixing a spelling mistake that was pointed out

Comment on lines 3375 to 3377
// Ignoring return value for now, this could be changed to bubble up
// kernel::Interrupted value to the caller so the caller could
// distinguish between completed and interrupted operations.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #28048 (comment)

I don't see in which scenario blockTip() would return kernel::Interrupted here, but the docs seem to indicate that it could. I'd suggest to either update the docs to reflect that we never expect this to interrupt, or add a logging statement similar to what's done in ThreadImport?

I'm happy to update the comment, but it would help to have a specific suggestion, because to me this is only saying that a kernel::Interrupted value might be returned here (as a fact of the [[nodiscard]] virtual InterruptResult blockTip return type). I don't think there's a suggestion here that bitcoind will return kernel::Interrupted{} . But other libbitcoinkernel applications or tests might could handle the blockTip notification in the future and return an interrupted value, and that case could be handled here.

ryanofsky added a commit to bitcoin-core/gui that referenced this pull request Jul 11, 2023
…on out of blockstorage

462390c refactor: Move stopafterblockimport handling out of blockstorage (TheCharlatan)

Pull request description:

  This has the benefit of moving this StartShutdown call out of the blockstorage file and thus out of the kernel's responsibility. The user can now decide if he wants to start shutdown / interrupt after a block import or not.

  This also simplifies bitcoin/bitcoin#28048, making it one fewer shutdown call to handle.

ACKs for top commit:
  MarcoFalke:
    lgtm ACK 462390c  🗝
  ryanofsky:
    Code review ACK 462390c. Just has been rebased and is a simpler change after #27607

Tree-SHA512: 84e58256b1c61f10e7ec5ecf32916f40a2ab1ea7cce703de0fa1c61ee2be94bd45ed32718bc99903b6eff3e6d3d5b506470bf567ddbb444a58232913918e8ab8
This change drops the last kernel dependency on shutdown.cpp. It also adds new
hooks for libbitcoinkernel applications to be able to interrupt kernel
operations when the chain tip changes.

This is a refactoring that does not affect behavior. (Looking at the code it
can appear like the new break statement in the ActivateBestChain function is a
change in behavior, but actually the previous StartShutdown call was indirectly
triggering a break before, because it was causing m_chainman.m_interrupt to be
true. The new code just makes the break more obvious.)
Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

@@ -0,0 +1 @@
- The `-stopatheight` option will now stop exactly at the specified height, rather than at or above the specified height. Since the node shuts down as soon as it find a valid chain at the specified height, it is possible for the resulting chain to no longer be the most-work chain, if some of the blocks above the specified height are invalid.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to note that this proposal has some history, see Issue #13477 and the two closed PRs: #13490 and #13713.

In particular, comment #13490 (comment) still seems relevant:
The expectation that -stopatheight stops exactly at the specified height relies on ActivateBestChainStep() only connecting a single block in each invocation, which is currently the case, but more
of a coincidence due to not wanting to lock cs_main for too long than an actual guarantee given by the function.

Maybe it would be good to explicitly document in ActivateBestChainStep that -stopatheight relies on it conecting just a single block (outside of a reorg situation)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't actually want to make any new guarantees here. I'm just trying to make a simplification to the ActivateBestChain function and document the resulting behavior in release notes. Maybe the following release note would be better?

  • The -stopatheight option now stops earlier and no longer tries to validate and connect all the downloaded blocks above the specified height on the most-work chain. This means the resulting chain is more likely to be exactly the specified height instead of longer. But it also means the resulting chain might no longer the most-work chain if blocks above the specified height appeared to contain more work but were actually invalid.

Would appreciate any suggestions to make the description better.

Copy link
Contributor

@mzumsande mzumsande Jul 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this deeper and did some test runs on signet, and now I'm no longer sure that this is a behavioral change requiring a release note in the first place:

My understanding is that the IBD behavior on master is:

  • ActivateBestChainStep()  will connect only one block if we made progress
  • The inner do / while loop of ActivateBestChain also will only run once unless the chain has less work than intitially (only relevant in a reorg).
  • so during IBD we'll connect one block, release cs_main, call StartShutdown() if the stopheight is reached, break out of the outer do / while loop because m_chainman.m_interrupt is set.
  • There is a race between Shutdown() executed in the init thread, and msghand, so it's possible that msghand connects another block or two (by executing new calls to ABC!) before it's being stopped by Shutdown(). But the same should be possible after this PR?

Also, the explanation of previous behavior in the commit msg ("It would not stop connecting blocks after the specified  height if they were already downloaded.") does not seem to be accurate because of the existing break.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #28048 (comment)

🤦‍♂️ Thanks! I didn't notice the StartShutdown call was followed by a if (m_chainman.m_interrupt) break line. So it was already exiting the loop after the height check, and adding a new break would not change anything.

I think you are also right that there are race conditions where extra blocks could be attached, but I haven't looked into the details. Maybe it is worth opening an issue with your observations, in case there is a quick fix, or more questions to discuss. Naively, I'd think ProcessNewBlock should just do if (m_interrupt) return false; and the types of races you mention would be avoided.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naively, I'd think ProcessNewBlock should just do if (m_interrupt) return false; and the types of races you mention would be avoided.

Hmm, if it arrived to ProcessNewBlock is because the node requested the block. So I would try to not discard it, and instead make it pass through AcceptBlock so it can be stored. Then quit early before ABC.

Copy link
Contributor Author

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased 3e6b687 -> cbea21d (pr/stopafter.4 -> pr/stopafter.5, compare) fixing conflict with #28053 and also removing mentions about changed -stopafterheight behavior after Martin's comment #28048 (comment)
Updated cbea21d -> 31eca93 (pr/stopafter.5 -> pr/stopafter.6, compare) adding comment about distinguishing types of blockTip to address stickies comment #28048 (comment)

@@ -0,0 +1 @@
- The `-stopatheight` option will now stop exactly at the specified height, rather than at or above the specified height. Since the node shuts down as soon as it find a valid chain at the specified height, it is possible for the resulting chain to no longer be the most-work chain, if some of the blocks above the specified height are invalid.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #28048 (comment)

🤦‍♂️ Thanks! I didn't notice the StartShutdown call was followed by a if (m_chainman.m_interrupt) break line. So it was already exiting the loop after the height check, and adding a new break would not change anything.

I think you are also right that there are race conditions where extra blocks could be attached, but I haven't looked into the details. Maybe it is worth opening an issue with your observations, in case there is a quick fix, or more questions to discuss. Naively, I'd think ProcessNewBlock should just do if (m_interrupt) return false; and the types of races you mention would be avoided.

Comment on lines 3375 to 3377
// Ignoring return value for now, this could be changed to bubble up
// kernel::Interrupted value to the caller so the caller could
// distinguish between completed and interrupted operations.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #28048 (comment)

I'm not too concerned about blockTip notification shutting down during an invalidateblock call, because it seems like a corner case to worry about someone using -stopafterheight and invalidblock at the same time. If someone is doing this, I think there is only a small change of behavior not worth documenting where the -stopatheight option might take effect a little earlier and the node would shut down a little sooner.

If distinguishing invalidateblock tip changes from other tip changes is a concern for other reaons in the future, it could be addressed by adding an enum parameter to the blockTip function that indicates the source or the tip change.

The point of the "this could be changed to bubble up kernel::Interrupted" comment is not to worry about the stopatheight+invalidateblock corner case, but just suggest in general it should probably become a pattern to do something like if (IsInterrupted(result)) return std::get<Interrrupted>(result) and bubble up interrupted values to callers.

On logging, I think it is usually better to log from kernel code rather than from node hooks, because if the log statements are actually useful for our code there is a good chance they will be useful for other code as well.

On InvalidateBlock staying in the kernel, declaring blocks invalid seems like a useful feature to offer to kernel applications, so I don't think there would be much benefit to removing it.

Copy link
Contributor

@TheCharlatan TheCharlatan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very happy with this change. Nice to see these last few PRs pushing configuration options out of the kernel and letting the user decide how to react to kernel actions.

ACK 31eca93

Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept and light review ACK 31eca93

@@ -0,0 +1 @@
- The `-stopatheight` option will now stop exactly at the specified height, rather than at or above the specified height. Since the node shuts down as soon as it find a valid chain at the specified height, it is possible for the resulting chain to no longer be the most-work chain, if some of the blocks above the specified height are invalid.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naively, I'd think ProcessNewBlock should just do if (m_interrupt) return false; and the types of races you mention would be avoided.

Hmm, if it arrived to ProcessNewBlock is because the node requested the block. So I would try to not discard it, and instead make it pass through AcceptBlock so it can be stored. Then quit early before ABC.

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 31eca93, I have reviewed the code and it looks OK.

@@ -57,9 +58,14 @@ static void DoWarning(const bilingual_str& warning)

namespace node {

void KernelNotifications::blockTip(SynchronizationState state, CBlockIndex& index)
kernel::InterruptResult KernelNotifications::blockTip(SynchronizationState state, CBlockIndex& index)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires to #include "kernel/notifications_interface.h", no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires to #include "kernel/notifications_interface.h", no?

It is included from the header, and is not a new dependency since this file is subclassing that one, but yes according to IWYU's reasoning it should be included directly here as well. I'll add the include if I need to update this PR for another reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say no. There should be never a need to include kernel/notifications_interface.h when you've included node/kernel_notifications.h. And I don't see an outcome where node/kernel_notifications.h was ever changed to not include kernel/notifications_interface.h. Thus, iwyu pragma export should be used.

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm ACK 31eca93 🕷

Show signature

Signature:

untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
trusted comment: lgtm ACK 31eca93a9eb8e54f856d3f558aa3c831ef181d37 🕷
Hkw+kA0Lvq2l5r2bA8OJD4QEslihedKBKDZiOpVUMUTGt5An9XfkcjccSf/I7WbXmLoxbsq78Feb4fRhSsC+AA==

@achow101 achow101 merged commit 01e5d6b into bitcoin:master Jul 14, 2023
Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 31eca93

The PR description is outdated - its last sentence can be removed.

@ryanofsky
Copy link
Contributor Author

The PR description is outdated - its last sentence can be removed.

Thanks, struck it out since PR was merged and the old description did get added to git history.

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jul 15, 2023
@bitcoin bitcoin locked and limited conversation to collaborators Jul 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
Status: Done or Closed or Rethinking
Development

Successfully merging this pull request may close these issues.

10 participants