-
Notifications
You must be signed in to change notification settings - Fork 37.7k
kernel: Rm ShutdownRequested and AbortNode from validation code. #27861
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK d60ab3b. Overall looks great! Feel free to ignore most of comments I left. The only thing I think really should be addressed is the two BlockManager::Flush
calls aborting without returning errors. I think this is confusing and potentially could lead to bugs, but since the changes here don't actually affect the way the code run in practice, it doesn't need to block the PR.
src/node/blockstorage.cpp
Outdated
@@ -529,7 +529,7 @@ void BlockManager::FlushUndoFile(int block_file, bool finalize) | |||
{ | |||
FlatFilePos undo_pos_old(block_file, m_blockfile_info[block_file].nUndoSize); | |||
if (!UndoFileSeq().Flush(undo_pos_old, finalize)) { | |||
AbortNode("Flushing undo file to disk failed. This is likely the result of an I/O error."); | |||
m_opts.notifications.fatalError("Flushing undo file to disk failed. This is likely the result of an I/O error."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "kernel: Add fatalError method to notifications" (d60ab3b)
This is calling fatalError
and then continuing without returning an error or raising an exception, so I think the new code here is more confusing than the current code (even if strictly speaking it is a refactoring).
We could potentially change this code to treat failing flush as a normal error that gets returned to the caller. But I think if we are going to do that, it would be better to do in a separate PR which could receive targeted review without needing reviewers to look at all the other changes here.
For this PR, I would suggest doing one of two things:
-
Adding a
kernel::Notifications::flushError
method and callingflushError
here instead of afatalError
. The current code is making thisAbortNode
call in a different way than otherAbortNode
calls, and it would be confusing for the kernel notification interface to hide differences between errors that are not handled the same. AflushError
method could have a comment like "The flush error notification is sent when there is an I/O error saving state in memory to disk. It is recommended that applications treat this a as a fatal error, even though it is technically less severe, and the kernel may do some work after it happens." -
Reverting the change to this line and keeping the
AbortNode
function for now. This would get most of the work in the PR done, just leaving behind leaving 2 AbortNode calls that can be cleaned up with more care later.
src/node/blockstorage.cpp
Outdated
AbortNode("Flushing block file to disk failed. This is likely the result of an I/O error."); | ||
m_opts.notifications.fatalError("Flushing block file to disk failed. This is likely the result of an I/O error."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "kernel: Add fatalError method to notifications" (d60ab3b)
This is the other misleading fatalError
call (see comment above)
I was about to open a separate PR from this branch here: https://github.com/TheCharlatan/bitcoin/tree/kernelReturnOnAbort . I think this should go in a separate PR for ease of review. I'll also provide a list of all the other non-immediate returns that should be fine to do. |
Great! I also opened #27862 earlier which overlaps with the branch a little. I think between the branch and PR all the cases where errors are not currently returned are covered. |
d60ab3b
to
39d9b51
Compare
Rebased d60ab3b -> 3e60c02 (kernelInterrupt_0 -> kernelInterrupt_1, compare).
Updated 3e60c02 -> 39d9b51 (kernelInterrupt_1 -> kernelInterrupt_2, compare).
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 39d9b51. I left a suggestion to reduce size of changes to index code, and I still think it is inappropriate to treat flush errors as fatal errors, and would prefer to add a separate flush error method (#27861 (comment)), but I won't get hung up on this.
re: #27861 (comment)
I'm actually not sure what this is referring to. Also, after this PR |
ACK, I'll revert to using |
Ugh, just noticed this gets annoying with the exit status - we probably don't want to introduce that in the blockstorage scope. Leaving as is then :/. |
39d9b51
to
315915b
Compare
Updated 39d9b51 -> 315915b (kernelInterrupt_2 -> kernelInterrupt_3, compare).
|
This change is mostly a refectoring that removes some code and gets rid of an unnecessary layer of indirection after bitcoin#27861 But it is not a pure refactoring since StartShutdown, AbortShutdown, and WaitForShutdown functions used to abort on failure, and the replacement code logs or returns errors instead.
Replace exceptions thrown by signal and wait methods with [[nodiscard]] return values. This is mostly a refactoring, but there is a slight change of behavior if AbortShutdown function fails. The original behavior which was unintentionally changed in bitcoin#27861 is restored, so it now triggers an assert failure again instead of throwing an exception. (The AbortShutdown function is only ever called in the the GUI version of Bitcoin Core when corruption is detected on loading and the user tries to reindex.) Problems with using exceptions were pointed out by MarcoFalke in bitcoin#27861 (comment).
This change is mostly a refectoring that removes some code and gets rid of an unnecessary layer of indirection after bitcoin#27861 But it is not a pure refactoring since StartShutdown, AbortShutdown, and WaitForShutdown functions used to abort on failure, and the replacement code logs or returns errors instead.
Replace exceptions thrown by signal and wait methods with [[nodiscard]] return values. This is mostly a refactoring, but there is a slight change of behavior if AbortShutdown function fails. The original behavior which was unintentionally changed in bitcoin#27861 is restored, so it now triggers an assert failure again instead of throwing an exception. (The AbortShutdown function is only ever called in the the GUI version of Bitcoin Core when corruption is detected on loading and the user tries to reindex.) Problems with using exceptions were pointed out by MarcoFalke in bitcoin#27861 (comment).
This change is mostly a refectoring that removes some code and gets rid of an unnecessary layer of indirection after bitcoin#27861 But it is not a pure refactoring since StartShutdown, AbortShutdown, and WaitForShutdown functions used to abort on failure, and the replacement code logs or returns errors instead.
Replace exceptions thrown by signal and wait methods with [[nodiscard]] return values. This is mostly a refactoring, but there is a slight change of behavior if AbortShutdown function fails. The original behavior which was unintentionally changed in bitcoin#27861 is restored, so it now triggers an assert failure again instead of throwing an exception. (The AbortShutdown function is only ever called in the the GUI version of Bitcoin Core when corruption is detected on loading and the user tries to reindex.) Problems with using exceptions were pointed out by MarcoFalke in bitcoin#27861 (comment).
This change is mostly a refectoring that removes some code and gets rid of an unnecessary layer of indirection after bitcoin#27861 But it is not a pure refactoring since StartShutdown, AbortShutdown, and WaitForShutdown functions used to abort on failure, and the replacement code logs or returns errors instead.
Replace exceptions thrown by signal and wait methods with [[nodiscard]] return values. This is mostly a refactoring, but there is a slight change of behavior if AbortShutdown function fails. The original behavior which was unintentionally changed in bitcoin#27861 is restored, so it now triggers an assert failure again instead of throwing an exception. (The AbortShutdown function is only ever called in the the GUI version of Bitcoin Core when corruption is detected on loading and the user tries to reindex.) Problems with using exceptions were pointed out by MarcoFalke in bitcoin#27861 (comment).
This change is mostly a refectoring that removes some code and gets rid of an unnecessary layer of indirection after bitcoin#27861 But it is not a pure refactoring since StartShutdown, AbortShutdown, and WaitForShutdown functions used to abort on failure, and the replacement code logs or returns errors instead.
Replace exceptions thrown by signal and wait methods with [[nodiscard]] return values. This is mostly a refactoring, but there is a slight change of behavior if AbortShutdown function fails. The original behavior which was unintentionally changed in bitcoin#27861 is restored, so it now triggers an assert failure again instead of throwing an exception. (The AbortShutdown function is only ever called in the the GUI version of Bitcoin Core when corruption is detected on loading and the user tries to reindex.) Problems with using exceptions were pointed out by MarcoFalke in bitcoin#27861 (comment).
This change is mostly a refectoring that removes some code and gets rid of an unnecessary layer of indirection after bitcoin#27861 But it is not a pure refactoring since StartShutdown, AbortShutdown, and WaitForShutdown functions used to abort on failure, and the replacement code logs or returns errors instead.
Replace exceptions thrown by signal and wait methods with [[nodiscard]] return values. This is mostly a refactoring, but there is a slight change of behavior if AbortShutdown function fails. The original behavior which was unintentionally changed in bitcoin#27861 is restored, so it now triggers an assert failure again instead of throwing an exception. (The AbortShutdown function is only ever called in the the GUI version of Bitcoin Core when corruption is detected on loading and the user tries to reindex.) Problems with using exceptions were pointed out by MarcoFalke in bitcoin#27861 (comment).
This change is mostly a refectoring that removes some code and gets rid of an unnecessary layer of indirection after bitcoin#27861 But it is not a pure refactoring since StartShutdown, AbortShutdown, and WaitForShutdown functions used to abort on failure, and the replacement code logs or returns errors instead.
Replace exceptions thrown by signal and wait methods with [[nodiscard]] return values. This is mostly a refactoring, but there is a slight change of behavior if AbortShutdown function fails. The original behavior which was unintentionally changed in bitcoin#27861 is restored, so it now triggers an assert failure again instead of throwing an exception. (The AbortShutdown function is only ever called in the the GUI version of Bitcoin Core when corruption is detected on loading and the user tries to reindex.) Problems with using exceptions were pointed out by MarcoFalke in bitcoin#27861 (comment).
This change is mostly a refectoring that removes some code and gets rid of an unnecessary layer of indirection after bitcoin#27861 But it is not a pure refactoring since StartShutdown, AbortShutdown, and WaitForShutdown functions used to abort on failure, and the replacement code logs or returns errors instead.
Replace exceptions thrown by signal and wait methods with [[nodiscard]] return values. This is mostly a refactoring, but there is a slight change of behavior if AbortShutdown function fails. The original behavior which was unintentionally changed in bitcoin#27861 is restored, so it now triggers an assert failure again instead of throwing an exception. (The AbortShutdown function is only ever called in the the GUI version of Bitcoin Core when corruption is detected on loading and the user tries to reindex.) Problems with using exceptions were pointed out by MarcoFalke in bitcoin#27861 (comment).
This change is mostly a refectoring that removes some code and gets rid of an unnecessary layer of indirection after bitcoin#27861 But it is not a pure refactoring since StartShutdown, AbortShutdown, and WaitForShutdown functions used to abort on failure, and the replacement code logs or returns errors instead.
Replace exceptions thrown by signal and wait methods with [[nodiscard]] return values. This is mostly a refactoring, but there is a slight change of behavior if AbortShutdown function fails. The original behavior which was unintentionally changed in bitcoin#27861 is restored, so it now triggers an assert failure again instead of throwing an exception. (The AbortShutdown function is only ever called in the the GUI version of Bitcoin Core when corruption is detected on loading and the user tries to reindex.) Problems with using exceptions were pointed out by MarcoFalke in bitcoin#27861 (comment).
This change is mostly a refectoring that removes some code and gets rid of an unnecessary layer of indirection after bitcoin#27861 But it is not a pure refactoring since StartShutdown, AbortShutdown, and WaitForShutdown functions used to abort on failure, and the replacement code logs or returns errors instead.
Replace exceptions thrown by signal and wait methods with [[nodiscard]] return values. This is mostly a refactoring, but there is a slight change of behavior if AbortShutdown function fails. The original behavior which was unintentionally changed in bitcoin#27861 is restored, so it now triggers an assert failure again instead of throwing an exception. (The AbortShutdown function is only ever called in the the GUI version of Bitcoin Core when corruption is detected on loading and the user tries to reindex.) Problems with using exceptions were pointed out by MarcoFalke in bitcoin#27861 (comment).
This change is mostly a refectoring that removes some code and gets rid of an unnecessary layer of indirection after bitcoin#27861 But it is not a pure refactoring since StartShutdown, AbortShutdown, and WaitForShutdown functions used to abort on failure, and the replacement code logs or returns errors instead.
Replace exceptions thrown by signal and wait methods with [[nodiscard]] return values. This is mostly a refactoring, but there is a slight change of behavior if AbortShutdown function fails. The original behavior which was unintentionally changed in bitcoin#27861 is restored, so it now triggers an assert failure again instead of throwing an exception. (The AbortShutdown function is only ever called in the the GUI version of Bitcoin Core when corruption is detected on loading and the user tries to reindex.) Problems with using exceptions were pointed out by MarcoFalke in bitcoin#27861 (comment).
This change is mostly a refectoring that removes some code and gets rid of an unnecessary layer of indirection after bitcoin#27861 But it is not a pure refactoring since StartShutdown, AbortShutdown, and WaitForShutdown functions used to abort on failure, and the replacement code logs or returns errors instead.
This change is mostly a refectoring that removes some code and gets rid of an unnecessary layer of indirection after bitcoin#27861 But it is not a pure refactoring since StartShutdown, AbortShutdown, and WaitForShutdown functions used to abort on failure, and the replacement code logs or returns errors instead.
Replace exceptions thrown by signal and wait methods with [[nodiscard]] return values. This is mostly a refactoring, but there is a slight change of behavior if AbortShutdown function fails. The original behavior which was unintentionally changed in bitcoin#27861 is restored, so it now triggers an assert failure again instead of throwing an exception. (The AbortShutdown function is only ever called in the the GUI version of Bitcoin Core when corruption is detected on loading and the user tries to reindex.) Problems with using exceptions were pointed out by MarcoFalke in bitcoin#27861 (comment).
This change is mostly a refectoring that removes some code and gets rid of an unnecessary layer of indirection after bitcoin#27861 But it is not a pure refactoring since StartShutdown, AbortShutdown, and WaitForShutdown functions used to abort on failure, and the replacement code logs or returns errors instead.
…SignalInterrupt directly 6db04be Get rid of shutdown.cpp/shutdown.h, use SignalInterrupt directly (Ryan Ofsky) 213542b refactor: Add InitContext function to initialize NodeContext with global pointers (Ryan Ofsky) feeb7b8 refactor: Remove calls to StartShutdown from KernelNotifications (Ryan Ofsky) 6824eec refactor: Remove call to StartShutdown from stop RPC (Ryan Ofsky) 1d92d89 util: Get rid of uncaught exceptions thrown by SignalInterrupt class (Ryan Ofsky) ba93966 refactor: Remove call to ShutdownRequested from IndexWaitSynced (Ryan Ofsky) 42e5829 refactor: Remove call to ShutdownRequested from HTTPRequest (Ryan Ofsky) 73133c3 refactor: Add NodeContext::shutdown member (Ryan Ofsky) f4a8bd6 refactor: Remove call to StartShutdown from qt (Ryan Ofsky) f0c73c1 refactor: Remove call to ShutdownRequested from rpc/mining (Ryan Ofsky) 263b23f refactor: Remove call to ShutdownRequested from chainstate init (Ryan Ofsky) Pull request description: This change drops `shutdown.h` and `shutdown.cpp` files, replacing them with a `NodeContext::shutdown` member which is used to trigger shutdowns directly. This gets rid of an unnecessary layer of indirection, and allows getting rid of the `kernel::g_context` global. Additionally, this PR tries to improve error handling of `SignalInterrupt` code by marking relevant methods `[[nodiscard]]` to avoid the possibility of uncaught exceptions mentioned bitcoin/bitcoin#27861 (comment). Behavior is changing In a few cases which are noted in individual commit messages. Particularly: GUI code more consistently interrupts RPCs when it is shutting down, shutdown state no longer persists between unit tests, the stop RPC now returns an RPC error if requesting shutdown fails instead of aborting, and other failed shutdown calls now log errors instead of aborting. This PR is a net reduction in lines of code, but in some cases the explicit error handling and lack of global shutdown functions do make it more verbose. The verbosity can be seen as good thing if it discourages more code from directly triggering shutdowns, and instead encourages code to return errors or send notifications that could be translated into shutdowns. Probably a number of existing shutdown calls could just be replaced by better error handling. ACKs for top commit: achow101: ACK 6db04be TheCharlatan: Re-ACK 6db04be maflcko: ACK 6db04be 👗 stickies-v: re-ACK 6db04be Tree-SHA512: 7a34cb69085f37e813c43bdaded1a0cbf6c53bd95fdde96f0cb45346127fc934604c43bccd3328231ca2f1faf712a7418d047ceabd22ef2dca3c32ebb659e634
Replace exceptions thrown by signal and wait methods with [[nodiscard]] return values. This is mostly a refactoring, but there is a slight change of behavior if AbortShutdown function fails. The original behavior which was unintentionally changed in #27861 is restored, so it now triggers an assert failure again instead of throwing an exception. (The AbortShutdown function is only ever called in the the GUI version of Bitcoin Core when corruption is detected on loading and the user tries to reindex.) Problems with using exceptions were pointed out by MarcoFalke in bitcoin/bitcoin#27861 (comment).
Summary: > validation: Stricter assumeutxo error handling in LoadChainstate > > Make LoadChainstate return an explicit error when snapshot validation succeeds, > but there is an error trying to replace the background chainstate with the > snapshot chainstate. Previously in this case LoadChainstate would trigger a > shutdown and return INTERRUPTED, now it will return an actual error code. > > There's no real change to behavior other than error message being formatted a > little differently. > > Motivation for this change is to replace error handling via callbacks with > error handling via return value ahead of > bitcoin/bitcoin#27861 bitcoin/bitcoin@1c7d08b > validation: Stricter assumeutxo error handling in InvalidateCoinsDBOnDisk > > Currently InvalidateCoinsDBOnDisk is calling AbortNode without an error to the > caller if it fails. Change it to return just return util::Result, and update > the caller to handle the error itself. > > This causes the secondary error to be shown below the main error instead of the > other way around. bitcoin/bitcoin@1c7d08b This is a backport of [[bitcoin/bitcoin#27862 | core#27862]] Test Plan: The only change in behavior is the order of the error messages in logs, but there are currently no tests for these error messages, and there is currently no easy way to test this. Just check that the code still compiles and does not break existing tested behavior. `ninja all check-all` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Subscribers: Fabien Differential Revision: https://reviews.bitcoinabc.org/D15624
Get rid of all
ShutdownRequested
calls in validation code by introducing an interrupt object that applications can use to cancel long-running kernel operations.Replace all
AbortNode
calls in validation code with new fatal error and flush error notifications so kernel applications can be notified about failures and choose how to handle them.This pull request is part of the
libbitcoinkernel
project #27587 The libbitcoinkernel Project and more specifically its "Step 2: Decouple most non-consensus code from libbitcoinkernel".The pull request mostly allows dropping the kernel dependency on shutdown.cpp. The only dependency left after this is a
StartShutdown
call which will be removed in followup PR #27711. This PR also drops the last reference to theuiInterface
global in kernel code. The process of moving theuiInterface
out of the kernel was started in #27636.This pull request contains a subset of patches originally proposed in #27711. It will be part of a series of changes required to make handling of interrupts (or in other words the current shutdown procedure) in the kernel library more transparent and less reliable on global mutable state. The set of patches contained here was originally proposed by @ryanofsky here.