-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Bugfix: Ensure Atomicity in Wallet Settings Updates from Chain Interface #30697
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix: Ensure Atomicity in Wallet Settings Updates from Chain Interface #30697
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
cs
Before Adding/Removing Settings
(Assigned 28.0, but this is not a regression, so not a blocker for rc1. I think it can be backported to rc2, just in case it needs more time.) |
In draft now before addressing TSAN failure https://github.com/bitcoin/bitcoin/pull/30697/checks?check_run_id=29119417676 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach ACK
Thanks @stickies-v for review, I've accepted all your suggestion and updated the branch. |
🚧 At least one of the CI tasks failed. HintsMake sure to run all tests locally, according to the documentation. The failure may happen due to a number of reasons, for example:
Leave a comment here, if you need help tracking down a confusing failure. |
135eb1d
to
2cba0be
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 2cba0be, left non-blocking nits but happy to quickly re-ACK.
Would suggest updating the commit message to describe the change a bit better, add bugfix
to the title, and remove the double []
and :
notation:
wallet: bugfix: lock wallet context before adding/removing wallet settings
When performing certain wallet operations simultaneously, a race condition can be
triggered. For example, simultaneously loading two wallets can lead to only one
wallet being added to the startup settings. Fix this by adding locks, and add test
coverage.
2cba0be
to
245dfa2
Compare
Done, thank you. |
re-ACK 245dfa2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think using the context's wallet mutex is the best approach for this issue. The problem lies in how we update settings, which is not atomic. Today, the race is in the wallets flow, but tomorrow it could easily affect another settings option.
Updates are executed in three separate steps:
- Obtain settings value while acquiring the settings lock.
- Modify settings value.
- Overwrite settings value while acquiring the settings lock.
This process is not thread-safe. Different threads could modify the same base value simultaneously, overwriting data from other workers without realizing it.
Given this context, I've implemented a different solution that makes the underlying settings update operation atomic: furszy@de05ba1. This commit also shortens the test code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Today, the race is in the wallets flow, but tomorrow it could easily affect another settings option.
I agree with this point. All clients that update settings will be victim to the same bug.
Given this context, I've implemented a different solution that makes the underlying settings update operation atomic: furszy@de05ba1. This commit also shortens the test code.
This seems like a great idea and an improvement to this PR. Clients will no longer need to lock a mutex before updating settings. I will take a look at furszy@de05ba1.
Thank you for your feedback @furszy
Take this one: furszy@b017949. |
245dfa2
to
528d4e6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This force-push diff is a complete change in approach from the previous one, which was simpler and more straightforward 245dfa2 (i.e., locking the mutex before starting the settings update sequence). The commit was reviewed and acked by @stickies-v thanks
The new approach was written by @furszy which modifies the chain interface’s updateRwSetting
method to receive an update function that is executed while the settings lock is held. The update function will returns an enum indicating whether the update was successful and if we should overwrite the settings.
I also think this approach is better and more sustainable, allowing other clients to avoid implementing their own thread control mechanisms for updating or deleting settings from the chain interface.
The changes in this diff include:
- Adding a
SettingsAction
enum, which indicates whether we want to write to the settings after the update. - Introducing a
SettingsUpdate
type alias for the update function pointer. - Updating
updateRwSetting
to receive aSettingsUpdate
function pointer.overwriteRwSetting
: Replaces the setting with a new value.deleteRwSettings
: Completely erases a specified setting. This method is currently used only inoverwriteRwSetting
.
- Updating
AddWalletSetting
andRemoveWalletSetting
to define and pass the update function toupdateRwSetting
.
The tests were also refactored to be a bit dry.
@furszy, I made some changes to your commit, including adjusting variable initializations to use auto-deduced types, adding type aliases for the function pointers, and making variable names more descriptive.
🚧 At least one of the CI tasks failed. HintsMake sure to run all tests locally, according to the documentation. The failure may happen due to a number of reasons, for example:
Leave a comment here, if you need help tracking down a confusing failure. |
- Settings updates were not thread-safe, as they were executed in three separate steps: 1) Obtain settings value while acquiring the settings lock. 2) Modify settings value. 3) Overwrite settings value while acquiring the settings lock. This approach allowed concurrent threads to modify the same base value simultaneously, leading to data loss. When this occurred, the final settings state would only reflect the changes from the last thread that completed the operation, overwriting updates from other threads. Fix this by making the settings update operation atomic. - Add test coverage for this behavior. Co-authored-by: furszy <matiasfurszyfer@protonmail.com>
528d4e6
to
1b41d45
Compare
ACK 1b41d45 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self-code-ACK 1b41d45
SKIP_WRITE | ||
}; | ||
|
||
using SettingsUpdate = std::function<std::optional<interfaces::SettingsAction>(common::SettingsValue&)>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this alias is needed. It isn't really a complex function and forces devs to scroll / search it. It can just be in another line in the function declaration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added it just to make the updateRwSetting
method easier to read. I will update when their is need to retouch.
Any reason not to just lock the mutex across the three steps? |
Yes the reasons were highlighted in #30697 (review) and #30697 (review) comments |
I wish this hadn't been merged so quickly as I was still working on reviewing this, but I understand getting this in before Also note that this PR did not update |
It would be be nice to split up the Chain interface or just rename it to reflect way it is used, but it probably makes sense for wallet to use some interface class to read and write settings instead of using ArgsManager directly as in stickies-v@53acb88. Wallet code originally did used ArgsManager directly until #22217, which changed it to use an interface class so wallet and node code could run in different processes and both could change settings without getting out of sync or corrupting the file. The wallet could be changed to use an independent settings file instead, and this would probably be a nice long term result. But defining interface classes for wallet code to go through instead of calling node functions and accessing node state directly has been an easy way to keep wallet code separate from node code without having to change wallet behavior. |
Yeah. I didn't touch it to avoid expanding this PR beyond what was necessary to fix the race. The GUI settings change events are executed sequentially on the main thread, so they aren't susceptible to this issue - at least not at this point in time -. We should definitely unify both |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
post-merge ACK 1b41d45
I think some follow-up improvements should be made as per my review comments, but this PR addresses the race condition well.
Wallet code originally did used ArgsManager directly until #22217, which changed it to use an interface class so wallet and node code could run in different processes and both could change settings without getting out of sync or corrupting the file.
Thanks for the context, I was wondering why things are the way they are at the moment. It makes sense in a multi-process context, but yeah, I don't think these functions should be in Chain
, but it may currently be the least bad option. The developer notes also state we shouldn't be implementing any new functionality in interface methods, which we're violating here:
- Interface method definitions should wrap existing functionality instead of
implementing new functionality. Any substantial new node or wallet
functionality should be implemented insrc/node/
or
src/wallet/
and just exposed in
src/interfaces/
instead of being implemented there,
so it can be more modular and accessible to unit tests.
The wallet could be changed to use an independent settings file instead, and this would probably be a nice long term result.
This was my initial thought too, but too big of a change for a bugfix PR indeed.
{ | ||
if (value.isNull()) return deleteRwSettings(name, write); | ||
return updateRwSetting(name, [&](common::SettingsValue& settings) { | ||
settings = std::move(value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving an lvalue ref here isn't safe, I think? Not an issue with the current callsites, but function signature should either take an rvalue ref or make a copy here. E.g. impl with rvalue ref (which is sufficient for now, but may require an overload in future):
git diff on 1b41d45
diff --git a/src/interfaces/chain.h b/src/interfaces/chain.h
index be596b1765..20a801fa94 100644
--- a/src/interfaces/chain.h
+++ b/src/interfaces/chain.h
@@ -16,6 +16,7 @@
#include <stddef.h>
#include <stdint.h>
#include <string>
+#include <utility>
#include <vector>
class ArgsManager;
@@ -361,7 +362,7 @@ public:
virtual bool updateRwSetting(const std::string& name, const SettingsUpdate& update_function) = 0;
//! Replace a setting in <datadir>/settings.json with a new value.
- virtual bool overwriteRwSetting(const std::string& name, common::SettingsValue& value, bool write = true) = 0;
+ virtual bool overwriteRwSetting(const std::string& name, common::SettingsValue&& value, bool write = true) = 0;
//! Delete a given setting in <datadir>/settings.json.
virtual bool deleteRwSettings(const std::string& name, bool write = true) = 0;
diff --git a/src/node/interfaces.cpp b/src/node/interfaces.cpp
index 54b986c926..0a89b5a3e4 100644
--- a/src/node/interfaces.cpp
+++ b/src/node/interfaces.cpp
@@ -828,7 +828,7 @@ public:
// Now dump value to disk if requested
return *action == interfaces::SettingsAction::SKIP_WRITE || args().WriteSettingsFile();
}
- bool overwriteRwSetting(const std::string& name, common::SettingsValue& value, bool write) override
+ bool overwriteRwSetting(const std::string& name, common::SettingsValue&& value, bool write) override
{
if (value.isNull()) return deleteRwSettings(name, write);
return updateRwSetting(name, [&](common::SettingsValue& settings) {
diff --git a/src/wallet/load.cpp b/src/wallet/load.cpp
index 129b5c7c2a..1149ce0a47 100644
--- a/src/wallet/load.cpp
+++ b/src/wallet/load.cpp
@@ -69,7 +69,7 @@ bool VerifyWallets(WalletContext& context)
// Pass write=false because no need to write file and probably
// better not to. If unnamed wallet needs to be added next startup
// and the setting is empty, this code will just run again.
- chain.overwriteRwSetting("wallet", wallets, /*write=*/false);
+ chain.overwriteRwSetting("wallet", std::move(wallets), /*write=*/false);
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree rvalue reference would be an improvement, less for the reason that it would make the function safer to use, and more for the reason that it would make the function more convenient to use, because lvalue references cannot bind to temporary or literal values.
But even better than an rvalue reference would just be a plain value like common::SettingsValue value
because this would allow the caller to decide whether they want to move or copy, instead of forcing the caller to always move.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken your suggestion @ryanofsky this is fixed in #30828
bool deleteRwSettings(const std::string& name, bool write) override | ||
{ | ||
args().LockSettings([&](common::Settings& settings) { | ||
settings.rw_settings.erase(name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: behaviour already existed prior to this PR, but this line (and thus also overwriteRwSetting
) can throw if name
is not an existing key, and would benefit from being documented in both function's docstring I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: behaviour already existed prior to this PR, but this line (and thus also
overwriteRwSetting
) can throw ifname
is not an existing key, and would benefit from being documented in both function's docstring I think.
map::erase shouldn't actually throw anything, API was always intended to erase the setting if it exists otherwise do nothing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested this locally by passing an unknown settings name this does not throw.
Actually, doesn't the fact that the Rw methods on both |
This is ok because of the way multiprocess code works. When node and wallet code are running in the same process, and the wallet calls a virtual
I think this is a borderline case. Ideally wrapper methods in src/node/interfaces.cpp and src/wallet/interfaces.cpp are just a few lines long and consist of glue code that doesn't implement any real functionality. I think that is basically the case with the methods in this PR, though though they do implement some options and features, and maybe those could be moved from ChainImpl to ArgsManager so the ArgsManager interface is more complete and glue code is simpler. |
Thank you, that explanation actually clears up a lot, appreciate the thoughtful response, as always.
and it would also allow code re-use across both interfaces. |
Thanks, I think a natural cleanup or followup to this PR could move the In the longer run, I think wallet processes should not need to access the settings interface directly, either because they store their own JSON settings detached from node settings, or because the setting that tracks which wallets to load at startup is managed by gui or node processes instead of wallet processes (i.e. the list of wallets to load would be passed to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just leaving some notes in case there are more changes to settings api later.
bool deleteRwSettings(const std::string& name, bool write) override | ||
{ | ||
args().LockSettings([&](common::Settings& settings) { | ||
settings.rw_settings.erase(name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: behaviour already existed prior to this PR, but this line (and thus also
overwriteRwSetting
) can throw ifname
is not an existing key, and would benefit from being documented in both function's docstring I think.
map::erase shouldn't actually throw anything, API was always intended to erase the setting if it exists otherwise do nothing
{ | ||
if (value.isNull()) return deleteRwSettings(name, write); | ||
return updateRwSetting(name, [&](common::SettingsValue& settings) { | ||
settings = std::move(value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree rvalue reference would be an improvement, less for the reason that it would make the function safer to use, and more for the reason that it would make the function more convenient to use, because lvalue references cannot bind to temporary or literal values.
But even better than an rvalue reference would just be a plain value like common::SettingsValue value
because this would allow the caller to decide whether they want to move or copy, instead of forcing the caller to always move.
Thanks for your reviews, @stickies-v and @ryanofsky.
I have started working on this in this commit. |
8466329 chain: simplify `deleteRwSettings` code and improve it's doc (ismaelsadeeq) f8d91f4 chain: dont check for null settings value in `overwriteRwSetting` (ismaelsadeeq) df60199 chain: ensure `updateRwSetting` doesn't update to a null settings (ismaelsadeeq) c8e2eee chain: uniformly use `SettingsAction` enum in settings methods (ismaelsadeeq) 1e9e735 chain: move new settings safely in `overwriteRwSetting` (ismaelsadeeq) 1c40900 test: remove wallet context from `write_wallet_settings_concurrently` (ismaelsadeeq) Pull request description: This PR addresses the remaining review comments from #30697 1. Disallowed overwriting settings values with a `null` value. 2. Uniformly used the `SettingsAction` enum in all settings methods instead of a boolean parameter. 3. Updated `overwriteRwSetting` to receive the `common::SettingsValue` parameter by value, enabling it to be moved safely. 4. Removed wallet context from the `write_wallet_settings_concurrently` unit test, as it is not needed. ACKs for top commit: achow101: ACK 8466329 ryanofsky: Code review ACK 8466329. Looks good, thanks for taking suggestions and applying them to the right commits. Only changes since last review were documentation improvements and simplifying delete method. furszy: Code review ACK 8466329 Tree-SHA512: baf2f59ed5aac4a4bda0c84fb6554a466a40d1f7b52b61dc2ff293d83ae60e82b925b7003237b633fecb65eba3a4c108e69166046895d1295809fbe0de67b052
This PR fixes #30620.
As outlined in the issue, creating two wallets with
load_on_startup=true
simultaneously results in only one wallet being added to the startup file.The current issue arises because the wallet settings update process involves:
This sequence is not thread-safe. Different threads could modify the same base value simultaneously, overwriting data from other workers without realizing it.
The PR attempts to fix this by modifying the chain interface's
updateRwSetting
method to accept a function that will be called with the settings reference. This function will either update or delete the setting and return an enum indicating whether the settings need to be overwritten in this or not.Additionally, this PR introduces two new methods to the chain interface:
overwriteRwSetting
: This method replaces the setting with a new value.Used in
VerifyWallets
deleteRwSettings
: This method completely erases a specified setting.This method is currently used only in
overwriteRwSetting
.These changes ensure that updates are race-free across all clients.