Skip to content

Conversation

promag
Copy link
Contributor

@promag promag commented Jun 17, 2020

This PR handles concurrent wallet loading.

This can be tested by running in parallel the following script a couple of times:

for i in {1..10}
do
  src/bitcoin-cli -regtest loadwallet foo
  src/bitcoin-cli -regtest unloadwallet foo
done

Eventually the error occurs:

error code: -4
error message:
Wallet already being loading.

For reference, loading and already loaded wallet gives:

error code: -4
error message:
Wallet file verification failed. Error loading wallet w1. Duplicate -wallet filename specified.

Fixes #19232.

@maflcko
Copy link
Member

maflcko commented Jun 17, 2020

Does the crash also happen in a python functional test? If yes, mind to add one?

@promag
Copy link
Contributor Author

promag commented Jun 17, 2020

@MarcoFalke haven't tried but I'd say it should - but it will be one of those loop-until-it-happens test. I'll give it a try.

@maflcko
Copy link
Member

maflcko commented Jun 17, 2020

does this need backport to 0.19?

@promag
Copy link
Contributor Author

promag commented Jun 17, 2020

@MarcoFalke yes, and probably is a clean backport.

@fscemama
Copy link

fscemama commented Jun 17, 2020

Hi, just tested the running of several background processes of the following script :

# Bitcoin Core RPC client version v0.20.0.0-ga62f0ed64f8bbbdfe6467ac5ce92ef5b5222d1bd
wallet=xxxxxx
for i in {1..1000}
do
  echo $i
  bitcoin-cli loadwallet $wallet
  bitcoin-cli unloadwallet $wallet
done

Crash is immediate. Here's valgrind output:

valgrind /usr/local/bin/bitcoind -daemon -walletrbf -conf=/root/.bitcoin/bitcoin.conf -fallbackfee=0.00001 -rpcworkqueue=128
==30964== Memcheck, a memory error detector
==30964== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==30964== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==30964== Command: /usr/local/bin/bitcoind -daemon -walletrbf -conf=/root/.bitcoin/bitcoin.conf -fallbackfee=0.00001 -rpcworkqueue=128
==30964==
Bitcoin Core starting
==30964==
==30964== HEAP SUMMARY:
==30964==     in use at exit: 2,232,114 bytes in 2,271 blocks
==30964==   total heap usage: 3,234 allocs, 963 frees, 2,462,760 bytes allocated
==30964==
==30964== LEAK SUMMARY:
==30964==    definitely lost: 0 bytes in 0 blocks
==30964==    indirectly lost: 0 bytes in 0 blocks
==30964==      possibly lost: 0 bytes in 0 blocks
==30964==    still reachable: 2,232,114 bytes in 2,271 blocks
==30964==         suppressed: 0 bytes in 0 blocks
==30964== Reachable blocks (those to which a pointer was found) are not shown.
==30964== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==30964==
==30964== For lists of detected and suppressed errors, rerun with: -s
==30964== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
root@backdev2:/home/psb/bin# ==30973== Syscall param pwrite64(buf) points to uninitialised byte(s)
==30973==    at 0x507B983: ??? (syscall-template.S:84)
==30973==    by 0x53CF188: __os_io (in /usr/lib/libdb_cxx-4.8.so)
==30973==    by 0x53BD04C: __memp_pgwrite.part.0 (in /usr/lib/libdb_cxx-4.8.so)
==30973==    by 0x53BD408: __memp_bhwrite (in /usr/lib/libdb_cxx-4.8.so)
==30973==    by 0x53BC31A: __memp_alloc (in /usr/lib/libdb_cxx-4.8.so)
==30973==    by 0x53BED27: __memp_fget (in /usr/lib/libdb_cxx-4.8.so)
==30973==    by 0x5348408: __db_verify (in /usr/lib/libdb_cxx-4.8.so)
==30973==    by 0x534A846: __db_verify_internal (in /usr/lib/libdb_cxx-4.8.so)
==30973==    by 0x52B9026: Db::verify(char const*, char const*, std::ostream*, unsigned int) (in /usr/lib/libdb_cxx-4.8.so)
==30973==    by 0x3DF7E5: BerkeleyEnvironment::Verify(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool (*)(boost::filesystem::path const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (db.cpp:277)
==30973==    by 0x3DFB4A: BerkeleyBatch::VerifyDatabaseFile(boost::filesystem::path const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, bool (*)(boost::filesystem::path const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)) (db.cpp:422)
==30973==    by 0x472D48: CWallet::Verify(interfaces::Chain&, WalletLocation const&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) (wallet.cpp:3778)
==30973==  Address 0xd63ee08 is 120 bytes inside a block of size 1,112 alloc'd
==30973==    at 0x402E7BD: malloc (vg_replace_malloc.c:309)
==30973==    by 0x53CC964: __os_malloc (in /usr/lib/libdb_cxx-4.8.so)
==30973==    by 0x539B4F3: __env_alloc (in /usr/lib/libdb_cxx-4.8.so)
==30973==    by 0x53BBBC0: __memp_alloc (in /usr/lib/libdb_cxx-4.8.so)
==30973==    by 0x53BED27: __memp_fget (in /usr/lib/libdb_cxx-4.8.so)
==30973==    by 0x52D4063: __bam_new_file (in /usr/lib/libdb_cxx-4.8.so)
==30973==    by 0x5386387: __db_new_file (in /usr/lib/libdb_cxx-4.8.so)
==30973==    by 0x5386906: __db_open (in /usr/lib/libdb_cxx-4.8.so)
==30973==    by 0x534B2B8: __db_vrfy_pgset (in /usr/lib/libdb_cxx-4.8.so)
==30973==    by 0x534B4BE: __db_vrfy_dbinfo_create (in /usr/lib/libdb_cxx-4.8.so)
==30973==    by 0x5348056: __db_verify (in /usr/lib/libdb_cxx-4.8.so)
==30973==    by 0x534A846: __db_verify_internal (in /usr/lib/libdb_cxx-4.8.so)
==30973==
==30973==
==30973== Process terminating with default action of signal 6 (SIGABRT)
==30973==    at 0x6350FFF: raise (raise.c:51)
==30973==    by 0x6352429: abort (abort.c:89)
==30973==    by 0x6349E66: __assert_fail_base (assert.c:92)
==30973==    by 0x6349F11: __assert_fail (assert.c:101)
==30973==    by 0x49085A: BerkeleyDatabase (db.h:125)
==30973==    by 0x49085A: std::unique_ptr<BerkeleyDatabase, std::default_delete<BerkeleyDatabase> > MakeUnique<BerkeleyDatabase, std::shared_ptr<BerkeleyEnvironment>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(std::shared_ptr<BerkeleyEnvironment>&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&) (memory.h:16)
==30973==    by 0x49091F: BerkeleyDatabase::Create(boost::filesystem::path const&) (db.h:139)
==30973==    by 0x47E2C1: CWallet::CreateWalletFromFile(interfaces::Chain&, WalletLocation const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, unsigned long) (wallet.cpp:3805)
==30973==    by 0x4823E6: LoadWallet(interfaces::Chain&, WalletLocation const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) (wallet.cpp:157)
==30973==    by 0x40A7FD: loadwallet(JSONRPCRequest const&) (rpcwallet.cpp:2626)
==30973==    by 0x22BFEB: operator() (server.h:104)
==30973==    by 0x22BFEB: std::_Function_handler<bool (JSONRPCRequest const&, UniValue&, bool), CRPCCommand::CRPCCommand(char const*, char const*, UniValue (*)(JSONRPCRequest const&), std::initializer_list<char const*>)::{lambda(JSONRPCRequest const&, UniValue&, bool)#1}>::_M_invoke(std::_Any_data const&, JSONRPCRequest const&, UniValue&, bool&&) (functional:1717)
==30973==    by 0x14A95E: operator() (functional:2127)
==30973==    by 0x14A95E: operator() (chain.cpp:208)
==30973==    by 0x14A95E: std::_Function_handler<bool (JSONRPCRequest const&, UniValue&, bool), interfaces::(anonymous namespace)::RpcHandlerImpl::RpcHandlerImpl(CRPCCommand const&)::{lambda(JSONRPCRequest const&, UniValue&, bool)#1}>::_M_invoke(std::_Any_data const&, JSONRPCRequest const&, UniValue&, bool&&) (functional:1717)
==30973==    by 0x2B32DB: operator() (functional:2127)
==30973==    by 0x2B32DB: ExecuteCommand (server.cpp:453)
==30973==    by 0x2B32DB: CRPCTable::execute(JSONRPCRequest const&) const (server.cpp:436)
==30973==
==30973== HEAP SUMMARY:
==30973==     in use at exit: 475,082,336 bytes in 3,760,553 blocks
==30973==   total heap usage: 9,888,764 allocs, 6,128,211 frees, 1,385,505,603 bytes allocated
==30973==

@promag
Copy link
Contributor Author

promag commented Jun 17, 2020

@MarcoFalke pushed 50b7e82, I welcome any python ninja to make it more cool.

for t in threads:
t.join()
global got_loading_error
assert_equal(got_loading_error, True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can assert that a race happened. Races are only intermittent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one is pretty much guaranteed, loading takes a bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "qa: Test concurrent wallet loading" (9b009fa)

I don't think we can assert that a race happened. Races are only intermittent.

I would agree with Marco that in general tests like this are fragile and bad and slow and more trouble than they are worth, but also agree with promag that in practice this test is unlikely to fail.

To avoid potential problems from this, I'd suggest adding a comment here like "got_loading_error condition is not guaranteed to be true but very unlikely to be false based on how long it takes to load a wallet", so in case this does fail, future developers don't waste time trying to figure it out and can ignore or remove the test

In general it's easier to write tests for race conditions by making them invasive and implementing them in c++

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it was slower to spawn those threads than to load the wallet. Not sure about c++ test because it would need to force block inside loading. Maybe we could improve this first by waiting for all threads to be ready to load?

@promag promag force-pushed the 2020-06-loadwallet branch from 50b7e82 to ca10f4e Compare June 17, 2020 13:50
@promag
Copy link
Contributor Author

promag commented Jun 17, 2020

@fscemama if you can please repeat the test but using this branch.

@fscemama
Copy link

Sure. Can you confirm I must compile this and re-test ?

@fscemama
Copy link

Successfull. Will send these kind of normal messages:

error code: -4
error message:
Wallet already being loading.
error code: -18
error message:
Requested wallet does not exist or is not loaded

No crash. Hourra!

@DrahtBot
Copy link
Contributor

DrahtBot commented Jun 17, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK.

btw, couldn't get an error with the testing bash script on master (343c0bf) without code patching.

After #19249 the LoadWallet() function could be annotated with EXCLUSIVE_LOCKS_REQUIRED(!g_loading_wallet_mutex):

diff --git a/src/wallet/wallet.cpp b/src/wallet/wallet.cpp
index c93c79116..7e73a2bac 100644
--- a/src/wallet/wallet.cpp
+++ b/src/wallet/wallet.cpp
@@ -99,7 +99,7 @@ std::unique_ptr<interfaces::Handler> HandleLoadWallet(LoadWalletFn load_wallet)
     return interfaces::MakeHandler([it] { LOCK(cs_wallets); g_load_wallet_fns.erase(it); });
 }
 
-static Mutex g_loading_wallet_mutex;
+Mutex g_loading_wallet_mutex;
 static Mutex g_wallet_release_mutex;
 static std::condition_variable g_wallet_release_cv;
 static std::set<std::string> g_loading_wallet_set GUARDED_BY(g_loading_wallet_mutex);
diff --git a/src/wallet/wallet.h b/src/wallet/wallet.h
index cf000b0b7..3013fb360 100644
--- a/src/wallet/wallet.h
+++ b/src/wallet/wallet.h
@@ -49,11 +49,13 @@ struct bilingual_str;
 //! by the shared pointer deleter.
 void UnloadWallet(std::shared_ptr<CWallet>&& wallet);
 
+extern Mutex g_loading_wallet_mutex;
+
 bool AddWallet(const std::shared_ptr<CWallet>& wallet);
 bool RemoveWallet(const std::shared_ptr<CWallet>& wallet);
 std::vector<std::shared_ptr<CWallet>> GetWallets();
 std::shared_ptr<CWallet> GetWallet(const std::string& name);
-std::shared_ptr<CWallet> LoadWallet(interfaces::Chain& chain, const WalletLocation& location, bilingual_str& error, std::vector<bilingual_str>& warnings);
+std::shared_ptr<CWallet> LoadWallet(interfaces::Chain& chain, const WalletLocation& location, bilingual_str& error, std::vector<bilingual_str>& warnings) EXCLUSIVE_LOCKS_REQUIRED(!g_loading_wallet_mutex);
 std::unique_ptr<interfaces::Handler> HandleLoadWallet(LoadWalletFn load_wallet);
 
 enum class WalletCreationStatus {

@promag
Copy link
Contributor Author

promag commented Jun 18, 2020

btw, couldn't get an error with the testing bash script on master

@hebasto see #19232 (comment)

@hebasto
Copy link
Member

hebasto commented Jun 18, 2020

btw, couldn't get an error with the testing bash script on master

@hebasto see #19232 (comment)

Indeed :)

On master:

2020-06-18T14:57:15Z [httpworker.0] Using BerkeleyDB version Berkeley DB 5.3.28: (September  9, 2013)
2020-06-18T14:57:15Z [httpworker.0] Using wallet /home/hebasto/.bitcoin/regtest/wallets/foo
2020-06-18T14:57:15Z [httpworker.0] BerkeleyEnvironment::Open: LogDir=/home/hebasto/.bitcoin/regtest/wallets/foo/database ErrorFile=/home/hebasto/.bitcoin/regtest/wallets/foo/db.log
2020-06-18T14:57:15Z [httpworker.0] init message: Loading wallet...
bitcoind: ./wallet/bdb.h:113: BerkeleyDatabase::BerkeleyDatabase(std::shared_ptr<BerkeleyEnvironment>, std::string): Assertion `inserted.second' failed.
2020-06-18T14:57:15Z [httpworker.1] Using BerkeleyDB version Berkeley DB 5.3.28: (September  9, 2013)
2020-06-18T14:57:15Z [httpworker.1] Using wallet /home/hebasto/.bitcoin/regtest/wallets/foo
Aborted (core dumped)

@hebasto
Copy link
Member

hebasto commented Jun 18, 2020

Testing this PR with two parallel testing scripts and got different error messages:

$ ~/test.sh & ~/test.sh 
[1] 354
error code: -4
error message:
Wallet already being loading.
error code: -18
error message:
Requested wallet does not exist or is not loaded
error code: -4
error message:
Wallet already being loading.
error code: -18
error message:
Requested wallet does not exist or is not loaded
error code: -4
error message:
Wallet already being loading.
error code: -18
error message:
Requested wallet does not exist or is not loaded
error code: -4
error message:
Wallet already being loading.
error code: -18
error message:
Requested wallet does not exist or is not loaded
error code: -4
error message:
Wallet already being loading.
error code: -18
error message:
Requested wallet does not exist or is not loaded
{
  "name": "foo",
  "warning": ""
}
{
  "name": "foo",
  "warning": ""
}
{
  "name": "foo",
  "warning": ""
}
{
  "name": "foo",
  "warning": ""
}
{
  "name": "foo",
  "warning": ""
}
[1]+  Exit 18                 ~/test.sh

@promag
Copy link
Contributor Author

promag commented Jun 18, 2020

@hebasto yes that's expected - one is from concurrent load and the other from concurrent unload.

@promag promag force-pushed the 2020-06-loadwallet branch from ca10f4e to 9fc44e8 Compare June 19, 2020 00:02
Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 9b009fa, tested on Linux Mint 20 (x86_64):

$ cat ~/test.sh 
for i in {1..2}
do
  bash -c "echo $1; src/bitcoin-cli -regtest loadwallet foo"
  bash -c "echo $1; src/bitcoin-cli -regtest unloadwallet foo"
done
sleep 1s

$ ~/test.sh "A:" & ~/test.sh "B:"
[1] 46177
A:
B:
error code: -4
error message:
Wallet already being loading.
B:
error code: -18
error message:
Requested wallet does not exist or is not loaded
B:
error code: -4
error message:
Wallet already being loading.
B:
error code: -18
error message:
Requested wallet does not exist or is not loaded
{
  "name": "foo",
  "warning": ""
}
A:
A:
{
  "name": "foo",
  "warning": ""
}
A:

Also verified:

$ ./test/functional/wallet_multiwallet.py
$ ./test/functional/wallet_multiwallet.py --usecli

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review good-but-not-ideal ACK 9b009fa

This is an improvement but I don't think it's the ideal fix. A friendlier API would just require callers to write something like:

status = loadwallet("mywallet")
if status not in (success, wallet_already_loaded): raise Error
listtransactions("mywallet")

instead of:

while (status = loadwallet("mywallet") == wallet_already_loading:
   sleep()
if status not in (success, wallet_already_loaded): raise Error
listtransactions("mywallet")

This could be implemented by making the second loadwallet call block and wait for the first loadwallet call to finish, and then return the existing "duplicate wallet" error, instead of introducing a new error string and requiring the caller to poll before using the wallet. This change would also make simultaneous wallet load calls work more similar to the the way simultaneous wallet unload calls already work. I don't think it would be too much more complex: you'd have to add a condition variable wait and notify but could remove the new error handling.

But anyway, current fix is reasonable, fine to merge, and good to see implemented

for t in threads:
t.join()
global got_loading_error
assert_equal(got_loading_error, True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "qa: Test concurrent wallet loading" (9b009fa)

I don't think we can assert that a race happened. Races are only intermittent.

I would agree with Marco that in general tests like this are fragile and bad and slow and more trouble than they are worth, but also agree with promag that in practice this test is unlikely to fail.

To avoid potential problems from this, I'd suggest adding a comment here like "got_loading_error condition is not guaranteed to be true but very unlikely to be false based on how long it takes to load a wallet", so in case this does fail, future developers don't waste time trying to figure it out and can ignore or remove the test

In general it's easier to write tests for race conditions by making them invasive and implementing them in c++

@promag
Copy link
Contributor Author

promag commented Jun 23, 2020

This could be implemented by making the second loadwallet call block and wait for the first loadwallet call to finish, and then return the existing "duplicate wallet" error

If the 2nd waits then why return an error?

Edit: actually, I wonder if loading an already loaded wallet should be an error in the first place.

This change would also make simultaneous wallet load calls work more similar to the the way simultaneous wallet unload calls already work.

The 2nd concurrent unload doesn't wait.

I'm happy to make changes but I think current API is not ideal for concurrent clients.

I had the idea of counting loads (incr load and decr unload) and then effective unload would happen when count is zero. This would simplify concurrent clients using the same wallet - each client must load->use->unload. But this is breaking change and so I think a new pair of RPC methods would be better like acquirewallet/releasewallet. However this would bring same problems as unspent locks..

@ryanofsky
Copy link
Contributor

This could be implemented by making the second loadwallet call block and wait for the first loadwallet call to finish, and then return the existing "duplicate wallet" error

If the 2nd waits then why return an error?

Because I think the point of changing this is to fix a race condition, not add a new one. If you load the same wallet twice one load should succeed, and one load should return error duplicate wallet. There shouldn't be an unpredictable length of time where both calls mysteriously succeed. Also, I presume code would be simpler not having a special case to suppress the error.

Edit: actually, I wonder if loading an already loaded wallet should be an error in the first place.

This would be a reasonable design, but I don't see a reason not to return the existing error. At this point all it would be doing is hiding information we know and breaking backwards compatibility.

This change would also make simultaneous wallet load calls work more similar to the the way simultaneous wallet unload calls already work.

The 2nd concurrent unload doesn't wait.

Oh, I figured it did, but didn't look closely. Again I think it'd be better if it waited, to avoid any need to poll when you need a wallet unloaded. And I think it'd simplify both server & client implementations to drop the extra error condition and just return "succeeded", "failed", or "already not loaded", never "currently unloading".

I'm happy to make changes but I think current API is not ideal for concurrent clients.

I had the idea of counting loads (incr load and decr unload) and then effective unload would happen when count is zero. This would simplify concurrent clients using the same wallet - each client must load->use->unload. But this is breaking change and so I think a new pair of RPC methods would be better like acquirewallet/releasewallet. However this would bring same problems as unspent locks..

Yes, I wouldn't suggest this. I think the API I'm suggesting is pretty ideal for concurrent clients because it would let them unload & unload wallets reliably and never poll. The only place where reference counting would be useful would be for concurrent clients that have no way of communicating with each other. But even in this case reference counting would be unreliable and leaky if something ever went wrong like a connection getting dropped or a client process being killed.

Overall, I think the current fix in 9b009fa is good, and definitely better than having the bug. I think it would be simpler on both server and client sides if load and unload RPCs never returned "currently loading" or "currently unloading" and just waited to return definitive statuses. But this isn't necessary and I wouldn't want it to hold things up. I appreciate you fixing this and considering my suggestions!

@ryanofsky
Copy link
Contributor

This PR might be close to being mergeable. It has acks from me and hebasto and some review comments from Marco. Suggestions I made to improve this aren't blocking and could be followed up later.

@promag
Copy link
Contributor Author

promag commented Jun 29, 2020

I think the API I'm suggesting is pretty ideal for concurrent clients because it would let them unload & unload wallets reliably and never poll.

I disagree, current API (with or without your suggestion) doesn't make it easy to concurrently use a wallet (load+use+unload) - client A unloads and messes client B. And in the case some coordination exists on the client side, those improvements aren't really necessary.

reference counting would be unreliable and leaky if something ever went wrong like a connection getting dropped or a client process being killed.

Right, For this I thought about a new RPC endpoint where the wallet counting would be tied to the connection and load/unload would happen indirectly.

@ryanofsky
Copy link
Contributor

ryanofsky commented Jun 29, 2020

I disagree, current API (with or without your suggestion) doesn't make it easy to concurrently use a wallet (load+use+unload) - client A unloads and messes client B.

I don't think we disagree about anything here. This is irrelevant. Neither your fix, nor the one I'm suggesting tries to do this, because it would require storing client state in the server, which would be an awkward and unreliable way to design things.

And in the case some coordination exists on the client side, those improvements aren't really necessary.

Here is the use case: I want to write little shell script that unloads a wallet whether or not is it already loaded, maybe so I can back it up. Or I want to write a little shell script that loads a wallet, whether or not it is already loaded, maybe so I can check my balance or perform a transaction.

With the current API writing either of these scripts requires checking 4 for different error states (success/failure/in-progress/already-loaded-or-unloaded) and writing a loop to poll the server. With the API I am proposing, no polling is required and there are only 3 states to check instead of 4.

Also, the server-side implementation should be simpler. I am happy to implement it as a followup if the current PR is merged.

@maflcko
Copy link
Member

maflcko commented Jun 29, 2020

Concept ACK 9b009fa I have not reviewed the code

@promag
Copy link
Contributor Author

promag commented Jun 29, 2020

@ryanofsky Agree in making those improvements for future releases? IMO this is enough as a fix for latest version and to backport.

@ryanofsky
Copy link
Contributor

@ryanofsky Agree in making those improvements for future releases? IMO this is enough as a fix for latest version and to backport.

Yes I acked this PR twice and commented repeatedly it would be fine to improve this later.

@maflcko maflcko merged commit 5c3c7cc into bitcoin:master Jun 29, 2020
@promag promag deleted the 2020-06-loadwallet branch June 29, 2020 16:20
This was referenced Jun 29, 2020
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Jul 3, 2020
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Jul 3, 2020
fanquake added a commit that referenced this pull request Jul 10, 2020
2b79ac7 Clean up separated ban/discourage interface (Pieter Wuille)
0477348 Replace automatic bans with discouragement filter (Pieter Wuille)
e7f06f9 test: remove Cirrus CI FreeBSD job (fanquake)
eb6b82a qa: Test concurrent wallet loading (João Barbosa)
c9b49d2 wallet: Handle concurrent wallet loading (João Barbosa)
cf0b5a9 tests: Check that segwit inputs in psbt have both UTXO types (Andrew Chow)
3228b59 psbt: always put a non_witness_utxo and don't remove it (Andrew Chow)
ed5ec30 psbt: Allow both non_witness_utxo and witness_utxo (Andrew Chow)
68e0e6f rpc: show both UTXOs in decodepsbt (Andrew Chow)
27786d0 trivial: Suggested cleanups to surrounding code (Russell Yanofsky)
654420d wallet: Minimal fix to restore conflicted transaction notifications (Russell Yanofsky)
febebc4 Fix WSL file locking by using flock instead of fcntl (Samuel Dobson)
5c7151a gui: update Qt base translations for macOS release (fanquake)
c219d21 build: improved output of configure for build OS (sachinkm77)
0596a6e util: Don't reference errno when pthread fails. (MIZUTA Takeshi)

Pull request description:

  Currently backports the following to the 0.20 branch:
  * #18700 - Fix locking on WSL using flock instead of fcntl
  * #18982 - wallet: Minimal fix to restore conflicted transaction notifications
  * #19059 - gui: update Qt base translations for macOS release
  * #19152 - build: improve build OS configure output
  * #19194 -  util: Don't reference errno when pthread fails.
  * #19215 - psbt: Include and allow both non_witness_utxo and witness_utxo for segwit inputs
  * #19219 - Replace automatic bans with discouragement filter
  * #19300 - wallet: Handle concurrent wallet loading

ACKs for top commit:
  amitiuttarwar:
    ACK 0477348 2b79ac7 by comparing to original changes, double checking the diff
  sipa:
    utACK 2b79ac7
  laanwj:
    ACK 2b79ac7

Tree-SHA512: e5eb31d08288fa4a6e8aba08e60b16ce1988f14f249238b1cfd18ab2c8f6fcd9f07e3c0c491d32d2361fca26e3037fb0374f37464bddcabecea29069d6737539
@maflcko maflcko modified the milestones: 0.20.1, 0.19.2 Aug 22, 2020
backpacker69 referenced this pull request in peercoin/peercoin Sep 8, 2020
Github-Pull: #19300
Rebased-From: b9971ae
backpacker69 referenced this pull request in peercoin/peercoin Sep 8, 2020
Github-Pull: #19300
Rebased-From: 9b009fa
Bushstar pushed a commit to Bushstar/omnicore that referenced this pull request Oct 21, 2020
Bushstar pushed a commit to Bushstar/omnicore that referenced this pull request Oct 21, 2020
MarkLTZ added a commit to litecoinz-core/litecoinz that referenced this pull request Nov 8, 2020
maflcko pushed a commit to bitcoin-core/gui that referenced this pull request Dec 16, 2020
…t_loading_error

fab48da test: Fix intermittent wallet_multiwallet issue with got_loading_error (MarcoFalke)
fa8e15f test: pep8 wallet_multiwallet.py (MarcoFalke)

Pull request description:

  Failing the test after 10 iterations without a loading error is problematic because it may take 11 iterations to get a loading error.

  Fix that by running until a loading error occurs, which should happen in almost all runs within the first 10 iterations.

ACKs for top commit:
  ryanofsky:
    Code review ACK fab48da. This seems like a good workaround. I think more ideally think load and unload RPCs would not have racy status reporting (suggested previously bitcoin/bitcoin#19300 (review) and

Tree-SHA512: 6b80b26d916276efe2a01af93bca7dbf71a3e67db9d3deb15175070719bf7d1325a1410d93e74c0316942e388faa2ba185dc9d3759c82d1c73c3c509b9997f05
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Dec 17, 2020
…with got_loading_error

fab48da test: Fix intermittent wallet_multiwallet issue with got_loading_error (MarcoFalke)
fa8e15f test: pep8 wallet_multiwallet.py (MarcoFalke)

Pull request description:

  Failing the test after 10 iterations without a loading error is problematic because it may take 11 iterations to get a loading error.

  Fix that by running until a loading error occurs, which should happen in almost all runs within the first 10 iterations.

ACKs for top commit:
  ryanofsky:
    Code review ACK fab48da. This seems like a good workaround. I think more ideally think load and unload RPCs would not have racy status reporting (suggested previously bitcoin#19300 (review) and

Tree-SHA512: 6b80b26d916276efe2a01af93bca7dbf71a3e67db9d3deb15175070719bf7d1325a1410d93e74c0316942e388faa2ba185dc9d3759c82d1c73c3c509b9997f05
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Aug 28, 2021
Summary: This is a backport of [[bitcoin/bitcoin#19300 | core#19300]]

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, majcosta

Reviewed By: #bitcoin_abc, majcosta

Differential Revision: https://reviews.bitcoinabc.org/D9967
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regular crash+autorestart after loading/unloading many wallets
7 participants