multiprocess: Lock CapnpProtocol::m_loop with mutex #31815

ryanofsky · 2025-02-07T11:55:35Z

This change is intended to fix intermittent failures in ipc_tests unit test and rpc_misc.py functional tests that happen on mac os, which sometimes fail with error terminating due to uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument as reported bitcoin-core/libmultiprocess#154

This error could happen when m_loop.reset(); in CapnpProtocol::startLoop executes before the m_loop->removeClient(lock); method returns in the ~CapnpProtocol destructor , causing an failure to reacquire the EventLoop::m_mutex inside the EventLoop::removeClient method, and the "mutex lock failed" exception to be thrown.

Fix this error by adding a new mutex to guard the CapnpProtocol::m_loop variable and making sure m_loop.reset(); cannot be called until after m_loop->removeClient(lock); returns.

This PR is part of the process separation project.

This change is intended to fix occasional ipc_tests unit tests and rpc_misc.py functional tests errors that happen on mac os, which sometimes fail with error "terminating due to uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument" as reported bitcoin-core/libmultiprocess#154 This error could happen when `m_loop.reset();` in `CapnpProtocol::startLoop` executes before the `m_loop->removeClient(lock);` call in the `~CapnpProtocol` destructor returns, causing an failure to reacquire the `EventLoop::m_mutex` inside the `removeClient` method. Fix this error by adding a new mutex to guard access to the `CapnpProtocol::m_loop` variable and making sure `m_loop.reset();` cannot be called until after `m_loop->removeClient(lock);` returns.

DrahtBot · 2025-02-07T11:55:39Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31815.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

ryanofsky · 2025-02-07T20:16:59Z

So far as of baea320 it seems like this change does not solve the problem it was supposed to fix.

There is a new "libc++abi: terminating due to uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument" failure https://github.com/bitcoin/bitcoin/actions/runs/13206947916/job/36872233303?pr=30975#step:7:2433 in 553a7e2 which includes a cherry-picked version of this fix.

Sjors · 2025-02-07T20:17:27Z

This might have a bug somewhere, in #30975:

test/ipc_tests.cpp:11: Entering test suite "ipc_tests"
test/ipc_tests.cpp:12: Entering test case "ipc_tests"
libc++abi: terminating due to uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument
2025-02-07T19:59:17.387846Z [unknown] [test/util/random.cpp:48] [SeedRandomStateForTest] Setting random seed for current tests to RANDOM_CTX_SEED=32db8d22914609b5d53f87f6947ca780fea50790b141c9ccc134e7a13f971121
2025-02-07T19:59:17.388318Z [test] [init/common.cpp:153] [LogPackageVersion] Bitcoin Core version v28.99.0-553a7e2f6664-dirty (release build)
2025-02-07T19:59:17.388468Z [test] [kernel/context.cpp:20] [operator()] Using the 'arm_shani(1way,2way)' SHA256 implementation
2025-02-07T19:59:17.397870Z [test] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/b-test-58682} IPC client first request from current thread, constructing waiter
2025-02-07T19:59:17.398037Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/b-test-58682} IPC client send FooInterface.add$Params (a = 1, b = 2)
2025-02-07T19:59:17.398114Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/58689} IPC server recv request  #1 FooInterface.add$Params (a = 1, b = 2)
2025-02-07T19:59:17.398144Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/58689} IPC server send response #1 FooInterface.add$Results (result = 3)
2025-02-07T19:59:17.398166Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/b-test-58682} IPC client recv FooInterface.add$Results (result = 3)
2025-02-07T19:59:17.398206Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/b-test-58682} IPC client send FooInterface.passOutPoint$Params (arg = "d\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\310\\000\\000\\000")
2025-02-07T19:59:17.398231Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/58689} IPC server recv request  #2 FooInterface.passOutPoint$Params (arg = "d\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\310\\000\\000\\000")
2025-02-07T19:59:17.398244Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/58689} IPC server send response #2 FooInterface.passOutPoint$Results (result = "d\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\310\\000\\000\\000")
2025-02-07T19:59:17.398262Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/b-test-58682} IPC client recv FooInterface.passOutPoint$Results (result = "d\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\310\\000\\000\\000")
2025-02-07T19:59:17.398306Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/b-test-58682} IPC client send FooInterface.passUniValue$Params (arg = "{\\"i\\":1,\\"s\\":\\"two\\"}")
2025-02-07T19:59:17.398324Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/58689} IPC server recv request  #3 FooInterface.passUniValue$Params (arg = "{\\"i\\":1,\\"s\\":\\"two\\"}")
2025-02-07T19:59:17.398338Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/58689} IPC server send response #3 FooInterface.passUniValue$Results (result = "{\\"i\\":1,\\"s\\":\\"two\\"}")
2025-02-07T19:59:17.398354Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/b-test-58682} IPC client recv FooInterface.passUniValue$Results (result = "{\\"i\\":1,\\"s\\":\\"two\\"}")
2025-02-07T19:59:17.398384Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/b-test-58682} IPC client send FooInterface.passTransaction$Params (arg = "\\002\\000\\000\\000\\001d\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\310\\000\\000\\000\\000\\377\\377\\377\\377\\001\\000\\341\\365\\005\\000\\000\\000\\000\\000\\003\\000\\000\\000")
2025-02-07T19:59:17.398404Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/58689} IPC server recv request  #4 FooInterface.passTransaction$Params (arg = "\\002\\000\\000\\000\\001d\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\310\\000\\000\\000\\000\\377\\377\\377\\377\\001\\000\\341\\365\\005\\000\\000\\000\\000\\000\\003\\000\\000\\000")
2025-02-07T19:59:17.398448Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/58689} IPC server send response #4 FooInterface.passTransaction$Results (result = "\\002\\000\\000\\000\\001d\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\310\\000\\000\\000\\000\\377\\377\\377\\377\\001\\000\\341\\365\\005\\000\\000\\000\\000\\000\\003\\000\\000\\000")
2025-02-07T19:59:17.398467Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/b-test-58682} IPC client recv FooInterface.passTransaction$Results (result = "\\002\\000\\000\\000\\001d\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\000\\310\\000\\000\\000\\000\\377\\377\\377\\377\\001\\000\\341\\365\\005\\000\\000\\000\\000\\000\\003\\000\\000\\000")
2025-02-07T19:59:17.398501Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/b-test-58682} IPC client send FooInterface.passVectorChar$Params (arg = "Hello")
2025-02-07T19:59:17.398516Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/58689} IPC server recv request  #5 FooInterface.passVectorChar$Params (arg = "Hello")
2025-02-07T19:59:17.398525Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/58689} IPC server send response #5 FooInterface.passVectorChar$Results (result = "Hello")
2025-02-07T19:59:17.398538Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/b-test-58682} IPC client recv FooInterface.passVectorChar$Results (result = "Hello")
2025-02-07T19:59:17.398663Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/b-test-58682} IPC client send FooInterface.passBlockState$Params (arg = (mode = 1, result = 8, rejectReason = "reject reason", debugMessage = "debug message"))
2025-02-07T19:59:17.398704Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/58689} IPC server recv request  #6 FooInterface.passBlockState$Params (arg = (mode = 1, result = 8, rejectReason = "reject reason", debugMessage = "debug message"))
2025-02-07T19:59:17.398725Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/58689} IPC server send response #6 FooInterface.passBlockState$Results (result = (mode = 1, result = 8, rejectReason = "reject reason", debugMessage = "debug message"))
2025-02-07T19:59:17.398749Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/b-test-58682} IPC client recv FooInterface.passBlockState$Results (result = (mode = 1, result = 8, rejectReason = "reject reason", debugMessage = "debug message"))
2025-02-07T19:59:17.398772Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/b-test-58682} IPC client send FooInterface.passBlockState$Params (arg = (mode = 0, result = 0, rejectReason = "", debugMessage = ""))
2025-02-07T19:59:17.398790Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/58689} IPC server recv request  #7 FooInterface.passBlockState$Params (arg = (mode = 0, result = 0, rejectReason = "", debugMessage = ""))
2025-02-07T19:59:17.398801Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/58689} IPC server send response #7 FooInterface.passBlockState$Results (result = (mode = 0, result = 0, rejectReason = "", debugMessage = ""))
2025-02-07T19:59:17.398819Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/b-test-58682} IPC client recv FooInterface.passBlockState$Results (result = (mode = 0, result = 0, rejectReason = "", debugMessage = ""))
2025-02-07T19:59:17.398843Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/b-test-58682} IPC client send FooInterface.passScript$Params (arg = "[")
2025-02-07T19:59:17.398858Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/58689} IPC server recv request  #8 FooInterface.passScript$Params (arg = "[")
2025-02-07T19:59:17.398868Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/58689} IPC server send response #8 FooInterface.passScript$Results (result = "[")
2025-02-07T19:59:17.398883Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/b-test-58682} IPC client recv FooInterface.passScript$Results (result = "[")
2025-02-07T19:59:17.399029Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/58689} IPC server destroy N2mp11ProxyServerIN3gen12FooInterfaceEEE
2025-02-07T19:59:17.399089Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/58689} EventLoop::loop done, cancelling event listeners.
2025-02-07T19:59:17.399098Z [unknown] [test/ipc_test.cpp:60] [operator()] LOG0: {IpcPipeTest-24701/58689} EventLoop::loop bye.

https://github.com/bitcoin/bitcoin/actions/runs/13206947916/job/36872233303?pr=30975

Currently, there are two cases where code may attempt to lock `EventLoop::mutex` after the `EventLoop::loop()` method has finished running and the `EventLoop` object is potentially destroyed. Both cases happen due to uses of the `Unlock()` utility function which unlocks a mutex temporarily, runs a callback function and relocks it. The first case happens in `EventLoop::removeClient` when the `EventLoop::done()` condition is reach and it calls `Unlock()` in order to be able to call `write(post_fd, ...)` without blocking and wake the EventLoop thread. In this case, since `EventLoop` execution is done there is not really any point to using `Unlock()` and relocking the mutex after calling `write()` so the code is updated to just use a simple `lock.unlock()` call and permanently let go of the lock and try to reacquire it. The second case happens in `EventLoop::post` where `Unlock()` is also used in a similar way, and depending on thread scheduling (if the post thread is delayed for a long time before relocking) can result in relocking `EventLoop::m_mutex` after calling `write()` to fail. In this case, since relocking the mutex is actually necessary for the code that follows, the fix is different: new `addClient`/`removeClient` calls are just added to this code, so the `EventLoop::loop()` function will just not exit while the `post()` function is waiting. These two changes are being labeled as a bugfix even though there is not technically a bug here in libmultiprocess code or API. The `EventLoop` object itself was perfectly safe before these changes as long as outside code was waited for `EventLoop` methods to finish executing before deleting the `EventLoop` object. Originally the multiprocess code added in bitcoin/bitcoin#10102 did this, but later as more features were added to binding and connecting to unix sockets, and unit tests were added which would immediately delete the `EventLoop` object after `EventLoop::loop()` returned it meant these two methods could start failing due to their uses of `Unlock()` depending on thread scheduling. A previous attempt was made to fix this bug in bitcoin/bitcoin#31815 outside of libmultiprocess code. But it only addressed the first case of a failing `Unlock()` in `EventLoop::removeClient`, not the second case in `EventLoop::post`. This first case described above was not actually observed but is just theoretically possible. The second case happens intermittently on macos and was reported bitcoin-core#154

Currently, there are two cases where code may attempt to lock `EventLoop::mutex` after the `EventLoop::loop()` method has finished running. This is not a problem by itself, but is a problem if there is external code which deletes the `EventLoop` object immediately after the method returns, which is the case in Bitcoin Core. Both cases of locking after the loop method returns happen due to uses of the `Unlock()` utility function which unlocks a mutex temporarily, runs a callback function and relocks it. The first case happens in `EventLoop::removeClient` when the `EventLoop::done()` condition is reached and it calls `Unlock()` in order to be able to call `write(post_fd, ...)` without blocking and wake the EventLoop thread. In this case, since `EventLoop` execution is done there is not really any point to using `Unlock()` and relocking, so the code is changed to use a simple `lock.unlock()` call and not try to relock. The second case happens in `EventLoop::post` where `Unlock()` is used in a similar way, and depending on thread scheduling (if the thread is delayed for a long time before relocking) can result in failing to relock `EventLoop::m_mutex` after calling `write()`. In this case, since relocking the mutex is actually necessary for the code that follows, the fix is different: new `addClient`/`removeClient` calls are added to this code, so the `EventLoop::loop()` function will never exit while the `post()` function is waiting. These two changes are being labeled as a bugfix even though there is not technically a bug here in the libmultiprocess code or API. The `EventLoop` object itself was safe before these changes as long as outside code waited for `EventLoop` methods to finish executing before deleting the `EventLoop` object. Originally the multiprocess code added in bitcoin/bitcoin#10102 did this, but later as more features were added for binding and connecting to unix sockets, and unit tests were added which would immediately delete the `EventLoop` object after `EventLoop::loop()` returned, it meant this code could now start failing to relock after unlocking. A previous attempt was made to fix this bug in bitcoin/bitcoin#31815 outside of libmultiprocess code. But it only addressed the first case of a failing `Unlock()` in `EventLoop::removeClient()`, not the second case in `EventLoop::post()`. This first case described above was not observed but is theoretically possible. The second case happens intermittently on macos and was reported bitcoin-core#154

f8cbd0d bugfix: Do not lock EventLoop::mutex after EventLoop is done (user) Pull request description: Currently, there are two cases where code may attempt to lock `EventLoop::mutex` after the `EventLoop::loop()` method has finished running. This is not a problem by itself, but is a problem if there is external code which deletes the `EventLoop` object immediately after the method returns, which is the case in Bitcoin Core. Both cases of locking after the loop method returns happen due to uses of the `Unlock()` utility function which unlocks a mutex temporarily, runs a callback function and relocks it. The first case happens in `EventLoop::removeClient` when the `EventLoop::done()` condition is reached and it calls `Unlock()` in order to be able to call `write(post_fd, ...)` without blocking and wake the EventLoop thread. In this case, since `EventLoop` execution is done there is not really any point to using `Unlock()` and relocking, so the code is changed to use a simple `lock.unlock()` call and not try to relock. The second case happens in `EventLoop::post` where `Unlock()` is used in a similar way, and depending on thread scheduling (if the thread is delayed for a long time before relocking) can result in failing to relock `EventLoop::m_mutex` after calling `write()`. In this case, since relocking the mutex is actually necessary for the code that follows, the fix is different: new `addClient`/`removeClient` calls are added to this code, so the `EventLoop::loop()` function will never exit while the `post()` function is waiting. These two changes are being labeled as a bugfix even though there is not technically a bug here in the libmultiprocess code or API. The `EventLoop` object itself was safe before these changes as long as outside code waited for `EventLoop` methods to finish executing before deleting the `EventLoop` object. Originally the multiprocess code added in bitcoin/bitcoin#10102 did this, but later as more features were added for binding and connecting to unix sockets, and unit tests were added which would immediately delete the `EventLoop` object after `EventLoop::loop()` returned, it meant this code could now start failing to relock after unlocking. A previous attempt was made to fix this bug in bitcoin/bitcoin#31815 outside of libmultiprocess code. But it only addressed the first case of a failing `Unlock()` in `EventLoop::removeClient()`, not the second case in `EventLoop::post()`. This first case described above was not observed but is theoretically possible. The second case happens intermittently on macos and was reported #154 Top commit has no ACKs. Tree-SHA512: 378d360de84b40f0ddcf2ebc5e11fb74adf6d133e94f148ebb7e0c9926836f276ac79ed1d8cb1a559a7aa756939071e8d252d4f03fe6816760807857fba646cb

Currently, there are two cases where code may attempt to lock `EventLoop::mutex` after the `EventLoop::loop()` method has finished running. This is not a problem by itself, but is a problem if there is external code which deletes the `EventLoop` object immediately after the method returns, which is the case in Bitcoin Core. Both cases of locking after the loop method returns happen due to uses of the `Unlock()` utility function which unlocks a mutex temporarily, runs a callback function and relocks it. The first case happens in `EventLoop::removeClient` when the `EventLoop::done()` condition is reached and it calls `Unlock()` in order to be able to call `write(post_fd, ...)` without blocking and wake the EventLoop thread. In this case, since `EventLoop` execution is done there is not really any point to using `Unlock()` and relocking, so the code is changed to use a simple `lock.unlock()` call and not try to relock. The second case happens in `EventLoop::post` where `Unlock()` is used in a similar way, and depending on thread scheduling (if the thread is delayed for a long time before relocking) can result in failing to relock `EventLoop::m_mutex` after calling `write()`. In this case, since relocking the mutex is actually necessary for the code that follows, the fix is different: new `addClient`/`removeClient` calls are added to this code, so the `EventLoop::loop()` function will never exit while the `post()` function is waiting. These two changes are being labeled as a bugfix even though there is not technically a bug here in the libmultiprocess code or API. The `EventLoop` object itself was safe before these changes as long as outside code waited for `EventLoop` methods to finish executing before deleting the `EventLoop` object. Originally the multiprocess code added in bitcoin/bitcoin#10102 did this, but later as more features were added for binding and connecting to unix sockets, and unit tests were added which would immediately delete the `EventLoop` object after `EventLoop::loop()` returned, it meant this code could now start failing to relock after unlocking. A previous attempt was made to fix this bug in bitcoin/bitcoin#31815 outside of libmultiprocess code. But it only addressed the first case of a failing `Unlock()` in `EventLoop::removeClient()`, not the second case in `EventLoop::post()`. This first case described above was not observed but is theoretically possible. The second case happens intermittently on macos and was reported bitcoin-core#154

48d01bc bugfix: Do not lock EventLoop::mutex after EventLoop is done (Ryan Ofsky) Pull request description: Currently, there are two cases where code may attempt to lock `EventLoop::mutex` after the `EventLoop::loop()` method has finished running. This is not a problem by itself, but is a problem if there is external code which deletes the `EventLoop` object immediately after the method returns, which is the case in Bitcoin Core. Both cases of locking after the loop method returns happen due to uses of the `Unlock()` utility function which unlocks a mutex temporarily, runs a callback function and relocks it. The first case happens in `EventLoop::removeClient` when the `EventLoop::done()` condition is reached and it calls `Unlock()` in order to be able to call `write(post_fd, ...)` without blocking and wake the EventLoop thread. In this case, since `EventLoop` execution is done there is not really any point to using `Unlock()` and relocking, so the code is changed to use a simple `lock.unlock()` call and not try to relock. The second case happens in `EventLoop::post` where `Unlock()` is used in a similar way, and depending on thread scheduling (if the thread is delayed for a long time before relocking) can result in failing to relock `EventLoop::m_mutex` after calling `write()`. In this case, since relocking the mutex is actually necessary for the code that follows, the fix is different: new `addClient`/`removeClient` calls are added to this code, so the `EventLoop::loop()` function will never exit while the `post()` function is waiting. These two changes are being labeled as a bugfix even though there is not technically a bug here in the libmultiprocess code or API. The `EventLoop` object itself was safe before these changes as long as outside code waited for `EventLoop` methods to finish executing before deleting the `EventLoop` object. Originally the multiprocess code added in bitcoin/bitcoin#10102 did this, but later as more features were added for binding and connecting to unix sockets, and unit tests were added which would immediately delete the `EventLoop` object after `EventLoop::loop()` returned, it meant this code could now start failing to relock after unlocking. A previous attempt was made to fix this bug in bitcoin/bitcoin#31815 outside of libmultiprocess code. But it only addressed the first case of a failing `Unlock()` in `EventLoop::removeClient()`, not the second case in `EventLoop::post()`. This first case described above was not observed but is theoretically possible. The second case happens intermittently on macos and was reported #154 Top commit has no ACKs. Tree-SHA512: 378d360de84b40f0ddcf2ebc5e11fb74adf6d133e94f148ebb7e0c9926836f276ac79ed1d8cb1a559a7aa756939071e8d252d4f03fe6816760807857fba646cb

ryanofsky · 2025-02-10T05:07:14Z

I reproduced the bug locally (see bitcoin-core/libmultiprocess#154 (comment)) and implemented a different fix in bitcoin-core/libmultiprocess#159, which should make these changes unnecessary, because instead of trying to delay eventloop destruction in Bitcoin code until all locks are released, it just avoids reacquiring a lock unnecessary and pretents eventloop::loop() from returning until all locks are released so it won't be destroyed. Will close this PR since these changes only incompletely fixed the bug and should no longer be needed.

Sjors · 2025-02-10T08:28:22Z

Thanks. I dropped this from #30975 and rebased on your latest #31741 which contains bitcoin-core/libmultiprocess#159.

This was referenced Feb 7, 2025

Multiprocess tracking issue #28722

Open

Bitcoin Core rpc_misc.py test failure bitcoin-core/libmultiprocess#128

Closed

Occasional ipc_test failures (macOS?) bitcoin-core/libmultiprocess#154

Closed

maflcko added Bug and removed Bug labels Feb 7, 2025

DrahtBot added the interfaces label Feb 7, 2025

Sjors mentioned this pull request Feb 7, 2025

ci: build multiprocess on most jobs #30975

Closed

ryanofsky mentioned this pull request Feb 10, 2025

[deleted] bitcoin-core/libmultiprocess#158

Merged

ryanofsky mentioned this pull request Feb 10, 2025

bugfix: Do not lock EventLoop::mutex after EventLoop is done bitcoin-core/libmultiprocess#159

Merged

ryanofsky closed this Feb 10, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

multiprocess: Lock CapnpProtocol::m_loop with mutex #31815

multiprocess: Lock CapnpProtocol::m_loop with mutex #31815

Uh oh!

ryanofsky commented Feb 7, 2025 •

edited

Loading

Uh oh!

DrahtBot commented Feb 7, 2025 •

edited

Loading

Uh oh!

ryanofsky commented Feb 7, 2025

Uh oh!

Sjors commented Feb 7, 2025 •

edited

Loading

Uh oh!

ryanofsky commented Feb 10, 2025

Uh oh!

Sjors commented Feb 10, 2025

Uh oh!

Uh oh!

multiprocess: Lock CapnpProtocol::m_loop with mutex #31815

multiprocess: Lock CapnpProtocol::m_loop with mutex #31815

Uh oh!

Conversation

ryanofsky commented Feb 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DrahtBot commented Feb 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Coverage & Benchmarks

Reviews

Uh oh!

ryanofsky commented Feb 7, 2025

Uh oh!

Sjors commented Feb 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ryanofsky commented Feb 10, 2025

Uh oh!

Sjors commented Feb 10, 2025

Uh oh!

Uh oh!

ryanofsky commented Feb 7, 2025 •

edited

Loading

DrahtBot commented Feb 7, 2025 •

edited

Loading

Sjors commented Feb 7, 2025 •

edited

Loading