-
Notifications
You must be signed in to change notification settings - Fork 37.8k
test, bench: Initialize and terminate use of Winsock properly #28486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. |
@sipsorcery Maybe you have any insights about that? |
I'm having trouble understanding the problem here. Could you please explain what the resulting error is? I can understand possibly needing to move |
Looking at your commit that purposefully introduces a bug, it removes a bunch of bench files. I'm guessing one of those happens to call |
Yes, in the |
The error in that github action is:
Which implies to me that something is throwing |
The call chain from here Line 124 in 459272d
If the latter fails, for example, due to uninitialized Winsock, the |
In this branch, I've minimized the problem code to a single line of code, which is: // Comment the next line out to trigger the WSANOTINITIALISED=10093 error.
const auto test_setup = MakeNoLogFileContext<const TestingSetup>(); |
I still don't understand why our CI isn't failing, in the way you claim it should be, if our binaries are meant to be broken. Nothing should be done here, until the root cause of the issue is actually understood. As this either points to some brokenness in master (build system/CI), or your CMake/MSVC branch producing binaries differently to master. |
Our code is incorrect according to Microsoft's docs because the |
Apparently, the underlying problem is not MSVC or CMake specific. I've opened a dedicated issue with steps to reproduce the bug. We might adjust our CI to catch the problem easily. The PR description has been updated. |
ddd8661
to
fd01691
Compare
We shouldn't (have to) do this going forward, because the solution is either to remove/disable Boost Process for Windows, or replace it with something that doesn't undermine our own code. |
fd01691
to
3cb83c8
Compare
The PR has been reworked and its description has been updated.
I agree with the replacement of the Boost.Process and going to pick up @theStack's branch. Until then, it seems appropriate to have a regression test after fixing our own code. And it comes almost for free for us. |
Thanks, but we aren't going to change our code in the way you are suggesting in this PR. External signer should just be disabled for Windows, until there is a replacement that works properly. |
Disabling external signer support is exactly the thing which exposes the fact that our code is broken. See #28940. |
Yes, and we should keep the almost dangerous, poorly maintained code disabled until we replace it with something that has actually been reviewed, and works as we expect it to. Clearly that hasn't been the case previously. Obviously we also need to fix our networking code (as all our code should function without external signer), but I'm not sure the approach here, is a good one, and further complicates test/bench code, while diverging it from how bitcoind functions. Ideally things become more encapsulated in some lower-level networking/init code, that can be called from bitcoind/test/bench/whereever, we shouldn't have to introduce new special code into the test/bench frameworks to initialize sockets on windows. it would also help to elaborate in the PR description what the underlying issue is (Boost ASIO/Process hijacking socket/network setup out from underneath us on Windows). |
What exactly do you mean by "diverging it from how bitcoind functions"?
"Boost ASIO/Process hijacking socket/network setup" is not a cause of the bug, rather a reason why that bug was hidden from us. |
eb572fb
to
89a52fd
Compare
56da7e0
to
7560aa5
Compare
lgtm ACK b22c3ac, didn't review the GHA diff |
Just in case, here is a branch with the reverted first commit. Here are its CI results:
|
I don't understand how this solves the fact that the Boost ASIO code is still in bitcoind, running before main, and calling |
This PR does nothing about the fact you mentioned. It resolves another bug which is explicit when bitcoind does not contain Boost ASIO code, i.e. when building without external wallet support.
I agree with you. In the added "Win64, VS 2022 (ext.signer=OFF)" CI job, there is no Boost ASIO code in |
So then why isn't this PR disabling external signer for Windows? |
I think this is a test-only bugfix change. No one is holding anyone back from disabling external signer on Windows. |
Given what we know, I don't see why we wouldn't just disable it now. If it's going to be disabled, there's also no point in adding the additional CI checks in this PR. |
Because this PR is focused on fixing a bug in our code, not on changing user-available functionality. The question you raised should be discussed in a dedicated issue/PR. |
The additional CI check proves that everything is working with external signer disabled. |
It the second commit is controversial, I guess it can be dropped. However, my understanding is that the first commit is correct and will be needed in any case. |
b22c3ac
to
fd4c6a1
Compare
The second commit has been dropped. |
lgtm ACK fd4c6a1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left two non-blocking comments. I think we could improve SetupNetworking()
and CNetCleanup
further, and constrain the code to being Windows only, rather than having it exist for all platforms, but be a no-op on non-windows.
Will merge this shortly, and open a new PR to disable external signer for Windows, if nobody else does.
@@ -88,6 +89,15 @@ std::ostream& operator<<(std::ostream& os, const uint256& num) | |||
return os; | |||
} | |||
|
|||
struct NetworkSetup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could probably use at least one line of doc explaining it's existence? Given it's only here to facilitate Windows sockets initialization, and doesn't do anything for non-windows platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could probably use at least one line of doc explaining it's existence?
Mind suggesting one?
Given it's only here to facilitate Windows sockets initialization, and doesn't do anything for non-windows platforms.
With the current design of the SetupNetworking()
, the code difference between platforms looks rather like implementation details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code difference between platforms looks rather like implementation details.
The only code in SetupNetworking
is Windows code to call WSAStartup()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct.
@@ -1218,6 +1218,9 @@ BOOST_AUTO_TEST_CASE(test_LockDirectory) | |||
// has released the lock as we would expect by probing it. | |||
int processstatus; | |||
BOOST_CHECK_EQUAL(write(fd[1], &LockCommand, 1), 1); | |||
// The following line invokes the ~CNetCleanup dtor without | |||
// a paired SetupNetworking call. This is acceptable as long as | |||
// ~CNetCleanup is a no-op for non-Windows platforms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this now only because of the change above? It's not obvious why calling write()
would invoke a NetCleanup destructor, or why we end up in a state where we are trying to tear down networking, if it's never even been setup. In general it seems fragile to rely on ifdefs and code being a no-op to avoid additional issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this now only because of the change above?
This PR diff does not change behaviour of the commented LOC. However, it seems worth mentioning that currently the SetupNetworking
and ~CNetCleanup
calls might be unbalanced. Perhaps, we might consider moving network initialization after daemonizing? But on platforms that support daemonizing, both calls are no-op...
It's not obvious why calling
write()
would invoke a NetCleanup destructor...
Actually, it terminates the process with all consequences, including ~CNetCleanup
call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, it seems worth mentioning that currently the SetupNetworking and ~CNetCleanup calls might be unbalanced.
Maybe, but SetupNetworking
and CNetCleanup
only do anything on Windows, and the code here only runs for non-windows, where "unbalanced" calls to either doesn't matter, because there is no code being executed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...
SetupNetworking
andCNetCleanup
only do anything on Windows, and the code here only runs for non-windows, where "unbalanced" calls to either doesn't matter, because there is no code being executed.
That is similar to what the comment says, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my point is that we shouldn't need a comment like this, or it's not clear what value it provides. I would think it makes more sense to try and fix things so that we just don't have situations where we try and call network teardown code if we haven't setup any networking, rather than adding comments about Windows for code that only runs on Linux.
It's come to light that Boost ASIO (a Boost Process sub dep) has in some instances, been queitly initialising our network stack on Windows (see PR bitcoin#28486 and discussion in bitcoin#28940). This has been shielding a bug in our own code, but the larger issue is that Boost Process/ASIO is running code before main, and doing things like setting up networking. This undermines our own assumptions about how our binary works, happens before we get to run any sanity checks, and also runs before we call our own code to setup networking. It's also not clear why a feature like external signer would have a dependency that would be doing anything network/socket related, given it only exists to spawn a local process.
It's come to light that Boost ASIO (a Boost Process sub dep) has in some instances, been queitly initialising our network stack on Windows (see PR bitcoin#28486 and discussion in bitcoin#28940). This has been shielding a bug in our own code, but the larger issue is that Boost Process/ASIO is running code before main, and doing things like setting up networking. This undermines our own assumptions about how our binary works, happens before we get to run any sanity checks, and also runs before we call our own code to setup networking. It's also not clear why a feature like external signer would have a dependency that would be doing anything network/socket related, given it only exists to spawn a local process.
It's come to light that Boost ASIO (a Boost Process sub dep) has in some instances, been queitly initialising our network stack on Windows (see PR bitcoin#28486 and discussion in bitcoin#28940). This has been shielding a bug in our own code, but the larger issue is that Boost Process/ASIO is running code before main, and doing things like setting up networking. This undermines our own assumptions about how our binary works, happens before we get to run any sanity checks, and also runs before we call our own code to setup networking. It's also not clear why a feature like external signer would have a dependency that would be doing anything network/socket related, given it only exists to spawn a local process.
It's come to light that Boost ASIO (a Boost Process sub dep) has in some instances, been queitly initialising our network stack on Windows (see PR bitcoin#28486 and discussion in bitcoin#28940). This has been shielding a bug in our own code, but the larger issue is that Boost Process/ASIO is running code before main, and doing things like setting up networking. This undermines our own assumptions about how our binary works, happens before we get to run any sanity checks, and also runs before we call our own code to setup networking. It's also not clear why a feature like external signer would have a dependency that would be doing anything network/socket related, given it only exists to spawn a local process.
It's come to light that Boost ASIO (a Boost Process sub dep) has in some instances, been queitly initialising our network stack on Windows (see PR bitcoin#28486 and discussion in bitcoin#28940). This has been shielding a bug in our own code, but the larger issue is that Boost Process/ASIO is running code before main, and doing things like setting up networking. This undermines our own assumptions about how our binary works, happens before we get to run any sanity checks, and also runs before we call our own code to setup networking. It's also not clear why a feature like external signer would have a dependency that would be doing anything network/socket related, given it only exists to spawn a local process.
…ter" 7b22cd8 Revert "ci: Only run functional tests on windows in master" (Hennadii Stepanov) Pull request description: This PR reverts commit aba4a58 from #28567. The Windows-specific code received [quality](#28486) and [performance](#29045) improvements recently. So there are no reasons to skip functional tests in PRs anymore. In my own repo, I've run the GHA Windows job more than 100 times with no failure. ACKs for top commit: maflcko: lgtm ACK 7b22cd8 TheCharlatan: ACK 7b22cd8 Tree-SHA512: 1e8687e8efe12db506e7cd2d5df9e48b5acb98a339f84684dd0fd67280e22227e2a5a206f1108b09e49038fab7a3ca2ffbd985677f00048c0962b39b2b9a2ba5
308aec3 build: disable external-signer for Windows (fanquake) 3553731 ci: remove --enable-external-signer from win64 job (fanquake) Pull request description: It's come to light that Boost ASIO (a Boost Process sub dep) has in some instances, been quietly initialising our network stack on Windows (see PR bitcoin/bitcoin#28486 and discussion in bitcoin/bitcoin#28940). This has been shielding a bug in our own code, but the larger issue is that Boost Process/ASIO is running code before main, and doing things like setting up networking. This undermines our own assumptions about how our binary works, happens before we run any sanity checks, and before we call our own code to setup networking. Note that ASIO also calls WSAStartup with version `2.0`, whereas we call with `2.2`. It's also not clear why a feature like external signer would have a dependency that would be doing anything network/socket related, given it only exists to spawn a local process. See also the discussion in bitcoin/bitcoin#24907. Note that the maintaince of Boost Process in general, has not really improved. For example, rather than fixing bugs like boostorg/process#111, i.e, boostorg/process#317, the maintainer chooses to just wrap exception causing overflows in try-catch blocks: boostorg/process@0c42a58. These changes get merged in large, unreviewed PRs, i.e boostorg/process#319. This PR disables external-signer on Windows for now. If, in future, someone changes how Boost Process works, or replaces it entirely with some properly reviewed and maintained code, we could reenable this feature on Windows. ACKs for top commit: hebasto: re-ACK 308aec3. TheCharlatan: ACK 308aec3 Tree-SHA512: 7405f7fc9833eeaacd6836c4e5b1c1a7845a40c1fdd55c1060152f8d8189e4777464fde650e11eb1539556a75dddf49667105987078b1457493ee772945da66e
Github-Pull: bitcoin#28486 Rebased-From: fd4c6a1
On the master branch, when compiling without external signer support, the
bench_bitcoin.exe
does not initialize Winsock DLL that is required, for example, here:bitcoin/src/bench/addrman.cpp
Line 124 in 459272d
Moreover, Windows docs explicitly state that
WSAStartup
andWSACleanup
must be balanced:That is not the case for our unit tests because the
SetupNetworking()
call is a part of theBasicTestingSetup
fixture and is invoked multiple times, while~CNetCleanup()
is invoked once only, at the end of the test binary execution.This PR fixes Winsock DLL initialization and termination.
More docs:
Fix #28940.