-
Notifications
You must be signed in to change notification settings - Fork 37.7k
[Not for merge] pthread sanity check #21022
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
plugs #18110 Edit: with regard to
Can't we get rid of it? That's resolve this particular issue at least (though not the wider discussion around glibc versions). |
I know this is really pushing it, but I don't think it's totally crazy to consider vendoring musl in the very long term (not now). Certainly building against it for releases is worth considering.
Without looking at the code (assuming it wouldn't require a redesign), this seems like a reasonable temporary punt. It also seems reasonable to me to worry that boost hits the same underlying glibc bug, and we've just not happened to trigger the right conditions yet. Shared locks are really handy to have sometimes, though, so we'll definitely want a path back to enabling them if we take that route. @fanquake's approach here makes perfect sense to me in that regard. |
I think this is actually very reasonable. The shared_mutex in the sigcache was introduced in ef0f422. @sipa's commit log:
I believe that since #10192 was merged, block validation will now very rarely use the signature cache. From that PR:
If the only use for sigcache is now in ATMP (which is single-threaded and requires cs_main), then changing this to a simple mutex wouldn't have any impact on performance. I don't understand the "prevents malleability-based DoS attacks on the new higher-level cache" part of @TheBlueMatt's description so I may be missing something subtle here. |
060a2a6 ci: remove boost thread installation (fanquake) 06e1d7d build: don't build or use Boost Thread (fanquake) 7097add refactor: replace Boost shared_mutex with std shared_mutex in sigcache (fanquake) 8e55981 refactor: replace Boost shared_mutex with std shared_mutex in cuckoocache tests (fanquake) Pull request description: This replaces `boost::shared_mutex` and `boost::unique_lock` with [`std::shared_mutex`](https://en.cppreference.com/w/cpp/thread/shared_mutex) & [`std::unique_lock`](https://en.cppreference.com/w/cpp/thread/unique_lock). Even though [some concerns were raised](bitcoin/bitcoin#16684 (comment)) in #16684 with regard to `std::shared_mutex` being unsafe to use across some glibc versions, I still think this change is an improvement. As I mentioned in #21022, I also think trying to restrict standard library feature usage based on bugs in glibc is not only hard to do, but it's not currently clear exactly how we do that in practice (does it also extend to patching out use in our dependencies, should we be implementing more runtime checks for features we are using, when do we consider an affected glibc "old enough" not to worry about? etc). If you take a look through the [glibc bug tracker](https://sourceware.org/bugzilla/describecomponents.cgi?product=glibc) you'll no doubt find plenty of (active) bug reports for standard library code we already using. Obviously not to say we shouldn't try and avoid buggy code where possible. Two other points: [Cory mentioned in #21022](bitcoin/bitcoin#21022 (comment)): > It also seems reasonable to me to worry that boost hits the same underlying glibc bug, and we've just not happened to trigger the right conditions yet. Moving away from Boost to the standard library also removes the potential for differences related to Boosts configuration. Boost has multiple versions of `shared_mutex`, and what you end up using, and what it's backed by depends on: * The version of Boost. * The platform you're building for. * Which version of `BOOST_THREAD_VERSION` is defined: (2,3,4 or 5) default=2. (see [here](https://www.boost.org/doc/libs/1_70_0/doc/html/thread/build.html#thread.build.configuration) for some of the differences). * Is `BOOST_THREAD_V2_SHARED_MUTEX` defined? (not by default). If so, you might get the ["less performant, but more robust"](boostorg/thread#230 (comment)) version of `shared_mutex`. A lot of these factors are eliminated by our use of depends, but users will have varying configurations. It's also not inconceivable to think that a distro, or some package manager might start defining something like `BOOST_THREAD_VERSION=3`. Boost tried to change the default from 2 to 3 at one point. With this change, we no longer use Boost Thread, so this PR also removes it from depends, the build system, CI etc. Previous similar PRs were #19183 & #20922. The authors are included in the commits here. Also related to #21022 - pthread sanity checking. ACKs for top commit: laanwj: Code review ACK 060a2a6 vasild: ACK 060a2a6 Tree-SHA512: 572d14d8c9de20bc434511f20d3f431836393ff915b2fe9de5a47a02dca76805ad5c3fc4cceecb4cd43f3ba939a0508178c4e60e62abdbaaa6b3e8db20b75b03
Closing this now that #21064 has been merged, however I think there is still scope for us to increase our runtime sanity checks going forward. Will propose some other changes. |
After #21016,
boost::shared_mutex
will basically be the last component of Boost Thread that Bitcoin Core is using. However, a comment in #16684, pointed out thatstd::shared_mutex
may be unsafe to use, when coupled with a range of glibc versions ~2.26->2.29, which may block our adoption and the removal of Boost Thread.Note that the comment first links to "rdlock stalls indefinitely on an unlocked pthread rwlock", but then to the Ubuntu bugtracker, which is actually for a different pthread bug: "pthread_rwlock_trywrlock results in hang". This PR contains a modified version of the code to reproduce the second bug, for which a backport was done for Ubuntu 18.04s glibc (2.27-3ubuntu1.3).
You can reproduce the hanging behaviour using the following. (You could also use any demos from the linked bug reports).
Run a Bionic container, and install build dependencies. Clone the source and checkout this branch:
docker run -it --rm ubuntu:18.04 /bin/bash apt update && apt upgrade -y apt install git build-essential libtool autotools-dev automake pkg-config bsdmainutils python3 libevent-dev libboost-system-dev libboost-filesystem-dev libboost-test-dev libboost-thread-dev -y git clone https://github.com/bitcoin/bitcoin git fetch origin pull/21022/head:21022 git checkout 21022
Configure, disabling the wallet (unrelated) and enabling glibc back compat, so that our sanity checks are enabled:
Before running bitcoind, check which version of libc you have installed.
23844
was fixed in1.3
, so running1.4
should mean no issues:Run bitcoind
Downgrade libc to
1.2
. You may need to uninstalllibc6-dev
first, otherwise it will block the downgrade oflibc6
:apt remove libc6-dev -y apt install libc6=2.27-3ubuntu1.2 -y # check that you're running 1.2 apt-cache policy libc6 libc6: Installed: 2.27-3ubuntu1.2 Candidate: 2.27-3ubuntu1.4
Run
bitcoind
again. Note that this will hang. You will not be able to quit.# if it doesn't hang, quit and retry ./src/bitcoind Testing pthread trylock_wr .... hanging
I think it would be good to have more discussion around how we approach these sort of "lower down the stack" issues:
std::shared_mutex
, from use? If you looked at the glibc issue tracker right now, you'd no doubt find half a dozen bugs across multiple versions of glibc that you may think warrant us excluding various things.It would certainly be unfortunate if migrating to more standard library components was blocked for extended periods (Ubuntu Bionic is LTS until mid 2023), due to these kinds of bugs.