Skip to content

Conversation

fanquake
Copy link
Member

@fanquake fanquake commented Jan 28, 2021

After #21016, boost::shared_mutex will basically be the last component of Boost Thread that Bitcoin Core is using. However, a comment in #16684, pointed out that std::shared_mutex may be unsafe to use, when coupled with a range of glibc versions ~2.26->2.29, which may block our adoption and the removal of Boost Thread.

Note that the comment first links to "rdlock stalls indefinitely on an unlocked pthread rwlock", but then to the Ubuntu bugtracker, which is actually for a different pthread bug: "pthread_rwlock_trywrlock results in hang". This PR contains a modified version of the code to reproduce the second bug, for which a backport was done for Ubuntu 18.04s glibc (2.27-3ubuntu1.3).

You can reproduce the hanging behaviour using the following. (You could also use any demos from the linked bug reports).

Run a Bionic container, and install build dependencies. Clone the source and checkout this branch:

docker run -it --rm ubuntu:18.04 /bin/bash

apt update && apt upgrade -y
apt install git build-essential libtool autotools-dev automake pkg-config bsdmainutils python3 libevent-dev libboost-system-dev libboost-filesystem-dev libboost-test-dev libboost-thread-dev -y

git clone https://github.com/bitcoin/bitcoin
git fetch origin pull/21022/head:21022
git checkout 21022

Configure, disabling the wallet (unrelated) and enabling glibc back compat, so that our sanity checks are enabled:

./autogen.sh
./configure --disable-wallet --enable-glibc-back-compat
make src/bitcoind -j8

Before running bitcoind, check which version of libc you have installed. 23844 was fixed in 1.3, so running 1.4 should mean no issues:

apt-cache policy libc6
libc6:
  Installed: 2.27-3ubuntu1.4
  Candidate: 2.27-3ubuntu1.4

Run bitcoind

src/bitcoind
Testing pthread trylock_wr

2021-01-28T13:24:39Z Bitcoin Core version v21.99.0-1e601c9d60f9 (release build)
2021-01-28T13:24:39Z Assuming ancestors of block 0000000000000000000b9d2ec5a352ecba0592946514a92f14319dc2b367fc72 have valid signatures.
2021-01-28T13:24:39Z Setting nMinimumChainWork=00000000000000000000000000000000000000001533efd8d716a517fe2c5008
...
# quit

Downgrade libc to 1.2. You may need to uninstall libc6-dev first, otherwise it will block the downgrade of libc6:

apt remove libc6-dev -y
apt install libc6=2.27-3ubuntu1.2 -y

# check that you're running 1.2
apt-cache policy libc6
libc6:
  Installed: 2.27-3ubuntu1.2
  Candidate: 2.27-3ubuntu1.4

Run bitcoind again. Note that this will hang. You will not be able to quit.

# if it doesn't hang, quit and retry
./src/bitcoind
Testing pthread trylock_wr

.... hanging

I think it would be good to have more discussion around how we approach these sort of "lower down the stack" issues:

  • what sort of bug or issue warrants us from excluding a std library feature, i.e std::shared_mutex, from use? If you looked at the glibc issue tracker right now, you'd no doubt find half a dozen bugs across multiple versions of glibc that you may think warrant us excluding various things.
  • How widespread does the issue have to be? If it's in a single glibc version that is nearly EOL, that is obviously less of an issue compared to something affecting most recent versions.
  • how (in)tolerant are we of assuming that users are taking backports?
  • do we need to be more aggressive with runtime sanity checks? Should we be trying to detect more "broken things" that are applicable to us?

It would certainly be unfortunate if migrating to more standard library components was blocked for extended periods (Ubuntu Bionic is LTS until mid 2023), due to these kinds of bugs.

@laanwj
Copy link
Member

laanwj commented Jan 28, 2021

may be unsafe to use, when coupled with a range of glibc versions ~2.26->2.29, which may block our adoption and the removal of Boost Thread.

plugs #18110

Edit: with regard to shared_mutex specifically, these are the only uses, of which one is in the tests:

src/script/sigcache.cpp:#include <boost/thread/shared_mutex.hpp>
src/script/sigcache.cpp:    boost::shared_mutex cs_sigcache;
src/script/sigcache.cpp:        boost::shared_lock<boost::shared_mutex> lock(cs_sigcache);
src/script/sigcache.cpp:        boost::unique_lock<boost::shared_mutex> lock(cs_sigcache);
src/test/cuckoocache_tests.cpp:    boost::shared_mutex mtx;
src/test/cuckoocache_tests.cpp:        boost::unique_lock<boost::shared_mutex> l(mtx);
src/test/cuckoocache_tests.cpp:            boost::shared_lock<boost::shared_mutex> l(mtx);
src/test/cuckoocache_tests.cpp:    boost::unique_lock<boost::shared_mutex> l(mtx);

Can't we get rid of it? That's resolve this particular issue at least (though not the wider discussion around glibc versions).

@theuni
Copy link
Member

theuni commented Jan 28, 2021

plugs #18110

I know this is really pushing it, but I don't think it's totally crazy to consider vendoring musl in the very long term (not now).

Certainly building against it for releases is worth considering.

Can't we get rid of it? That's resolve this particular issue at least (though not the wider discussion around glibc versions).

Without looking at the code (assuming it wouldn't require a redesign), this seems like a reasonable temporary punt. It also seems reasonable to me to worry that boost hits the same underlying glibc bug, and we've just not happened to trigger the right conditions yet.

Shared locks are really handy to have sometimes, though, so we'll definitely want a path back to enabling them if we take that route. @fanquake's approach here makes perfect sense to me in that regard.

@jnewbery
Copy link
Contributor

jnewbery commented Feb 2, 2021

Can't we get rid of it? That's resolve this particular issue at least (though not the wider discussion around glibc versions).

Without looking at the code (assuming it wouldn't require a redesign), this seems like a reasonable temporary punt.

I think this is actually very reasonable. The shared_mutex in the sigcache was introduced in ef0f422. @sipa's commit log:

Remove contention on signature cache during block validation

Since block validation happens in parallel, multiple threads may be
accessing the signature cache simultaneously. To prevent contention:
* Turn the signature cache lock into a shared mutex
* Make reading from the cache only acquire a shared lock
* Let block validations not store their results in the cache

I believe that since #10192 was merged, block validation will now very rarely use the signature cache. From that PR:

This is somewhat duplicative with the sigcache, as entries in the
new cache will also have several entries in the sigcache. However,
the sigcache is kept both as ATMP relies on it and because it
prevents malleability-based DoS attacks on the new higher-level
cache. Instead, the -sigcachesize option is re-used - cutting the
sigcache size in half and using the newly freed memory for the
script execution cache.

If the only use for sigcache is now in ATMP (which is single-threaded and requires cs_main), then changing this to a simple mutex wouldn't have any impact on performance. I don't understand the "prevents malleability-based DoS attacks on the new higher-level cache" part of @TheBlueMatt's description so I may be missing something subtle here.

laanwj added a commit to bitcoin-core/gui that referenced this pull request Feb 12, 2021
060a2a6 ci: remove boost thread installation (fanquake)
06e1d7d build: don't build or use Boost Thread (fanquake)
7097add refactor: replace Boost shared_mutex with std shared_mutex in sigcache (fanquake)
8e55981 refactor: replace Boost shared_mutex with std shared_mutex in cuckoocache tests (fanquake)

Pull request description:

  This replaces `boost::shared_mutex` and `boost::unique_lock` with [`std::shared_mutex`](https://en.cppreference.com/w/cpp/thread/shared_mutex) & [`std::unique_lock`](https://en.cppreference.com/w/cpp/thread/unique_lock).

  Even though [some concerns were raised](bitcoin/bitcoin#16684 (comment)) in #16684 with regard to `std::shared_mutex` being unsafe to use across some glibc versions, I still think this change is an improvement. As I mentioned in #21022, I also think trying to restrict standard library feature usage based on bugs in glibc is not only hard to do, but it's not currently clear exactly how we do that in practice (does it also extend to patching out use in our dependencies, should we be implementing more runtime checks for features we are using, when do we consider an affected glibc "old enough" not to worry about? etc). If you take a look through the [glibc bug tracker](https://sourceware.org/bugzilla/describecomponents.cgi?product=glibc) you'll no doubt find plenty of (active) bug reports for standard library code we already using. Obviously not to say we shouldn't try and avoid buggy code where possible.

  Two other points:

  [Cory mentioned in #21022](bitcoin/bitcoin#21022 (comment)):
  > It also seems reasonable to me to worry that boost hits the same underlying glibc bug, and we've just not happened to trigger the right conditions yet.

  Moving away from Boost to the standard library also removes the potential for differences related to Boosts configuration. Boost has multiple versions of `shared_mutex`, and what you end up using, and what it's backed by depends on:
  * The version of Boost.
  * The platform you're building for.
  * Which version of `BOOST_THREAD_VERSION` is defined: (2,3,4 or 5) default=2. (see [here](https://www.boost.org/doc/libs/1_70_0/doc/html/thread/build.html#thread.build.configuration) for some of the differences).
  * Is `BOOST_THREAD_V2_SHARED_MUTEX` defined? (not by default). If so, you might get the ["less performant, but more robust"](boostorg/thread#230 (comment)) version of `shared_mutex`.

  A lot of these factors are eliminated by our use of depends, but users will have varying configurations. It's also not inconceivable to think that a distro, or some package manager might start defining something like `BOOST_THREAD_VERSION=3`. Boost tried to change the default from 2 to 3 at one point.

  With this change, we no longer use Boost Thread, so this PR also removes it from depends, the build system, CI etc.

  Previous similar PRs were #19183 & #20922. The authors are included in the commits here.
  Also related to #21022 - pthread sanity checking.

ACKs for top commit:
  laanwj:
    Code review ACK 060a2a6
  vasild:
    ACK 060a2a6

Tree-SHA512: 572d14d8c9de20bc434511f20d3f431836393ff915b2fe9de5a47a02dca76805ad5c3fc4cceecb4cd43f3ba939a0508178c4e60e62abdbaaa6b3e8db20b75b03
@fanquake
Copy link
Member Author

Closing this now that #21064 has been merged, however I think there is still scope for us to increase our runtime sanity checks going forward. Will propose some other changes.

@fanquake fanquake closed this Feb 15, 2021
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants