Skip to content

Conversation

theuni
Copy link
Member

@theuni theuni commented Apr 10, 2014

This reverts #3914 in favor of a more subtle approach, giving us the benefits of shared base libs and the (relative) portability of full-static.

glibc/libstd++ stubs have been added so that older distros don't see undefined symbols. They are bundled together into an --enable-glibc-back-compat configure switch, as I'm not sure there's any real-world usefulness in separating them.

@gmaxwell
Copy link
Contributor

I like this approach much better. Fully statically linking glibc is usually a bad idea, I've spent many an hour tracking down mysterious name resolution failures from static libc.

@wtogami
Copy link
Contributor

wtogami commented Apr 10, 2014

This should also benefit from ASLR, which was not possible with static linked glibc.

@@ -0,0 +1,19 @@
#include "bitcoin-config.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to move these source files to src/compat. Let's keep them separate from the main bitcoin source.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, makes sense.

@laanwj
Copy link
Member

laanwj commented Apr 10, 2014

Static may be a bad idea, but it is the only way to be consistently compatible with a lot of old releases without extra maintenance overhead.

I feel scared importing parts of the C and C++ libraries into bitcoin. Any change in the code may make it necessary to extend glibcxx_compat.cpp / src/glibc_compat.cpp with 'one more function'. If there are implementation problems in those files it will be hard to debug.

Then again, it may be lesser of evils.

Let's at least get this tested on all the platforms people were concerned about in the other topic:

  • Debian 7.2 32-bit and 64 bit
  • Ubuntu 10.04.4 LTS 32-bit and 64 bit
  • CentOS 6.5 32bit and 64 bit

@theuni
Copy link
Member Author

theuni commented Apr 10, 2014

Agreed on the overhead, but I do think that it's the best of all evils. We'd basically just have to keep an eye on it, and update as necessary. As for debugging though, as @gmaxwell pointed out, this makes it substantially easier, not harder.

Anyway, release decisions should be locked in far earlier than they have been in the past. With a sane procedure, we'd know how we stand on final back-compat when shipping the first alpha.

@laanwj
Copy link
Member

laanwj commented Apr 10, 2014

Half related, but more urgent if the gitian-shipped and self-built executables further diverge in implementation: we should make gitian build external debug information files (not to be shipped, just for our own use). This makes it possible to debug crashes in the gitian builds as well as to resolve crash dumps that people send.

Working on test builds...

@theuni
Copy link
Member Author

theuni commented Apr 10, 2014

That's one of the benefits of being deterministic :)

'objcopy --only-keep-debug' on the pre-stripped binaries should be enough to track things down in the future.

@laanwj
Copy link
Member

laanwj commented Apr 10, 2014

Test builds available here:
https://download.visucore.com/bitcoin/bitcoin-f17e2c7-linux.tar.gz
https://download.visucore.com/bitcoin/bitcoin-f17e2c7-linux.tar.gz.sig

If you help testing, please let us know what you tested, and on what OS version and architecture.

glibc/libstdc++ have added new symbols in later releases. When running a new
binary against an older glibc, the run-time linker is unable to resolve the
new symbols and the binary refuses to run.

This can be fixed by adding our own versions of those functions, so that the
build-time linker does not emit undefined symbols for them.

This enables our binary releases to work on older Linux distros, while not
incurring the downsides of a fully static binary.
@theuni
Copy link
Member Author

theuni commented Apr 11, 2014

moved to compat/ and force-pushed, since nothing else changed.

theuni and others added 2 commits April 10, 2014 22:28
Using "./configure --enable-glibc-back-compat" will attempt to be
compatible with a target running glibc abi 2.9 and libstdc++ abi 3.4.
@theuni
Copy link
Member Author

theuni commented Apr 11, 2014

pinging @anton000, @jcea, and @norbertVC since they spoke up in the other PR. Would be great if you could test the binaries from @laanwj.

@anton000
Copy link

@laanwj bins run fine on fully updated CentOS 6.5 64-bit. Currently offshore so cant test 32bit on my box as of yet.

@sipa
Copy link
Member

sipa commented Apr 11, 2014

Do any of the functions reimplemented by this patch actually get called from within Bitcoin? In particular, from within consensus-critical code?

Is there any risk of this code not being exactly compatible with the libc/libstdc++ version the rest of the code is linked with? What if some of the implementation assumptions in std::list change?

@laanwj
Copy link
Member

laanwj commented Apr 11, 2014

Do any of the functions reimplemented by this patch actually get called from within Bitcoin? In particular, from within consensus-critical code?

That would be useful to investigate.

Is there any risk of this code not being exactly compatible with the libc/libstdc++ version the rest of the code is linked with? What if some of the implementation assumptions in std::list change?

A significant change in assumptions would break the ABI. After all, the data structures in subsequent versions need to be binary-compatible: a large part of the C++ library consists of templates and headers which get included into the code.

@sipa
Copy link
Member

sipa commented Apr 11, 2014

To be clear: I like this solution a lot, but I want to understand the risks.


const char* bad_exception::what() const throw()
{
return "std::bad_exception";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That file and the above one don't seem to use our 4 spaces indentation rule.

@theuni
Copy link
Member Author

theuni commented Apr 11, 2014

@sipa obviously memcpy does. But I think that one is pretty self-explanatory.

As for the c++ ones, they have a specific purpose, just that the definitions were moved from the header to the binary implementation. I think it's safe to say that the purpose of those functions won't change. The only danger I can imagine is that they could (for example) change the side-effects of an internal function that we call.

For example, imagine that we need to implement list::insert. We call some of their internal methods like operator[] and iterator++, then we change some internal variables. It's possible that in the future, they could change those variables in the internal functions, so that we've actually changed them twice.

But I don't think that's a very reasonable concern. If you look at what's actually implemented, you'll see that it's so low-level that there's really only one way to handle the function.

Anyway, let's enumerate here so we can narrow down the concern:

_fdelt_warn: Checks for overflows in FD functions. This can be hit (read: this forces a crash) on:

fd_set rfds;
FD_ZERO(&rfds);
FD_SET(FD_SETSIZE+1, &rfds); //BOOM!

I initially had a unit-test for this, but the problem is that we actually want it to crash, and boost tests don't play nice with the idea of a program abort being a passing test. Plus, it spews a really nasty backtrace.

bad_alloc/bad_cast/bad_exception:
These are cosmetic, used for debugging or logging. If they end up displaying new messages in the future, that's no problem. Anything comparing these strings would be horribly broken already.

_List_node_base:
These are related to list->insert or so. They were broken out of the headers and moved into the binary implementation, I believe as part of the c++11 split. These were taken as obvious from upstream, and it's hard to imagine that their meanings could change.

ostream/istream templates: These just force on new templates that are declared 'extern template' in the headers. Seems strange that those worked without c++11 anway. gnu extension, i guess?

out_of_range: This one I'll admit I don't know much about.
I found that another project received permission to copy glibc's implementation verbatim, so I did the same.

If there's any interest in seeing what pulls in a symbol, you can trace it at link-time using -Wl,--trace-symbol.

Also, as to the concern about accidentally introducing new symbols, it would be very easy to teach pull-tester to check for that. Though, obviously, it'd need to be using the same libc/libstdc++ as the release compiler.

@theuni
Copy link
Member Author

theuni commented Apr 11, 2014

Also worth noting that we may be able to drop some of these (the what()s look like good candidates) with smarter linking. Namely -ffunctions-sections + -Wl,--gc-sections, or by enabling lto. Unfortunately, these would need to be used for all dependencies as well since we link statically, so they wouldn't be trivial to pull off.

@theuni
Copy link
Member Author

theuni commented Apr 11, 2014

@Diapolo old habit, will fix.

@theuni
Copy link
Member Author

theuni commented Apr 11, 2014

One last point. The libstdc++ stubs shouldn't be called in older distros anyway. The fact that those symbols aren't present means that their libs don't expect those functions to exist.. they're routed elsewhere. They just need to be present to satisfy the runtime linker.

Point being that the testing of these needs to happen on newer distros, rather than old. For old distros, pass/fail should pretty much cover it.

@theuni
Copy link
Member Author

theuni commented Apr 11, 2014

Err.. above, out-of-range is simple enough, I meant that I hadn't looked deeply into ctype::_M_widen_init.

@theuni
Copy link
Member Author

theuni commented Apr 11, 2014

(last spam here, I'm just trying to hit all of this at once)

See here for the _M_widen_init move: http://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=2dbdf201fad2ea4c13a4826e515dc76cb66f84dd

libstdc++ even implements it several times (eww). So the abi is pretty much locked in.

@BitcoinPullTester
Copy link

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/05c20a553a12d03b1512a75973674c6d25534259 for binaries and test log.
This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/
Contact BlueMatt on freenode if something looks broken.

@laanwj
Copy link
Member

laanwj commented Apr 12, 2014

Compiling everything with -lto would be interesting. As we compile everything including the dependencies from source I expect that there would be quite some savings. On the other hand, the compile time and memory usage juggling all those symbols may make this infeasible. But it's good to keep in mind.

One last point. The libstdc++ stubs shouldn't be called in older distros anyway. The fact that those symbols aren't present means that their libs don't expect those functions to exis

I don't understand this. If our executable uses libstdc++ functions that are not provided by the older libstdc++'s, I would suppose it will still use them on older distros. 'Their' libs don't expect the symbols to exist, but 'our' libs do. Or am I overlooking something?

@theuni
Copy link
Member Author

theuni commented Apr 12, 2014

@laanwj I had a long explanation written up before I realized that you were right and I was wrong :)

You're overlooking how the function is called, but in these cases, it doesn't matter. For example, _M_unhook() is called by _M_erase(). If _M_erase were present in the libstdc++ shared lib, it would call whichever unhook implementation was hard-coded into it. But, as far as I can see, all of our replaced functions are used by inlined callers, so the effect is that it always will use our stubs.

You do seem to have a misconception about how our code uses these stubs, though. Keep in mind that we don't include definitions for these functions. "Our" code never sees them or has any idea that they exist. We just ends up getting routed to them from inside of libstdc++ at runtime.

@kristovatlas
Copy link

@laanwj can this also be tested for debian 6 squeeze? this is what Tails is based on, which is the basis of my interest in this issue, since I want people to easily use Bitcoin Core with Tor.

Edit: Here's the results of running the 32-bit binaries in the latest version of tails (0.23):
...
amnesia@amnesia:/bitcoin-f17e2c7-linux/bin/32$ ./bitcoin-qt
./bitcoin-qt: symbol lookup error: ./bitcoin-qt: undefined symbol: _ZN19QAbstractProxyModel11setItemDataERK11QModelIndexRK4QMapIi8QVariantE
...
amnesia@amnesia:
/bitcoin-f17e2c7-linux/bin/32$ ./test_bitcoin-qt
./test_bitcoin-qt: symbol lookup error: ./test_bitcoin-qt: undefined symbol: _ZN9QLineEdit18setPlaceholderTextERK7QString

Full output here:
http://paste.ubuntu.com/7247453/

@theuni
Copy link
Member Author

theuni commented Apr 14, 2014

@laanwj I'll reserve judgement on the Qt static link since I haven't looked into it before for Linux. I'll take your word for it, but I'm curious so I'll research a bit. Qt (qt5 at least) dyloads GL and friends via plugins, specifically to avoid compile-time dependencies. It may be worth looking into using qt5 for that reason.

Those aren't missing glibc symbols, they're missing qt symbols. The build-side qt is newer than Squeeze's, so it's missing some new functionality. To be more specific, Precise links against qt 4.8, Squeeze uses 4.6.

@laanwj
Copy link
Member

laanwj commented Apr 14, 2014

Last time I tried it was a mess, but you're welcome to try.

@wtogami
Copy link
Contributor

wtogami commented Apr 14, 2014

To be more specific, Precise links against qt 4.8, Squeeze uses 4.6.

If this is the case, then this would not be a regression from the lucid-based gitian builds of Bitcoin 0.8.x. We should not attempt to support things that old.

Isn't Debian stable now 7 or Wheezy?

@quel
Copy link

quel commented Apr 15, 2014

@wtogami Debian stable is currently 7/Wheezy and qt is 4.8. The strangest build requirement for bitcoin on Debian 7 is that we have to forward port/pull db4.8 from Debian 6/Squeeze/oldstable as db5.1 is the default on 7.

@wtogami
Copy link
Contributor

wtogami commented Apr 15, 2014

@quel This statement of fact has nothing to do with the gitian binary.

@quel
Copy link

quel commented Apr 15, 2014

@wtogami I will test the gitian binary on Debian 6 and 7 if I can ever pull it. I'm averaging 15KB/s.

@quel
Copy link

quel commented Apr 15, 2014

@wtogami https://download.visucore.com/bitcoin/bitcoin-f17e2c7-linux.tar.gz with gpg signature verified on Deiban 7.4, x86_64, running bin/64 for bitcoind, bitcoin-cli, and bitcoin-qt run for me without errors. That tar.gz appears to be dated 2014-04-10 08:22:55. Additionally, on a 64-bit platform running the 32-bit binaries works without issue as well. Besides click testing and basic cli testing, it is in no way exhaustive. I am happy to attach ldds, logs, or test additional builds.

@wtogami
Copy link
Contributor

wtogami commented Apr 17, 2014

@theuni Anything else in particular do you think needs testing before merge?

@theuni
Copy link
Member Author

theuni commented Apr 18, 2014

@wtogami For the most part, I think these are pretty much all or nothing. If bitcoind starts up successfully and runs for a few seconds, I don't believe any problems would be likely to surface afterward.

If anyone is uncomfortable with that response, I can look into adding some tests, but they're low level enough that the tests would be something like:

std::list<int> foo, bar;
foo.push_back(1); foo.push_back(2); foo.pop_back();
bar.push_back(1); bar.push_back(2); bar.pop_back();
assert(foo == bar);

Point being that if that failed, it'd be pretty obvious even without the test.

Essentially, we just need to decide if this is the favored approach, and if so, set some sort of policy to deal with changes in the future.

@laanwj
Copy link
Member

laanwj commented Apr 18, 2014

@theuni It would be nice to have at least tests that (indirectly) make use of the more complex functions in the compat/ files, like those in list_node and ctype::_M_widen_init().
I don't care so much about all the errors, exceptions variants.

As for policy, we want to stay compatible with:

  • GLIBC_2.13+
  • GLIBCXX_3.4+

Right? Let's write these things down and put them in some document under doc/...

@theuni
Copy link
Member Author

theuni commented Apr 18, 2014

@laanwj See my comment above about the low-level-ness of these functions. The example above wasn't abstract, that would be the actual test-case for list_node.

As for _M_widen_init(), this is all it takes is this to bring it in:

#include <iostream>
int main()
{
  for(int i=0; i != 2; i++)
    std::cout << std::endl;
}

You can verify that by doing a quick compile and grep:

cory@cory-i7:~/dev/bitcoin/src(libc-compat)$ g++ test.cpp -O1 -o test && objdump -TC test | grep _M_widen_init
0000000000000000      DF *UND*  0000000000000000  GLIBCXX_3.4.11 std::ctype<char>::_M_widen_init() const

So cout (among many other things) would be broken with a borked version of that function. I'm not sure there's really anything to test for that wouldn't be 100% obvious in the resulting binary.

I'm not trying to be difficult on this, I hope it doesn't seem that way. But since we've lifted the definitions of these from upstream, it really does seem (to me, anyway) to be all or nothing.

Any suggestions for alternate tests, or alternative ways of testing?

@laanwj
Copy link
Member

laanwj commented Apr 18, 2014

Low level is fine. If no difficult contortions are needed to bring the symbols in and test them, then that's all the better. Let's add those two examples that you give as sanity tests (maybe use std::iostream instead of std::cout so that no standard output is generated).

@wtogami
Copy link
Contributor

wtogami commented Apr 18, 2014

Lucid is glibc-2.11
RHEL6 is glibc-2.12

@sipa
Copy link
Member

sipa commented Apr 18, 2014

I wonder whether we want something like a start-up test at runtime, rather than unit tests (which wouldn't catch being loaded with an incompatible dynamic library).

It could also do something like a basic (and very short) memory test, or computing some EC identity to check verification correctness. On the other hand, systems that are sufficiently broken to be detected using a subsecond test likely wouldn't even boot...

@theuni How much work/code do you think supporting 2.11 or 2.12 would be extra? Seems it would be very valuable if this can be extended, but of course... extra work and extra code to maintained.

@theuni
Copy link
Member Author

theuni commented Apr 18, 2014

@sipa: This is compatible back to glibc 2.9 and libstdc++ 3.4. Qt is a different story though, I'm afraid, see the discussion above.

I agree that a startup check would be better. I still can't really wrap my head around what a unit-test would actually be able to verify.

Basically every operation at startup would depend on these in some way, so I agree that it likely wouldn't get as far as running anyway. But it couldn't hurt to do a few quick A==A checks explicitly before anything else, rather than possibly racing with sensitive code.

@laanwj
Copy link
Member

laanwj commented Apr 19, 2014

@sipa Good idea on a start-up sanity check.

@theuni Ok let's make the policy then:

  • GLIBC_2.11+
  • GLIBCXX_3.4+

We could even add a script that automatically checks the post-gitian executables if 'forbidden' symbols have been used, hence the compat/ wrappers needs updating.

I don't care about Qt there but only bitcoind/bitcoin-cli. The older stable distributions are almost exclusively used on headless servers.

@gmaxwell
Copy link
Contributor

@sipa ACK on the startup built in self-tests, the fact that e.g. on current fedora the fact that secp256k1 isn't supported in openssl is only 'detected' by a failed assertion deep inside key.cpp when other things run is somewhat concerning to me... but doesn't need to be part of this pull.

@wtogami
Copy link
Contributor

wtogami commented Apr 22, 2014

+1 to doing sanity checks as the next PR. This current PR is ready for merge.

ACK

@laanwj laanwj merged commit 05c20a5 into bitcoin:master Apr 22, 2014
laanwj added a commit that referenced this pull request Apr 22, 2014
05c20a5 build: add symbol for upcoming gcc 4.9's libstdc++ (Cory Fields)
49a3352 gitian-linux: --enable-glibc-back-compat (Warren Togami)
d5aab70 build: add an option for enabling glibc back-compat (Cory Fields)
ffc6b67 build: add glibc/libstdc++ back-compat stubs (Cory Fields)
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.