-
Notifications
You must be signed in to change notification settings - Fork 37.8k
Enable binary releases to work on older Linux distros without a static link #4042
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I like this approach much better. Fully statically linking glibc is usually a bad idea, I've spent many an hour tracking down mysterious name resolution failures from static libc. |
This should also benefit from ASLR, which was not possible with static linked glibc. |
@@ -0,0 +1,19 @@ | |||
#include "bitcoin-config.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to move these source files to src/compat
. Let's keep them separate from the main bitcoin source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, makes sense.
Static may be a bad idea, but it is the only way to be consistently compatible with a lot of old releases without extra maintenance overhead. I feel scared importing parts of the C and C++ libraries into bitcoin. Any change in the code may make it necessary to extend glibcxx_compat.cpp / src/glibc_compat.cpp with 'one more function'. If there are implementation problems in those files it will be hard to debug. Then again, it may be lesser of evils. Let's at least get this tested on all the platforms people were concerned about in the other topic:
|
Agreed on the overhead, but I do think that it's the best of all evils. We'd basically just have to keep an eye on it, and update as necessary. As for debugging though, as @gmaxwell pointed out, this makes it substantially easier, not harder. Anyway, release decisions should be locked in far earlier than they have been in the past. With a sane procedure, we'd know how we stand on final back-compat when shipping the first alpha. |
Half related, but more urgent if the gitian-shipped and self-built executables further diverge in implementation: we should make gitian build external debug information files (not to be shipped, just for our own use). This makes it possible to debug crashes in the gitian builds as well as to resolve crash dumps that people send. Working on test builds... |
That's one of the benefits of being deterministic :) 'objcopy --only-keep-debug' on the pre-stripped binaries should be enough to track things down in the future. |
Test builds available here: If you help testing, please let us know what you tested, and on what OS version and architecture. |
glibc/libstdc++ have added new symbols in later releases. When running a new binary against an older glibc, the run-time linker is unable to resolve the new symbols and the binary refuses to run. This can be fixed by adding our own versions of those functions, so that the build-time linker does not emit undefined symbols for them. This enables our binary releases to work on older Linux distros, while not incurring the downsides of a fully static binary.
moved to compat/ and force-pushed, since nothing else changed. |
Using "./configure --enable-glibc-back-compat" will attempt to be compatible with a target running glibc abi 2.9 and libstdc++ abi 3.4.
@laanwj bins run fine on fully updated CentOS 6.5 64-bit. Currently offshore so cant test 32bit on my box as of yet. |
Do any of the functions reimplemented by this patch actually get called from within Bitcoin? In particular, from within consensus-critical code? Is there any risk of this code not being exactly compatible with the libc/libstdc++ version the rest of the code is linked with? What if some of the implementation assumptions in std::list change? |
That would be useful to investigate.
A significant change in assumptions would break the ABI. After all, the data structures in subsequent versions need to be binary-compatible: a large part of the C++ library consists of templates and headers which get included into the code. |
To be clear: I like this solution a lot, but I want to understand the risks. |
|
||
const char* bad_exception::what() const throw() | ||
{ | ||
return "std::bad_exception"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That file and the above one don't seem to use our 4 spaces indentation rule.
@sipa obviously memcpy does. But I think that one is pretty self-explanatory. As for the c++ ones, they have a specific purpose, just that the definitions were moved from the header to the binary implementation. I think it's safe to say that the purpose of those functions won't change. The only danger I can imagine is that they could (for example) change the side-effects of an internal function that we call. For example, imagine that we need to implement list::insert. We call some of their internal methods like operator[] and iterator++, then we change some internal variables. It's possible that in the future, they could change those variables in the internal functions, so that we've actually changed them twice. But I don't think that's a very reasonable concern. If you look at what's actually implemented, you'll see that it's so low-level that there's really only one way to handle the function. Anyway, let's enumerate here so we can narrow down the concern: _fdelt_warn: Checks for overflows in FD functions. This can be hit (read: this forces a crash) on:
I initially had a unit-test for this, but the problem is that we actually want it to crash, and boost tests don't play nice with the idea of a program abort being a passing test. Plus, it spews a really nasty backtrace. bad_alloc/bad_cast/bad_exception: _List_node_base: ostream/istream templates: These just force on new templates that are declared 'extern template' in the headers. Seems strange that those worked without c++11 anway. gnu extension, i guess? out_of_range: This one I'll admit I don't know much about. If there's any interest in seeing what pulls in a symbol, you can trace it at link-time using -Wl,--trace-symbol. Also, as to the concern about accidentally introducing new symbols, it would be very easy to teach pull-tester to check for that. Though, obviously, it'd need to be using the same libc/libstdc++ as the release compiler. |
Also worth noting that we may be able to drop some of these (the what()s look like good candidates) with smarter linking. Namely -ffunctions-sections + -Wl,--gc-sections, or by enabling lto. Unfortunately, these would need to be used for all dependencies as well since we link statically, so they wouldn't be trivial to pull off. |
@Diapolo old habit, will fix. |
One last point. The libstdc++ stubs shouldn't be called in older distros anyway. The fact that those symbols aren't present means that their libs don't expect those functions to exist.. they're routed elsewhere. They just need to be present to satisfy the runtime linker. Point being that the testing of these needs to happen on newer distros, rather than old. For old distros, pass/fail should pretty much cover it. |
Err.. above, out-of-range is simple enough, I meant that I hadn't looked deeply into ctype::_M_widen_init. |
(last spam here, I'm just trying to hit all of this at once) See here for the _M_widen_init move: http://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=2dbdf201fad2ea4c13a4826e515dc76cb66f84dd libstdc++ even implements it several times (eww). So the abi is pretty much locked in. |
Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/05c20a553a12d03b1512a75973674c6d25534259 for binaries and test log. |
Compiling everything with -lto would be interesting. As we compile everything including the dependencies from source I expect that there would be quite some savings. On the other hand, the compile time and memory usage juggling all those symbols may make this infeasible. But it's good to keep in mind.
I don't understand this. If our executable uses libstdc++ functions that are not provided by the older libstdc++'s, I would suppose it will still use them on older distros. 'Their' libs don't expect the symbols to exist, but 'our' libs do. Or am I overlooking something? |
@laanwj I had a long explanation written up before I realized that you were right and I was wrong :) You're overlooking how the function is called, but in these cases, it doesn't matter. For example, _M_unhook() is called by _M_erase(). If _M_erase were present in the libstdc++ shared lib, it would call whichever unhook implementation was hard-coded into it. But, as far as I can see, all of our replaced functions are used by inlined callers, so the effect is that it always will use our stubs. You do seem to have a misconception about how our code uses these stubs, though. Keep in mind that we don't include definitions for these functions. "Our" code never sees them or has any idea that they exist. We just ends up getting routed to them from inside of libstdc++ at runtime. |
@laanwj can this also be tested for debian 6 squeeze? this is what Tails is based on, which is the basis of my interest in this issue, since I want people to easily use Bitcoin Core with Tor. Edit: Here's the results of running the 32-bit binaries in the latest version of tails (0.23): Full output here: |
@laanwj I'll reserve judgement on the Qt static link since I haven't looked into it before for Linux. I'll take your word for it, but I'm curious so I'll research a bit. Qt (qt5 at least) dyloads GL and friends via plugins, specifically to avoid compile-time dependencies. It may be worth looking into using qt5 for that reason. Those aren't missing glibc symbols, they're missing qt symbols. The build-side qt is newer than Squeeze's, so it's missing some new functionality. To be more specific, Precise links against qt 4.8, Squeeze uses 4.6. |
Last time I tried it was a mess, but you're welcome to try. |
If this is the case, then this would not be a regression from the lucid-based gitian builds of Bitcoin 0.8.x. We should not attempt to support things that old. Isn't Debian stable now 7 or Wheezy? |
@wtogami Debian stable is currently 7/Wheezy and qt is 4.8. The strangest build requirement for bitcoin on Debian 7 is that we have to forward port/pull db4.8 from Debian 6/Squeeze/oldstable as db5.1 is the default on 7. |
@quel This statement of fact has nothing to do with the gitian binary. |
@wtogami I will test the gitian binary on Debian 6 and 7 if I can ever pull it. I'm averaging 15KB/s. |
@wtogami https://download.visucore.com/bitcoin/bitcoin-f17e2c7-linux.tar.gz with gpg signature verified on Deiban 7.4, x86_64, running bin/64 for bitcoind, bitcoin-cli, and bitcoin-qt run for me without errors. That tar.gz appears to be dated 2014-04-10 08:22:55. Additionally, on a 64-bit platform running the 32-bit binaries works without issue as well. Besides click testing and basic cli testing, it is in no way exhaustive. I am happy to attach ldds, logs, or test additional builds. |
@theuni Anything else in particular do you think needs testing before merge? |
@wtogami For the most part, I think these are pretty much all or nothing. If bitcoind starts up successfully and runs for a few seconds, I don't believe any problems would be likely to surface afterward. If anyone is uncomfortable with that response, I can look into adding some tests, but they're low level enough that the tests would be something like:
Point being that if that failed, it'd be pretty obvious even without the test. Essentially, we just need to decide if this is the favored approach, and if so, set some sort of policy to deal with changes in the future. |
@theuni It would be nice to have at least tests that (indirectly) make use of the more complex functions in the compat/ files, like those in list_node and ctype::_M_widen_init(). As for policy, we want to stay compatible with:
Right? Let's write these things down and put them in some document under doc/... |
@laanwj See my comment above about the low-level-ness of these functions. The example above wasn't abstract, that would be the actual test-case for list_node. As for _M_widen_init(), this is all it takes is this to bring it in: #include <iostream>
int main()
{
for(int i=0; i != 2; i++)
std::cout << std::endl;
} You can verify that by doing a quick compile and grep: cory@cory-i7:~/dev/bitcoin/src(libc-compat)$ g++ test.cpp -O1 -o test && objdump -TC test | grep _M_widen_init
0000000000000000 DF *UND* 0000000000000000 GLIBCXX_3.4.11 std::ctype<char>::_M_widen_init() const So cout (among many other things) would be broken with a borked version of that function. I'm not sure there's really anything to test for that wouldn't be 100% obvious in the resulting binary. I'm not trying to be difficult on this, I hope it doesn't seem that way. But since we've lifted the definitions of these from upstream, it really does seem (to me, anyway) to be all or nothing. Any suggestions for alternate tests, or alternative ways of testing? |
Low level is fine. If no difficult contortions are needed to bring the symbols in and test them, then that's all the better. Let's add those two examples that you give as sanity tests (maybe use std::iostream instead of std::cout so that no standard output is generated). |
Lucid is glibc-2.11 |
I wonder whether we want something like a start-up test at runtime, rather than unit tests (which wouldn't catch being loaded with an incompatible dynamic library). It could also do something like a basic (and very short) memory test, or computing some EC identity to check verification correctness. On the other hand, systems that are sufficiently broken to be detected using a subsecond test likely wouldn't even boot... @theuni How much work/code do you think supporting 2.11 or 2.12 would be extra? Seems it would be very valuable if this can be extended, but of course... extra work and extra code to maintained. |
@sipa: This is compatible back to glibc 2.9 and libstdc++ 3.4. Qt is a different story though, I'm afraid, see the discussion above. I agree that a startup check would be better. I still can't really wrap my head around what a unit-test would actually be able to verify. Basically every operation at startup would depend on these in some way, so I agree that it likely wouldn't get as far as running anyway. But it couldn't hurt to do a few quick A==A checks explicitly before anything else, rather than possibly racing with sensitive code. |
@sipa Good idea on a start-up sanity check. @theuni Ok let's make the policy then:
We could even add a script that automatically checks the post-gitian executables if 'forbidden' symbols have been used, hence the compat/ wrappers needs updating. I don't care about Qt there but only bitcoind/bitcoin-cli. The older stable distributions are almost exclusively used on headless servers. |
@sipa ACK on the startup built in self-tests, the fact that e.g. on current fedora the fact that secp256k1 isn't supported in openssl is only 'detected' by a failed assertion deep inside key.cpp when other things run is somewhat concerning to me... but doesn't need to be part of this pull. |
+1 to doing sanity checks as the next PR. This current PR is ready for merge. ACK |
This reverts #3914 in favor of a more subtle approach, giving us the benefits of shared base libs and the (relative) portability of full-static.
glibc/libstd++ stubs have been added so that older distros don't see undefined symbols. They are bundled together into an --enable-glibc-back-compat configure switch, as I'm not sure there's any real-world usefulness in separating them.