Fix a violation of C++ standard rules where unions are used for type-punning #18167
Great first-time contribution! Welcome as a contributor! Hope to see more contributions from you. Thanks for tackling UB issues in the project. The bulk of them should be fixed by now and this is one of the last few UB issues I'm aware of. Don't hesitate to report and/or fix any other UB issues you might find and don't hesitate to ping me if you want your work reviewed. Concept ACK FWIW:
ACK be94096 Verified that …
~0 Seems like the better solution here is to stop assuming floats have a specific internal representation? :/
@luke-jr That would indeed be preferable, but it seems that would effectively require implementing an IEEE 754 software encoder/decoder, which is nontrivial. This is still an improvement though. memcpy into another type does indeed seem to be the modern idiom (we use the same in ReadLE32 and friends, after verifying that compilers indeed optimize through that). Concept ACK
tmp.x = x;
return tmp.y;
uint64_t tmp;
std::memcpy(&tmp, &x, sizeof(x));
Maybe add static_assert(sizeof(tmp) == sizeof(x), "double and uint64_t assumed to have the same size"); ?
(this might be redundant with other places but I think it's also helpful as documentation)
(Same for the rest of the functions)
I swear I wanted to do that, but then I was afraid people would find it paranoid 🤐
You sure I should go for that?
Explicitly stating assumptions is good and static_asserts cannot hurt :) Go for it! :)
FWIW:
bitcoin/src/compat/assumptions.h
Lines 43 to 46 in 68e841e
// Assumption: We assume floating-point widths.
// Example(s): Type punning in serialization code (ser_{float,double}_to_uint{32,64}).
static_assert(sizeof(float) == 4, "32-bit float assumed");
static_assert(sizeof(double) == 8, "64-bit double assumed");
Done.
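For readers following the thread, here is a minimal sketch of what the conversion functions might look like once the suggested static_asserts are in place. The function names follow the ser_{float,double}_to_uint{32,64} pattern quoted from assumptions.h; the exact bodies are an assumption based on the diff hunk above, not a copy of the merged code.

```cpp
#include <cstdint>
#include <cstring>

// Sketch only: reinterpret the bytes of a double as a 64-bit integer.
// The static_assert documents the size assumption right where it matters.
inline uint64_t ser_double_to_uint64(double x)
{
    uint64_t tmp;
    static_assert(sizeof(tmp) == sizeof(x), "double and uint64_t assumed to have the same size");
    std::memcpy(&tmp, &x, sizeof(x)); // copies the object representation, no type punning
    return tmp;
}

// Sketch only: same idea for float and a 32-bit integer.
inline uint32_t ser_float_to_uint32(float x)
{
    uint32_t tmp;
    static_assert(sizeof(tmp) == sizeof(x), "float and uint32_t assumed to have the same size");
    std::memcpy(&tmp, &x, sizeof(x));
    return tmp;
}
```

On a platform with IEEE 754 floats, ser_double_to_uint64(1.0) yields 0x3FF0000000000000 and ser_float_to_uint32(1.0f) yields 0x3F800000.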
There's a famous library, called softfloat. It does all that. But I think that's unnecessary. The IEEE 754 representation is the same everywhere. The differences between CPUs come from certain operations, like rounding and division by epsilon (IIRC). You can see different rounding modes and similar things in softfloat.
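To make the "same representation everywhere" point concrete, the following sketch splits a double into its IEEE 754 binary64 fields. The names DoubleFields and split_double are hypothetical and used only for illustration; the check via std::numeric_limits<double>::is_iec559 is a standard way to state the IEEE 754 assumption at compile time, though it says nothing about byte order.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption made explicit: double follows IEEE 754 (IEC 559).
static_assert(std::numeric_limits<double>::is_iec559, "IEEE 754 double assumed");

// Hypothetical helper type: the three fields of a binary64 value.
struct DoubleFields {
    uint64_t sign;     // 1 bit
    uint64_t exponent; // 11 bits, biased by 1023
    uint64_t mantissa; // 52 bits, implicit leading 1 not stored
};

// Hypothetical helper: copy the bits out with memcpy, then mask the fields.
inline DoubleFields split_double(double x)
{
    uint64_t bits;
    static_assert(sizeof(bits) == sizeof(x), "double and uint64_t assumed to have the same size");
    std::memcpy(&bits, &x, sizeof(x));
    DoubleFields f;
    f.sign = bits >> 63;
    f.exponent = (bits >> 52) & 0x7FF;
    f.mantissa = bits & ((uint64_t{1} << 52) - 1);
    return f;
}
```

For example, split_double(1.0) gives sign 0, exponent 1023 and mantissa 0 on any conforming platform, which is why the serialized integer is portable even though arithmetic results may differ by rounding mode.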
ACK 0653939
ACK 0653939
ACK 0653939
Code review ACK 0653939
Please, don't. If you're really willing to pick up a huge amount of work to support obscure platforms, I would prefer to move away from serializing floating point numbers at all. I tried this once but there's quite some usage (luckily, not in the P2P or consensus, only for internal file formats, so it's not impossible at least!).
Great to see this merged! @TheQuantumPhysicist If you want to tackle the few remaining instances of UB you might want to build with
@practicalswift Thanks for the support! Actually I use clang sanitizers all the time. Thread sanitizer, undefined behavior sanitizer, memory sanitizer and address sanitizer. You can see them in all my github projects that I wrote myself. But this one here I discovered by coincidence and hence fixed 😉 And I almost never use gcc anymore 😄 Btw, if you're interested, there's an issue I opened in libbtc about undefined behavior like 2 years ago... it's still untouched and not merged.
Checked that be94096 compiles to the same bitcoind with
…ns are used for type-punning

0653939 Add static_asserts to ser_X_to_Y() methods (Samer Afach)
be94096 Fix a violation of C++ standard rules that unions cannot be switched. (Samer Afach)

Pull request description:

Type punning in C++ is not like C. As per the C++ standard, one cannot use unions to convert the bit type. A discussion about this can be found [here](https://stackoverflow.com/questions/25664848/unions-and-type-punning). In C++, a union is supposed to only hold one type at a time. It's intended to be used only as `std::variant`. Switching types is undefined behavior.

In fact, C++20 has a special casting function, called [`bit_cast`](https://en.cppreference.com/w/cpp/numeric/bit_cast) that solved this problem.

Why has it been working so far? Because some compilers tolerate using unions and switching types, like gcc. More information [here](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Type-punning).

One important thing to mention is that performance is generally not affected by that memcpy. Compilers are smart enough to convert that to a memory cast when possible. But we have to do it the right way, otherwise, it's just undefined behavior that depends on the compiler.

ACKs for top commit:
practicalswift: ACK 0653939
elichai: ACK 0653939
laanwj: Code review ACK 0653939
kristapsk: ACK 0653939

Tree-SHA512: f6e89de39fc964750429139bab6b5a1346f7060334b7afa020e315bdad8f8c195bce2b8a9e343f06e7fff175e2dfb1cdabfcb6fe405bea0febe4962f0cc62557
Summary:
> Type punning in C++ is not like C. As per the C++ standard, one cannot use unions to convert the bit type. A discussion about this can be found [[https://stackoverflow.com/questions/25664848/unions-and-type-punning|here]]. In C++, a union is supposed to only hold one type at a time. It's intended to be used only as std::variant. Switching types is undefined behavior.
>
> In fact, C++20 has a special casting function, called bit_cast that solved this problem.
>
> Why has it been working so far? Because some compilers tolerate using unions and switching types, like gcc.
>
> One important thing to mention is that performance is generally not affected by that memcpy. Compilers are smart enough to convert that to a memory cast when possible. But we have to do it the right way, otherwise, it's just undefined behavior that depends on the compiler.

Note that @practicalswift verified that the bytecode generated by `clang++ -O1` is identical before and after the change: bitcoin/bitcoin#18167 (comment)

This is a backport of Core [[bitcoin/bitcoin#18167 | PR18167]]

Test Plan: `ninja all check-all`
Reviewers: #bitcoin_abc, deadalnix
Reviewed By: #bitcoin_abc, deadalnix
Differential Revision: https://reviews.bitcoinabc.org/D8768
bd4b846 Remove old serialization primitives (Pieter Wuille) 060d62b Convert the last, non-trivial, serialization functions to the new form (furszy) 8c74c09 Convert LimitedString to formatter (Pieter Wuille) f021897 Fix CDiskBlockIndex serialization of dummy fields for old DB versions (random-zebra) 1ee0cb2 Convert CDiskBlockIndex to new serialization. (furszy) 221bf49 Convert wallet to new serialization (furszy) cf06950 Convert to new serialization (step 3) (furszy) dc0fc95 Remove old MESS_VER_STRMESS message version try-catch. (furszy) 35fca11 Convert Qt to new serialization (Pieter Wuille) 3f7826e Add comments to CustomUintFormatter (Pieter Wuille) eccd473 Convert to new serialization (step 2). Focused on object's serializations that doesn't require an special treatment. (furszy) 0f15784 Convert everything except wallet/qt to new serialization (step 1) (Pieter Wuille) 3d3ee64 Convert merkleblock to new serialization (Pieter Wuille) 13577fb Add SER_READ and SER_WRITE for read/write-dependent statements (Russell Yanofsky) 7344c1a Extend CustomUintFormatter to support enums (Russell Yanofsky) c4d6228 Merge BigEndian functionality into CustomUintFormatter (Pieter Wuille) 3765d6c Add static_asserts to ser_X_to_Y() methods (Samer Afach) 806213a Fix a violation of C++ standard rules that unions cannot be switched. 
(Samer Afach) d6380c4 Add CustomUintFormatter (Pieter Wuille) fd29a50 Make VectorFormatter support stateful formatters (Russell Yanofsky) 4e2afad Convert CCompactSize to proper formatter (Pieter Wuille) bb99030 Get rid of VARINT default argument (Pieter Wuille) e107a0c Convert undo.h to new serialization framework (Pieter Wuille) a926ba3 Make std::vector and prevector reuse the VectorFormatter logic (Pieter Wuille) 1dfddce Add custom vector-element formatter (Pieter Wuille) df4e1ba Add a constant for the maximum vector allocation (5 Mbyte) (Pieter Wuille) c2fdeaf Convert compression.h to new serialization framework (Pieter Wuille) aa35991 Add FORMATTER_METHODS, similar to SERIALIZE_METHODS, but for formatters (Pieter Wuille) 3e38199 Move compressor utility functions out of class (Pieter Wuille) 7376a95 Convert chain to new serialization (Pieter Wuille) bbfc55c Convert VARINT to the formatter/Using approach (Pieter Wuille) 39c58a1 Add a generic approach for (de)serialization of objects using code in other classes (Pieter Wuille) ace3895 Convert addrdb/addrman to new serialization (Pieter Wuille) 6bb135e Introduce new serialization macros without casts (Pieter Wuille) ace7d14 Drop minor GetSerializeSize template (Ben Woosley) f05e692 Drop unused GetType() from CSizeComputer (furszy) 5c36b3d Introduce BigEndian wrapper and use it for netaddress ports (Pieter Wuille) fb3c646 Migrate last FLATDATA calls to use Span. (furszy) 1ef2d90 Support serializing Span<unsigned char> and use that instead of FLATDATA (Pieter Wuille) 8fef544 Add Slice: a (pointer, size) array view that acts like a container (Pieter Wuille) Pull request description: Decoupled from #2411, built on top of #2359. Focused on creating the Span class and updating the serialization framework and every object using it up to latest upstream structure (3-4 years ahead of what we currently are in master). 
We will be up-to-date with them in the area after finishing with #2411 entirely (there are few more updates to the serialization code that comes down #2411 commits line that cannot cherry-pick here). Adapted the following PRs: * bitcoin#12886. * bitcoin#12916. * bitcoin#13558. * bitcoin#17850. * bitcoin#17896. * bitcoin#12752. * bitcoin#17957. * bitcoin#18021. * bitcoin#18087. * bitcoin#18112 (only from 353f376 that we don't support). * bitcoin#18167. * bitcoin#18317. * bitcoin#19032. ACKs for top commit: random-zebra: ACK bd4b846 Fuzzbawls: ACK bd4b846 Tree-SHA512: fe1b31d0976dff97bfeed0f9efeeb4c6c161277529880ede990c9b3fb0ea680f25b4be01b739f6bf7eeca79fa7687c9c2146c403c96e86bc6b052c6dd88e6930
ecde04a [Consensus] Bump Active Protocol version to 70923 for v5.3 (random-zebra) b63e4f5 Consensus: Add v5.3 enforcement height for testnet. (furszy) f44be94 Only relay IPv4, IPv6, Tor addresses (Pieter Wuille) 015298c fix: tor: Call event_base_loopbreak from the event's callback (furszy) 34ff7a8 Consensus: Add mnb ADDRv2 guard. (furszy) b4515dc GUI: Present v3 onion addresses properly in MNs list. (furszy) 337d43d tests: don't export in6addr_loopback (Vasil Dimov) 2cde8e0 GUI: Do not show the tor v3 onion address in the topbar. (furszy) 0b5f406 Doc: update tor.md with latest upstream information. (furszy) 89df7f2 addrman: ensure old versions don't parse peers.dat (Vasil Dimov) bb90c5c test: add getnetworkinfo network name regression tests (Jon Atack) d8e01b5 rpc: update GetNetworksInfo() to not return unsupported networks (Jon Atack) 57fc7b0 net: update GetNetworkName() with all enum Network cases (Jon Atack) 647d60b tests: Modify rpc_bind to conform to bitcoin#14532 behaviour. (Carl Dong) d4d6729 Allow running rpc_bind.py --nonloopback test without IPv6 (Kristaps Kaupe) 4a034d8 test: Add rpc_bind test to default-run tests (Wladimir J. van der Laan) 61a08af [tests] bind functional test nodes to 127.0.0.1 Prevents OSX firewall (Sjors Provoost) 6a4f1e0 test: Add basic addr relay test (furszy) 78aa61c net: Make addr relay mockable (furszy) ba954ca Send and require SENDADDRV2 before VERACK (Pieter Wuille) 61c2ed4 Bump net protocol version + don't send 'sendaddrv2' to pre-70923 software (furszy) ccd508a tor: make a TORv3 hidden service instead of TORv2 (Vasil Dimov) 6da9a14 net: advertise support for ADDRv2 via new message (furszy) e58d5d0 Migrate to test_large_inv() to Misbehaving logging. (furszy) d496b64 [QA] fix mininode CAddress ser/deser (Jonas Schnelli) cec9567 net: CAddress & CAddrMan: (un)serialize as ADDRv2 Change the serialization of `CAddrMan` to serialize its addresses in ADDRv2/BIP155 format by default. Introduce a new `CAddrMan` format version (3). 
(furszy) b8c1dda streams update: get rid of nType and nVersion. (furszy) 3eaa273 Support bypassing range check in ReadCompactSize (Pieter Wuille) a237ba4 net: recognize TORv3/I2P/CJDNS networks (Vasil Dimov) 8e50853 util: make EncodeBase32 consume Spans (Sebastian Falbesoner) 1f67e30 net: CNetAddr: add support to (un)serialize as ADDRv2 (Vasil Dimov) 2455420 test: move HasReason so it can be reused (furszy) d41adb4 util: move HasPrefix() so it can be reused (Vasil Dimov) f6f86af Unroll Keccak-f implementation (Pieter Wuille) 45222e6 Implement keccak-f[1600] and SHA3-256 (Pieter Wuille) 08ad06d net: change CNetAddr::ip to have flexible size (furszy) 3337219 net: improve encapsulation of CNetAddr. (furszy) 910d5c4 test: Do not instantiate CAddrDB for static call (Hennadii Stepanov) 6b607ef Drop IsLimited in favor of IsReachable (Ben Woosley) a40711b IsReachable is the inverse of IsLimited (DRY). Includes unit tests (marcaiaf) 8839828 net: don't accept non-left-contiguous netmasks (Vasil Dimov) 5d7f864 rpcbind: Warn about exposing RPC to untrusted networks (Luke Dashjr) 2a6abd8 CNetAddr: Add IsBindAny method to check for INADDR_ANY (Luke Dashjr) 4fdfa45 net: Always default rpcbind to localhost, never "all interfaces" (Luke Dashjr) 31064a8 net: Minor accumulated cleanups (furszy) 9f9c871 tests: Avoid using C-style NUL-terminated strings as arguments (practicalswift) f6c52a3 tests: Add tests to make sure lookup methods fail on std::string parameters with embedded NUL characters (practicalswift) a751b9b net: Avoid using C-style NUL-terminated strings as arguments in the netbase interface (furszy) f30869d test: add IsRFC2544 tests (Mark Tyneway) ed5abe1 Net: Proper CService deserialization + GetIn6Addr return false if addr isn't an IPv6 addr (furszy) 86d73fb net: save the network type explicitly in CNetAddr (Vasil Dimov) ad57dfc net: document `enum Network` (Vasil Dimov) cb160de netaddress: Update CNetAddr for ORCHIDv2 (Carl Dong) c3c04e4 net: Better misbehaving logging 
(furszy) 3660487 net: Use C++11 member initialization in protocol (Marco) 082baa3 refactor: Drop unused CBufferedFile::Seek() (Hennadii Stepanov) e2d776a util: CBufferedFile fixes (Larry Ruane) 6921f42 streams: backport OverrideStream class (furszy) Pull request description: Conjunction of a large number of back ports, updates and refactorings that made with the final goal of implementing v3 Onion addresses support (BIP155 https://github.com/bitcoin/bips/blob/master/bip-0155.mediawiki) before the tor v2 addresses EOL, scheduled, by the Tor project, for (1) July 15th: v2 addr support removal from the code base, and (2) October 15th: v2 addr network disable, where **every peer in our network running under Tor will loose the connection and drop the network**. As BIP155 describes, this is introducing a new P2P message to gossip longer node addresses over the P2P network. This is required to support new-generation Onion addresses, I2P, and potentially other networks that have longer endpoint addresses than fit in the 128 bits of the current addr message. In order to achieve the end goal, had to: 1. Create Span class and push it up to latest Bitcoin implementation. 2. Update the whole serialization framework and every object using it up to latest Bitcoin implementation (3-4 years ahead of what we currently are in master). 3. Update the address manager implementing ASN-based bucketing of the network nodes. 4. Update and refactor the netAddress and address manager tests to latest Bitcoin implementation (4 years ahead of what we currently are in master). 5. Several util string, vector, encodings, parsing, hashing backports and more.. Important note: This PR it is not meant to be merged as a standalone PR, will decouple smaller ones moving on. Adding on each sub-PR its own description isolated from this big monster. Second note: This is still a **work-in-progress**, not ready for testing yet. I'm probably missing to mention few PRs that have already adapted to our sources. 
Just making it public so can decouple the changes, we can start merging them and i can continue working a bit more confortable (rebase a +170 commits separate branch is not fun..). ### List of back ported and adapted PRs: Span and Serialization: ---------------- * bitcoin#12886. * bitcoin#12916. * bitcoin#13558. * bitcoin#13697. (Only Span's commit 29943a9) * bitcoin#17850. * bitcoin#17896. * bitcoin#12752. * bitcoin#16577. * bitcoin#16670. (without faebf62) * bitcoin#17957. * bitcoin#18021. * bitcoin#18087. * bitcoin#18112 (only from 353f376 that we don't support). * bitcoin#18167. * bitcoin#18317. * bitcoin#18591 (only Span's commit 0fbde48) * bitcoin#18468. * bitcoin#19020. * bitcoin#19032. * bitcoin#19367. * bitcoin#19387. Net, NetAddress and AddrMan: ---------------- * bitcoin#7932. * bitcoin#10756. * bitcoin#10765. * bitcoin#12218. * bitcoin#12855. * bitcoin#13532. * bitcoin#13575. * bitcoin#13815. * bitcoin#14532. * bitcoin#15051. * bitcoin#15138. * bitcoin#15689. * bitcoin#16702. * bitcoin#17243. * bitcoin#17345. * bitcoin#17754. * bitcoin#17758. * bitcoin#17812. * bitcoin#18023. * bitcoin#18454. * bitcoin#18512. * bitcoin#19314. * bitcoin#19687 Keys and Addresses encoding: ---------------- * bitcoin#11372. * bitcoin#17511. * bitcoin#17721. Util: ---------------- * bitcoin#9140. * bitcoin#16577. * bitcoin#16889. * bitcoin#19593. Bench: ---------------- * bitcoin#16299. BIP155: ---------------- * bitcoin#19351. * bitcoin#19360. * bitcoin#19534. * bitcoin#19628. * bitcoin#19841. * bitcoin#19845. * bitcoin#19954. * bitcoin#19991 (pending). * bitcoin#19845. * bitcoin#20000 (pending). * bitcoin#20120. * bitcoin#20284. * bitcoin#20564. * bitcoin#21157 (pending). * bitcoin#21564 (pending). * Fully removed v2 onion addr support. * Add hardcoded seeds. * Add release-notes, changes to files.md and every needed documentation. I'm currently working on the PRs marked as "pending", this isn't over, but I'm pretty pretty close :). What a long road.. 
ACKs for top commit: random-zebra: utACK ecde04a Fuzzbawls: ACK ecde04a Tree-SHA512: 82c95fbda76fce63f96d8a9af7fa9a89cb1e1b302b7891e27118a6103af0be23606bf202c7332fa61908205e6b6351764e2ec23d753f1e2484028f57c2e8b51a
Type punning in C++ is not like C. As per the C++ standard, one cannot use unions to convert the bit type. A discussion about this can be found here (https://stackoverflow.com/questions/25664848/unions-and-type-punning). In C++, a union is supposed to hold only one type at a time; it is intended to be used only like std::variant. Switching types, that is, reading a member other than the one last written, is undefined behavior.

In fact, C++20 has a special casting function, called std::bit_cast, that solves this problem.

Why has it been working so far? Because some compilers, like gcc, tolerate using unions and switching types. More information here (https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Type-punning).

One important thing to mention is that performance is generally not affected by that memcpy. Compilers are smart enough to optimize the copy away when possible. But we have to do it the right way; otherwise, it's just undefined behavior that depends on the compiler.
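The three approaches mentioned in the description can be sketched side by side. This is an illustrative sketch, not the project's code: the function names double_to_bits_memcpy and double_to_bits_bit_cast are hypothetical, the union version is shown only as a comment because reading the inactive member is undefined behavior in C++, and the std::bit_cast variant is compiled only when the standard library advertises it through the __cpp_lib_bit_cast feature-test macro.

```cpp
#include <cstdint>
#include <cstring>
#if __has_include(<bit>)
#include <bit> // defines __cpp_lib_bit_cast when std::bit_cast is available
#endif

// Undefined behavior in C++ (shown for contrast only, intentionally not compiled):
//
//   union { double x; uint64_t y; } tmp;
//   tmp.x = x;
//   return tmp.y; // reads a member that is not the active one

// Well-defined before C++20: copy the object representation.
inline uint64_t double_to_bits_memcpy(double x)
{
    uint64_t tmp;
    static_assert(sizeof(tmp) == sizeof(x), "double and uint64_t assumed to have the same size");
    std::memcpy(&tmp, &x, sizeof(x));
    return tmp;
}

#if defined(__cpp_lib_bit_cast)
// C++20 and later: std::bit_cast checks the size match at compile time
// and can be used in constant expressions.
constexpr uint64_t double_to_bits_bit_cast(double x)
{
    return std::bit_cast<uint64_t>(x);
}
#endif
```

Both functions return the same value for the same input; the memcpy form works on older standards, while std::bit_cast rejects mismatched sizes at compile time without a hand-written static_assert.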