Skip to content

Conversation

Adlai-Holler
Copy link

Fixes #22613

The fix is pretty simple – it's legal to memcpy between these types, you just can't dereference members after a reinterpret_cast. The memcpys will be optimised away in any half decent compiler.

I chose to switch GetSockAddr/SetSockAddr to accept sockaddr_storage rather than sockaddr so we can be sure we're never reading/writing out of bounds. Essentially, struct sockaddr* is old junk that we never want to deal with. Only hand those to the C APIs.

@Adlai-Holler Adlai-Holler changed the title Fix undefined behavior in socket address handling net: Fix undefined behavior in socket address handling Apr 11, 2022
@fanquake fanquake added the P2P label Apr 11, 2022
@Adlai-Holler
Copy link
Author

CI failure looks unrelated - a timeout occurred in fee_estimator test. Can we kick the CI again?

@kristapsk
Copy link
Contributor

Can we kick the CI again?

I think you can do it yourself, git commit --amend and then git push --force.

@DrahtBot
Copy link
Contributor

DrahtBot commented Apr 11, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #25172 (refactor: use std:: prefix for std lib funcs by fanquake)
  • #24464 (logging: Add severity level to logs by klementtan)
  • #23531 (Add Yggdrasil support by prusnak)
  • #21878 (Make all networking code mockable by vasild)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@laanwj
Copy link
Member

laanwj commented Apr 13, 2022

Concept ACK, I agree this is UB and it should be fixed. memcpy is notoriously dangerous though, this will have to be carefully checked to not introduce any subtle bugs, especially to make sure the right lengths are used everywhere.

@Adlai-Holler
Copy link
Author

In terms of lengths, and for more general context on this issue so that reviewers can avoid duplicating work:

  • struct sockaddr is an old 1980's generic socket address type, (made with IPv4 in mind). It's like a base class.
  • struct sockaddr_in ("internet") is the IPv4 "subclass."
    • It has a size that fits within sockaddr and its layout is such that the two can be cast back & forth. In C. In C++ it's UB to manipulate aliased unrelated types like that.
  • struct sockaddr_in6 is the IPv6 "subclass."
    • It is larger than sockaddr
    • But the API still works with sockaddr*. It's up to you and the API to cast the pointer to the appropriate "subclass" once you know which protocol you're dealing with.
  • struct sockaddr_storage was added along with sockaddr_in6 as a "catch-all" opaque blob that is large enough to hold either one. The only field defined on the struct is the ss_family field (i.e. is this IPv4 or IPv6?). That field is guaranteed to map to the field in all the other versions so it's "sort of" castable.

So, in this PR here's the strategy:

  • On our side, if we need a socket address type that is protocol-independent, make it sockaddr_storage instead of sockaddr. Makes sure we have a big enough chunk of memory to work with, and it removes some internal casting.
  • For writing to out-variables of sockaddr_storage*, first we zero all the memory of the out-variable to get a clean slate, then create a protocol-specific local variable e.g. sockaddr_in and populate it, then memcpy the smaller subclass variable into the zero'd storage variable.
  • Replace downcasts of sockaddr* -> e.g. sockaddr_in* with creating a local variable of the appropriate type and memcpy'ing using the "subclass" length. This looks like a read out of bounds but it's no more out-of-bounds than casting the smaller struct pointer to the larger one was. It's the same thing really.

Copy link
Contributor

@jamesob jamesob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly-ACK 77f0208 (jamesob/ackr/24827.1.Adlai-Holler.net_fix_undefined_behavi)

Built, tested locally.

I'm still spotting some reinterpret_cast<struct sockaddr*> calls in e.g.

  • ConnectSocketDirectly
  • TorControlConnection::Connect
  • GetBindAddress
  • CConman::AcceptConnection
  • [some others in net.cpp...]
  • randomenv.cpp:AddSockaddr (not sure if this one matters much)

so I'm not sure this change resolves all of the UB related to sockaddr use.
Casting from sockaddr_storage* -> sockaddr* seems precarious, since they're
different sizes, but is there some reason you avoided touching those here as
well?

All memcpy invocations look safe to me. Obviously the code will require care
when modified though. I wonder if we could consolidate the memcpy calls to a
(templated?) function to cut down the risk.

Longterm it may make sense to move to some kind of polymorphic union that can
encompass both sockaddr_in and sockaddr_in6 and doesn't require all the
ip4/6 case handling, but that's speculative and in any case the avoidance of UB
seems like a strict improvement here.

Show signature data

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

ACK 77f0208cf912389c88415e8efc80af06ec7616e1 ([`jamesob/ackr/24827.1.Adlai-Holler.net_fix_undefined_behavi`](https://github.com/jamesob/bitcoin/tree/ackr/24827.1.Adlai-Holler.net_fix_undefined_behavi))

Built, tested locally. 

I'm still spotting some `reinterpret_cast<struct sockaddr*>` calls in e.g.

- - [ ] `ConnectSocketDirectly`
- - [ ] `TorControlConnection::Connect`
- - [ ] `GetBindAddress`
- - [ ] `CConman::AcceptConnection`
- - [ ] [some others in net.cpp...]
- - [ ] `randomenv.cpp:AddSockaddr` (not sure if this one matters much)

so I'm not sure this change resolves all of the UB related to sockaddr use.
Casting from `sockaddr_storage*` -> `sockaddr*` seems precarious, since they're
different sizes, but is there some reason you avoided touching those here as
well?

All memcpy invocations look safe to me. Obviously the code will require care
when modified though. I wonder if we could consolidate the memcpy calls to a 
(templated?) function to cut down the risk.

Longterm it may make sense to move to some kind of polymorphic union that can
encompass both `sockaddr_in` and `sockaddr_in6` and doesn't require all the
ip4/6 case handling, but that's speculative and in any case the avoidance of UB
seems like a strict improvement here.
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEGNRVI1NPYuZCSIrGepNdrbLETwUFAmJYlAUACgkQepNdrbLE
TwVGxg/+NwnQ9FbaWInxv54qFF+X4QWYiOA1JAzPpGBv/Rf0KMbSz/enSsAax0j6
YxwXyXXOMkL0CR//PzCMZqEBpqydOZcSnAE3elINlhySWMhpz0iXxz0SNuCyTnSp
fMl/5RKTvkPvWqYCgHCqM0HmUPwdkSEFi+BHT+B8+kZBZcNgegbXEPiRcQqG6JEJ
qs9Ro0dP9J3oH9khpxfucTkRerHSY3QmL7ZZIbzSazQnhmo/yUP1Q5zZHBsT3hZP
RcDzp0T6RX94/vhT2gSaGRKCYasUbiJ07puEj6q/lfVWwzCBewvR/PseMS3Q5GWF
Cj6WpTRiCetq6yeKDkGId4tgGqLEHk1RHpbtzOLbzoJMMR0sclbQjPEkHQy2/qXJ
Asrvh39zWhCZU83KMTxY5tzP66XGtxgXcCmtFjxBjwBa4YcgGezpKYylUVwje7uT
Hw2Z+MAZJTgjCyYIhdITs89fcJSbdoSuEekuPXtIgKAAvGmmFGFfGGoM60GonWVY
00PHDw8eKluGMe7cG1PvQI58jWAForZEu9rqfyspBZcbG/l+VVOoKK/EfYnmDcQ1
oFJppPouXvX/btgk5uRuNVCrzETlaxCcyRAekdKXh4IC6JmyV9pbS1f+brvOejLx
k+5uV4d8FzkXFQkoOMcrfukQzLq063XROF9oyN/jKww/ojSnSjE=
=kbVB
-----END PGP SIGNATURE-----

Show platform data

Tested on Linux-5.10.0-11-amd64-x86_64-with-glibc2.31

Configured with ./configure LDFLAGS=-L/home/james/src/bitcoin/db4/lib/ CPPFLAGS=-I/home/james/src/bitcoin/db4/include/ CXXFLAGS=-fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare -Wsign-compare -Werror=thread-safety-analysis --enable-wallet --enable-debug --with-daemon --enable-natpmp-default

Compiled with /usr/bin/ccache /usr/local/bin/clang++ -std=c++17 -mavx -mavx2 -mpclmul -fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare -Wsign-compare -Werror=thread-safety-analysis -O0 -g3 -ftrapv -fdebug-prefix-map=$(abs_top_srcdir)=.  -Wstack-protector -fstack-protector-all -fcf-protection=full -fstack-clash-protection -msse4.1 -msse4.2 -msse4 -msha  i

Compiler version: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63

@jamesob
Copy link
Contributor

jamesob commented Apr 14, 2022

By the way, welcome to the project and thanks for the great write-up!

@jamesob
Copy link
Contributor

jamesob commented Apr 14, 2022

Casting from sockaddr_storage* -> sockaddr* seems precarious, since they're different sizes, but is there some reason you avoided touching those here as well?

I guess since we're essentially "upcasting" in this case (i.e. the mem allocation referenced by sockaddr_storage is larger than sockaddr, so the cast pointer will never run into unclaimed memory) it's okay to do this cast. Is that right?

@Adlai-Holler
Copy link
Author

The remaining casts to sockaddr* are unavoidable and safe because we are calling into the external C APIs which expect us to do that cast.

Good call on some wrapper functions. I wrote them and I'm a lot happier with the result. Templating didn't work out because (1) in the load functions I ended up needing to check the protocol type for safety and (2) in general I didn't want to be memcpy'ing into templated types.

We'll see how the CI does but I'm really happy with the new state of the PR, think we're in good shape.

@Adlai-Holler
Copy link
Author

Ping – I know reviewers are busy but it seems like we're really close! 🙏 I just swapped the direct memcpy's out for function wrappers so it's more of a drop-in replacement.

Copy link
Contributor

@jamesob jamesob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, nice changes. I think the new functions you've introduced should help to reduce @laanwj's concerns about memset. We could play some code-golf with the patch below to cut down on the number of separate memsets, but I think what you have right now in HEAD is a big improvement on the status quo with or without the patch.

diff --git a/src/netaddress.cpp b/src/netaddress.cpp
index 2569e9e71f..11929f5394 100644
--- a/src/netaddress.cpp
+++ b/src/netaddress.cpp
@@ -1241,52 +1241,34 @@ bool operator<(const CSubNet& a, const CSubNet& b)
     return (a.network < b.network || (a.network == b.network && memcmp(a.netmask, b.netmask, 16) < 0));
 }
 
-sockaddr_in LoadSockaddrIPv4(const sockaddr& src) {
-    assert(src.sa_family == AF_INET);
+template <typename T>
+static sockaddr_in LoadSockaddrIPv4Internal(const T& src) {
     sockaddr_in s4;
     std::memcpy(&s4, &src, sizeof(s4));
     return s4;
 }
+
+sockaddr_in LoadSockaddrIPv4(const sockaddr& src) {
+    assert(src.sa_family == AF_INET);
+    return LoadSockaddrIPv4Internal(src);
+}
 sockaddr_in LoadSockaddrIPv4(const sockaddr_storage& src) {
     assert(src.ss_family == AF_INET);
-    sockaddr_in s4;
-    std::memcpy(&s4, &src, sizeof(s4));
-    return s4;
+    return LoadSockaddrIPv4Internal(src);
 }
 
-sockaddr_in6 LoadSockaddrIPv6(const sockaddr& src) {
-    assert(src.sa_family == AF_INET6);
-    sockaddr_in6 s6;
-    std::memcpy(&s6, &src, sizeof(s6));
-    return s6;
-}
-sockaddr_in6 LoadSockaddrIPv6(const sockaddr_storage& src) {
-    assert(src.ss_family == AF_INET6);
+template <typename T>
+sockaddr_in6 LoadSockaddrIPv6Internal(const T& src) {
     sockaddr_in6 s6;
     std::memcpy(&s6, &src, sizeof(s6));
     return s6;
 }
 
-void StoreSockaddrIPv4(const sockaddr_in& src, sockaddr* ptr, socklen_t *addrlen) {
-    constexpr size_t sz4 = sizeof(sockaddr_in);
-    assert(*addrlen >= sz4);
-    *addrlen = sz4;
-    std::memcpy(ptr, &src, sz4);
-}
-void StoreSockaddrIPv4(const sockaddr_in& src, sockaddr_storage* ptr, socklen_t *addrlen) {
-    constexpr size_t sz4 = sizeof(sockaddr_in);
-    *addrlen = sz4;
-    std::memcpy(ptr, &src, sz4);
-}
-
-void StoreSockaddrIPv6(const sockaddr_in6& src, sockaddr* ptr, socklen_t *addrlen) {
-    constexpr size_t sz6 = sizeof(sockaddr_in6);
-    assert(*addrlen >= sz6);
-    *addrlen = sz6;
-    std::memcpy(ptr, &src, sz6);
+sockaddr_in6 LoadSockaddrIPv6(const sockaddr& src) {
+    assert(src.sa_family == AF_INET6);
+    return LoadSockaddrIPv6Internal(src);
 }
-void StoreSockaddrIPv6(const sockaddr_in6& src, sockaddr_storage* ptr, socklen_t *addrlen) {
-    constexpr size_t sz6 = sizeof(sockaddr_in6);
-    *addrlen = sz6;
-    std::memcpy(ptr, &src, sz6);
+sockaddr_in6 LoadSockaddrIPv6(const sockaddr_storage& src) {
+    assert(src.ss_family == AF_INET6);
+    return LoadSockaddrIPv6Internal(src);
 }
diff --git a/src/netaddress.h b/src/netaddress.h
index 171653bdbc..254c92e86f 100644
--- a/src/netaddress.h
+++ b/src/netaddress.h
@@ -596,9 +596,25 @@ sockaddr_in6 LoadSockaddrIPv6(const sockaddr_storage& src);
  * Overwrite the provided generic sockaddr struct safely. Use these
  * instead of casting the src and dereferencing it.
  */
-void StoreSockaddrIPv4(const sockaddr_in& src, sockaddr* ptr, socklen_t *addrlen);
-void StoreSockaddrIPv4(const sockaddr_in& src, sockaddr_storage* ptr, socklen_t *addrlen);
-void StoreSockaddrIPv6(const sockaddr_in6& src, sockaddr* ptr, socklen_t *addrlen);
-void StoreSockaddrIPv6(const sockaddr_in6& src, sockaddr_storage* ptr, socklen_t *addrlen);
+template <typename T>
+void StoreSockaddrIPv4(const sockaddr_in& src, T* ptr, socklen_t *addrlen) {
+    constexpr size_t sz4 = sizeof(sockaddr_in);
+
+    if (typeid(T) == typeid(sockaddr)) {
+        assert(*addrlen >= sz4);
+    }
+    *addrlen = sz4;
+    std::memcpy(ptr, &src, sz4);
+}
+
+template <typename T>
+void StoreSockaddrIPv6(const sockaddr_in6& src, T* ptr, socklen_t *addrlen) {
+    constexpr size_t sz6 = sizeof(sockaddr_in6);
+    if (typeid(T) == typeid(sockaddr)) {
+        assert(*addrlen >= sz6);
+    }
+    *addrlen = sz6;
+    std::memcpy(ptr, &src, sz6);
+}
 
 #endif // BITCOIN_NETADDRESS_H

Fixes bitcoin#22613

The fix is pretty simple – it's legal to memcpy between these types, you just can't reference members after a reinterpret_cast. The memcpys will be optimised away in any half decent compiler.

I chose to switch GetSockAddr/SetSockAddr to work with sockaddr_storage so we can be sure we're never reading/writing out of bounds. Essentially, `struct sockaddr*` is old junk that we never want to deal with. Only hand those to the C APIs.
@Adlai-Holler
Copy link
Author

Good call @jamesob! I applied your patch and made a few readability changes to keep the diff minimal.

Copy link
Contributor

@jamesob jamesob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 0b3ad72 (jamesob/ackr/24827.3.Adlai-Holler.net_fix_undefined_behavi)

Reviewed, ran functional tests locally. Seems like a clear win to avoid UB, and probably improves readability too.

Show signature data

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

ACK 0b3ad721462ed060fae991d8ad1510cd92a69bfb ([`jamesob/ackr/24827.3.Adlai-Holler.net_fix_undefined_behavi`](https://github.com/jamesob/bitcoin/tree/ackr/24827.3.Adlai-Holler.net_fix_undefined_behavi))

Reviewed, ran functional tests locally. Seems like a clear win to avoid UB, and probably readability too.
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEGNRVI1NPYuZCSIrGepNdrbLETwUFAmKM4SgACgkQepNdrbLE
TwUcqw/7BsjxUtnuJOQm3Lrev+KaKWgPVkPEgCammTzG9j+DwHiRnByEGxIKnWHF
GwhqiK3asXhokUH6wtmEyCdoAFzad3IYkbMOGIL6vX+2RR3mAPlP6arCcZ4goHz6
cP0m/IIHhhoMVL2wT72puQgWXH5bQJPxRxq/W6GWIs+xHt5BJCPmCg9EAPfchWNu
iJMy1nJGHrktKAMQXNAxhMUy33rhyfang1s9LgGmOJt3NV1QGW5rCope0YgyWyQX
ZZTAJpp1F56SYfldDbSwxoZ7HyIsn0GBxQe+kUIyA8P37QKWx8seQo4mF7MT7c+g
itZePca2ztFVg7PXRqyAeV2qCBIDXJFlCTWFWKP+0Qv9daRMHuuLqywM6/njinJN
tQ1Dr6zaeCytvO+B3TSF98uq/GnGdzBSpEWzywGYEiL8xI4O9s/63QASUr/Fs3tE
xwwenHpASN0mCBkndmnBs5Ec3+rE5RYRdxh2XPhH0N5A0y7E2L76os+tlbstXxeX
/GFaP5ebxfPzpEU+CEzScmSWL/vVI5H3oS6LTHr8pZucYmrD2WO9YBOx74oE1asP
OT0IOsLmhpaEDcq62IarscW36S+nMYBx77T1wF18l0jgX9kr467P7Z7o1DFMYrJI
Xvyi3GzFx5Dh4IKDca34K3lXlFIo60VQqWEwpNW3byvxWyRzyOk=
=NNys
-----END PGP SIGNATURE-----

Show platform data

Tested on Linux-5.10.0-11-amd64-x86_64-with-glibc2.31

Configured with ./configure LDFLAGS=-L/home/james/src/bitcoin/db4/lib/ CPPFLAGS=-I/home/james/src/bitcoin/db4/include/ CXXFLAGS=-fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare -Wsign-compare -Werror=thread-safety-analysis --enable-wallet --enable-debug --with-daemon --enable-natpmp-default CXX=/usr/local/bin/clang++ CC=/usr/local/bin/clang PYTHONPATH= --disable-shared --with-pic --enable-benchmark=no --enable-module-recovery --enable-module-schnorrsig --disable-shared --with-pic --enable-benchmark=no --enable-module-recovery --enable-module-schnorrsig --no-create --no-recursion

Compiled with /usr/bin/ccache /usr/local/bin/clang++ -std=c++17 -mavx -mavx2 -mpclmul -fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare -Wsign-compare -Werror=thread-safety-analysis -O0 -g3 -ftrapv -fdebug-prefix-map=$(abs_top_srcdir)=.  -Wstack-protector -fstack-protector-all -fcf-protection=full -fstack-clash-protection -msse4.1 -msse4.2 -msse4 -msha  i

Compiler version: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63

}
if (ai_trav->ai_family == AF_INET6) {
assert(ai_trav->ai_addrlen >= sizeof(sockaddr_in6));
const sockaddr_in6* s6{reinterpret_cast<sockaddr_in6*>(ai_trav->ai_addr)};
resolved_addresses.emplace_back(s6->sin6_addr, s6->sin6_scope_id);
struct sockaddr_in6 s6 = LoadSockaddrIPv6(*ai_trav->ai_addr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be marked const.

@DrahtBot
Copy link
Contributor

🐙 This pull request conflicts with the target branch and needs rebase.

Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft".

@vasild
Copy link
Contributor

vasild commented Jul 12, 2022

What about using placement new to change the dynamic type of the object?

Correct me if I am wrong, if we have:

struct S1 {
    int x;
};

struct S2 {
    int x;
};

S1 s1;
S2* s2_ptr = reinterpret_cast<S2*>(&s1);
s2_ptr->x = 5; // UB because the S1 object is accessed as S2

but the following is ok because it changes the dynamic type of the object to S2:

S1 s1;
S2* s2_ptr = new (&s1) S2;
s2_ptr->x = 5; // ok

Here is an alternative patch that uses that approach. It is much smaller because it just replaces e.g. (sockaddr_in*)foo with SockAddrCast<sockaddr_in>(foo):

[patch] placement new typecast
diff --git a/src/net.cpp b/src/net.cpp
index c37d90519c..b3265dc5dc 100644
--- a/src/net.cpp
+++ b/src/net.cpp
@@ -427,13 +427,13 @@ static CAddress GetBindAddress(const Sock& sock)
 {
     CAddress addr_bind;
     struct sockaddr_storage sockaddr_bind;
     socklen_t sockaddr_bind_len = sizeof(sockaddr_bind);
     if (sock.Get() != INVALID_SOCKET) {
         if (!sock.GetSockName((struct sockaddr*)&sockaddr_bind, &sockaddr_bind_len)) {
-            addr_bind.SetSockAddr((const struct sockaddr*)&sockaddr_bind);
+            addr_bind.SetSockAddr(sockaddr_bind);
         } else {
             LogPrintLevel(BCLog::NET, BCLog::Level::Warning, "getsockname failed\n");
         }
     }
     return addr_bind;
 }
@@ -920,13 +920,13 @@ void CConnman::AcceptConnection(const ListenSocket& hListenSocket) {
         if (nErr != WSAEWOULDBLOCK) {
             LogPrintf("socket error accept failed: %s\n", NetworkErrorString(nErr));
         }
         return;
     }
 
-    if (!addr.SetSockAddr((const struct sockaddr*)&sockaddr)) {
+    if (!addr.SetSockAddr(sockaddr)) {
         LogPrintLevel(BCLog::NET, BCLog::Level::Warning, "Unknown socket family\n");
     } else {
         addr = CAddress{MaybeFlipIPv6toCJDNS(addr), NODE_NONE};
     }
 
     const CAddress addr_bind{MaybeFlipIPv6toCJDNS(GetBindAddress(*sock)), NODE_NONE};
@@ -2056,13 +2056,13 @@ bool CConnman::BindListenPort(const CService& addrBind, bilingual_str& strError,
 {
     int nOne = 1;
 
     // Create socket for listening for incoming connections
     struct sockaddr_storage sockaddr;
     socklen_t len = sizeof(sockaddr);
-    if (!addrBind.GetSockAddr((struct sockaddr*)&sockaddr, &len))
+    if (!addrBind.GetSockAddr(&sockaddr, &len))
     {
         strError = strprintf(Untranslated("Bind address family for %s not supported"), addrBind.ToString());
         LogPrintLevel(BCLog::NET, BCLog::Level::Error, "%s\n", strError.original);
         return false;
     }
 
@@ -2151,21 +2151,19 @@ void Discover()
             if (ifa->ifa_addr == nullptr) continue;
             if ((ifa->ifa_flags & IFF_UP) == 0) continue;
             if (strcmp(ifa->ifa_name, "lo") == 0) continue;
             if (strcmp(ifa->ifa_name, "lo0") == 0) continue;
             if (ifa->ifa_addr->sa_family == AF_INET)
             {
-                struct sockaddr_in* s4 = (struct sockaddr_in*)(ifa->ifa_addr);
-                CNetAddr addr(s4->sin_addr);
+                CNetAddr addr(SockAddrCast<sockaddr_in>(ifa->ifa_addr)->sin_addr);
                 if (AddLocal(addr, LOCAL_IF))
                     LogPrintf("%s: IPv4 %s: %s\n", __func__, ifa->ifa_name, addr.ToString());
             }
             else if (ifa->ifa_addr->sa_family == AF_INET6)
             {
-                struct sockaddr_in6* s6 = (struct sockaddr_in6*)(ifa->ifa_addr);
-                CNetAddr addr(s6->sin6_addr);
+                CNetAddr addr(SockAddrCast<sockaddr_in6>(ifa->ifa_addr)->sin6_addr);
                 if (AddLocal(addr, LOCAL_IF))
                     LogPrintf("%s: IPv6 %s: %s\n", __func__, ifa->ifa_name, addr.ToString());
             }
         }
         freeifaddrs(myaddrs);
     }
diff --git a/src/netaddress.cpp b/src/netaddress.cpp
index ca148bfa51..b997dc6ee9 100644
--- a/src/netaddress.cpp
+++ b/src/netaddress.cpp
@@ -831,20 +831,20 @@ CService::CService(const struct sockaddr_in& addr) : CNetAddr(addr.sin_addr), po
 
 CService::CService(const struct sockaddr_in6 &addr) : CNetAddr(addr.sin6_addr, addr.sin6_scope_id), port(ntohs(addr.sin6_port))
 {
    assert(addr.sin6_family == AF_INET6);
 }
 
-bool CService::SetSockAddr(const struct sockaddr *paddr)
+bool CService::SetSockAddr(const sockaddr_storage& addr)
 {
-    switch (paddr->sa_family) {
+    switch (addr.ss_family) {
     case AF_INET:
-        *this = CService(*(const struct sockaddr_in*)paddr);
+        *this = CService(*SockAddrCast<sockaddr_in>(&addr));
         return true;
     case AF_INET6:
-        *this = CService(*(const struct sockaddr_in6*)paddr);
+        *this = CService(*SockAddrCast<sockaddr_in6>(&addr));
         return true;
     default:
         return false;
     }
 }
 
@@ -872,32 +872,32 @@ bool operator<(const CService& a, const CService& b)
  *                        parameter might change after calling this function if
  *                        the size of the corresponding address structure
  *                        changed.
  *
  * @returns Whether or not the operation was successful.
  */
-bool CService::GetSockAddr(struct sockaddr* paddr, socklen_t *addrlen) const
+bool CService::GetSockAddr(struct sockaddr_storage* paddr, socklen_t* addrlen) const
 {
+    memset(paddr, 0x0, *addrlen);
+
     if (IsIPv4()) {
         if (*addrlen < (socklen_t)sizeof(struct sockaddr_in))
             return false;
         *addrlen = sizeof(struct sockaddr_in);
-        struct sockaddr_in *paddrin = (struct sockaddr_in*)paddr;
-        memset(paddrin, 0, *addrlen);
+        sockaddr_in* paddrin = SockAddrCast<sockaddr_in>(paddr);
         if (!GetInAddr(&paddrin->sin_addr))
             return false;
         paddrin->sin_family = AF_INET;
         paddrin->sin_port = htons(port);
         return true;
     }
     if (IsIPv6() || IsCJDNS()) {
         if (*addrlen < (socklen_t)sizeof(struct sockaddr_in6))
             return false;
         *addrlen = sizeof(struct sockaddr_in6);
-        struct sockaddr_in6 *paddrin6 = (struct sockaddr_in6*)paddr;
-        memset(paddrin6, 0, *addrlen);
+        sockaddr_in6* paddrin6 = SockAddrCast<sockaddr_in6>(paddr);
         if (!GetIn6Addr(&paddrin6->sin6_addr))
             return false;
         paddrin6->sin6_scope_id = m_scope_id;
         paddrin6->sin6_family = AF_INET6;
         paddrin6->sin6_port = htons(port);
         return true;
diff --git a/src/netaddress.h b/src/netaddress.h
index 47ba045334..e286b93a41 100644
--- a/src/netaddress.h
+++ b/src/netaddress.h
@@ -525,14 +525,14 @@ protected:
 public:
     CService();
     CService(const CNetAddr& ip, uint16_t port);
     CService(const struct in_addr& ipv4Addr, uint16_t port);
     explicit CService(const struct sockaddr_in& addr);
     uint16_t GetPort() const;
-    bool GetSockAddr(struct sockaddr* paddr, socklen_t* addrlen) const;
-    bool SetSockAddr(const struct sockaddr* paddr);
+    bool GetSockAddr(sockaddr_storage* paddr, socklen_t* addrlen) const;
+    bool SetSockAddr(const sockaddr_storage& addr);
     friend bool operator==(const CService& a, const CService& b);
     friend bool operator!=(const CService& a, const CService& b) { return !(a == b); }
     friend bool operator<(const CService& a, const CService& b);
     std::vector<unsigned char> GetKey() const;
     std::string ToString() const;
     std::string ToStringPort() const;
@@ -573,7 +573,36 @@ public:
 
 private:
     const uint64_t m_salt_k0;
     const uint64_t m_salt_k1;
 };
 
+/**
+ * Cast sockaddr or sockaddr_storage to sockaddr_in or sockaddr_in6.
+ * This is as safe as the traditional typecast:
+ * @code{.cpp}
+ * sockaddr_storage from;
+ * sockaddr_in* to_ptr = (sockaddr_in*)&from;
+ * to_ptr->sin_family = ...
+ * @endcode
+ * except that it avoids the undefined behavior of the last line above by not
+ * breaking the strict aliasing rules.
+ * Placement new changes the dynamic type of the object to `To`, so that it is
+ * safe to access it as `To` afterwards.
+ */
+template <typename To, typename From>
+static inline To* SockAddrCast(const From* ptr)
+{
+    // Make sure nobody abuses this function for other types.
+    static_assert(
+        std::is_same<From, sockaddr>::value ||
+        std::is_same<From, sockaddr_storage>::value);
+    static_assert(
+        std::is_same<To, sockaddr_in>::value ||
+        std::is_same<To, sockaddr_in6>::value);
+    // Make sure that the placement new will not modify the memory pointed by `ptr`.
+    static_assert(std::is_trivially_default_constructible<To>::value);
+
+    return new (const_cast<From*>(ptr)) To;
+}
+
 #endif // BITCOIN_NETADDRESS_H
diff --git a/src/netbase.cpp b/src/netbase.cpp
index 030f462ed9..ca59230034 100644
--- a/src/netbase.cpp
+++ b/src/netbase.cpp
@@ -65,17 +65,17 @@ std::vector<CNetAddr> WrappedGetAddrInfo(const std::string& name, bool allow_loo
     // Traverse the linked list starting with ai_trav.
     addrinfo* ai_trav{ai_res};
     std::vector<CNetAddr> resolved_addresses;
     while (ai_trav != nullptr) {
         if (ai_trav->ai_family == AF_INET) {
             assert(ai_trav->ai_addrlen >= sizeof(sockaddr_in));
-            resolved_addresses.emplace_back(reinterpret_cast<sockaddr_in*>(ai_trav->ai_addr)->sin_addr);
+            resolved_addresses.emplace_back(SockAddrCast<sockaddr_in>(ai_trav->ai_addr)->sin_addr);
         }
         if (ai_trav->ai_family == AF_INET6) {
             assert(ai_trav->ai_addrlen >= sizeof(sockaddr_in6));
-            const sockaddr_in6* s6{reinterpret_cast<sockaddr_in6*>(ai_trav->ai_addr)};
+            const sockaddr_in6* s6 = SockAddrCast<sockaddr_in6>(ai_trav->ai_addr);
             resolved_addresses.emplace_back(s6->sin6_addr, s6->sin6_scope_id);
         }
         ai_trav = ai_trav->ai_next;
     }
     freeaddrinfo(ai_res);
 
@@ -485,13 +485,13 @@ bool Socks5(const std::string& strDest, uint16_t port, const ProxyCredentials* a
 
 std::unique_ptr<Sock> CreateSockTCP(const CService& address_family)
 {
     // Create a sockaddr from the specified service.
     struct sockaddr_storage sockaddr;
     socklen_t len = sizeof(sockaddr);
-    if (!address_family.GetSockAddr((struct sockaddr*)&sockaddr, &len)) {
+    if (!address_family.GetSockAddr(&sockaddr, &len)) {
         LogPrintf("Cannot create socket for %s: unsupported network\n", address_family.ToString());
         return nullptr;
     }
 
     // Create a TCP socket in the address family of the specified service.
     SOCKET hSocket = socket(((struct sockaddr*)&sockaddr)->sa_family, SOCK_STREAM, IPPROTO_TCP);
@@ -550,13 +550,13 @@ bool ConnectSocketDirectly(const CService &addrConnect, const Sock& sock, int nT
     struct sockaddr_storage sockaddr;
     socklen_t len = sizeof(sockaddr);
     if (sock.Get() == INVALID_SOCKET) {
         LogPrintf("Cannot connect to %s: invalid socket\n", addrConnect.ToString());
         return false;
     }
-    if (!addrConnect.GetSockAddr((struct sockaddr*)&sockaddr, &len)) {
+    if (!addrConnect.GetSockAddr(&sockaddr, &len)) {
         LogPrintf("Cannot connect to %s: unsupported network\n", addrConnect.ToString());
         return false;
     }
 
     // Connect to the addrConnect service on the hSocket socket.
     if (sock.Connect(reinterpret_cast<struct sockaddr*>(&sockaddr), len) == SOCKET_ERROR) {
diff --git a/src/test/util/net.h b/src/test/util/net.h
index 34ab9958c6..eb4c283cf8 100644
--- a/src/test/util/net.h
+++ b/src/test/util/net.h
@@ -132,13 +132,13 @@ public:
         if (addr != nullptr) {
             // Pretend all connections come from 5.5.5.5:6789
             memset(addr, 0x00, *addr_len);
             const socklen_t write_len = static_cast<socklen_t>(sizeof(sockaddr_in));
             if (*addr_len >= write_len) {
                 *addr_len = write_len;
-                sockaddr_in* addr_in = reinterpret_cast<sockaddr_in*>(addr);
+                sockaddr_in* addr_in = SockAddrCast<sockaddr_in>(addr);
                 addr_in->sin_family = AF_INET;
                 memset(&addr_in->sin_addr, 0x05, sizeof(addr_in->sin_addr));
                 addr_in->sin_port = htons(6789);
             }
         }
         return std::make_unique<StaticContentsSock>("");
diff --git a/src/torcontrol.cpp b/src/torcontrol.cpp
index d6e792a55f..c9b6fc289f 100644
--- a/src/torcontrol.cpp
+++ b/src/torcontrol.cpp
@@ -137,13 +137,13 @@ bool TorControlConnection::Connect(const std::string& tor_control_center, const
         LogPrintf("tor: Failed to look up control center %s\n", tor_control_center);
         return false;
     }
 
     struct sockaddr_storage control_address;
     socklen_t control_address_len = sizeof(control_address);
-    if (!control_service.GetSockAddr(reinterpret_cast<struct sockaddr*>(&control_address), &control_address_len)) {
+    if (!control_service.GetSockAddr(&control_address, &control_address_len)) {
         LogPrintf("tor: Error parsing socket address %s\n", tor_control_center);
         return false;
     }
 
     // Create a new socket, set up callbacks and enable notification bits
     b_conn = bufferevent_socket_new(base, -1, BEV_OPT_CLOSE_ON_FREE);

@vasild
Copy link
Contributor

vasild commented Jul 14, 2022

The remaining casts to sockaddr* are unavoidable and safe because we are calling into the external C APIs which expect us to do that cast.

I understand the "unavoidable" but I don't understand the "safe". Do you mean that it is safe just because it goes to external C API? I made the following experiment which shows the opposite (both with gcc 12 and clang 15):

  • s.h
struct S1 {
    int x;
};

struct S2 {
    int x;
};
  • f.c
#include "s.h"

int f(struct S1* s1, struct S2* s2)
{
    s1->x = 5;
    s2->x = 6;
    return s1->x;
}
  • main.cc
#include "s.h"

extern "C" int f(S1*, S2*);

int main(int, char**)
{
    S1 s1;
    std::cout << f(&s1, (S2*)&s1) << std::endl;
    return 0;
}

Compile with -O3 and 5 is printed. Change the type of the second argument to S1 and 6 is printed.

I think the bind(2) API, strictly speaking implies UB, but it is safe because the bind() function never does something like the above, nor we do before passing it the argument.

There is an example in bind(2) which interprets sockaddr_un as sockaddr - that is also UB according to the standard of strict aliasing rules1 2, right? What about memcpy() where we cast anything to void*, also UB?

@adamjonas
Copy link
Member

@Adlai-Holler mind rebasing?

@jonatack
Copy link
Member

jonatack commented Aug 5, 2022

Concept ACK on fixing UB here (and getting rid of these reinterpret_casts).

@achow101
Copy link
Member

Are you still working on this?

@achow101
Copy link
Member

Closing this as it has not had any activity in a while. If you are interested in continuing work on this, please leave a comment so that it can be reopened.

@achow101 achow101 closed this Nov 10, 2022
@bitcoin bitcoin locked and limited conversation to collaborators Mar 9, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Net code contains some UB -- violations of strict aliasing rules in C++
10 participants