Skip to content

Conversation

maflcko
Copy link
Member

@maflcko maflcko commented Jun 26, 2023

The AsBytes helpers (or std::as_bytes helpers) are architecture
dependent. For example, the below test case diff [0], applied before
this commit will pass on x86_64, but fail on s390x [1].

Fix this by replacing AsBytes with a new ByteSpanCast in the MakeByteSpan helper.
This will turn the test case diff into a compile failure instead of a runtime error.

[0] test case diff:

diff --git a/src/test/serialize_tests.cpp b/src/test/serialize_tests.cpp
index 09f77d2b61..4e56ffa351 100644
--- a/src/test/serialize_tests.cpp
+++ b/src/test/serialize_tests.cpp
@@ -241,6 +241,15 @@ BOOST_AUTO_TEST_CASE(class_methods)
     ss2 << intval << boolval << stringval << charstrval << txval;
     ss2 >> methodtest3;
     BOOST_CHECK(methodtest3 == methodtest4);
+
+    {
+        DataStream out;
+        std::vector<uint8_t> in_8{0x00, 0x11, 0x22, 0x33};
+        std::vector<uint16_t> in_16{0xaabb, 0xccdd};
+        out.write(MakeByteSpan(in_8));
+        out.write(MakeByteSpan(in_16));
+        BOOST_CHECK_EQUAL(HexStr(out), "00112233bbaaddcc");
+    }
 }

 BOOST_AUTO_TEST_SUITE_END()

[1] test case failure on s390x:
test/serialize_tests.cpp(251): error: in "serialize_tests/class_methods": check HexStr(out) == "00112233bbaaddcc" has failed [00112233aabbccdd != 00112233bbaaddcc]

@DrahtBot
Copy link
Contributor

DrahtBot commented Jun 26, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #27479 (BIP324: ElligatorSwift integrations by sipa)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot DrahtBot changed the title util: Safer MakeByteSpan with ByteSpanCast util: Safer MakeByteSpan with ByteSpanCast Jun 26, 2023
@ryanofsky
Copy link
Contributor

fad126d seems like a nice improvement. It makes existing MakeByteSpan function safer by only allowing it to be called on arrays of characters, and not array of other things like uint16_t that could have platform specific representations. Some thoughts though:

  • MakeByteSpan probably deserves to have some documentation since it is one of the more useful function in that file and is called a lot of places.
  • I don't like introducing the ByteSpanCast function. It doesn't seem useful except as an implementation detail, and I think it'd be nicer if we got rid of some confusing sounding functions instead of adding another one.
  • I don't like adding another use of the AsBytePtr function. As mentioned in util: Allow std::byte and char Span serialization #27927 I think AsBytePtr is unsafe, and less clear than reinterpret_cast.

Would suggest tweaking the PR as follows:

--- a/src/span.h
+++ b/src/span.h
@@ -270,25 +270,28 @@ inline const unsigned char* UCharCast(const std::byte* c) { return reinterpret_c
 
 // Helper function to safely convert a Span to a Span<[const] unsigned char>.
 template <typename T> constexpr auto UCharSpanCast(Span<T> s) -> Span<typename std::remove_pointer<decltype(UCharCast(s.data()))>::type> { return {UCharCast(s.data()), s.size()}; }
-// Helper function to safely convert a Span to a Span<[const] std::byte>.
-template <typename B>
-auto ByteSpanCast(Span<B> s)
-{
-    return Span{AsBytePtr(UCharCast(s.data())), s.size_bytes()}; // Use UCharCast to enforce B is a byte-like type.
-}
 
 /** Like the Span constructor, but for (const) unsigned char member types only. Only works for (un)signed char containers. */
 template <typename V> constexpr auto MakeUCharSpan(V&& v) -> decltype(UCharSpanCast(Span{std::forward<V>(v)})) { return UCharSpanCast(Span{std::forward<V>(v)}); }
 
+// Helper functions to turn arrays of characters into std::byte spans. Usable on
+// C++ std::string, std::string_view, and std::vector<char> objects, as well as
+// on C string literals and C fixed-size char[] arrays.
+//
+// Note that when constructing a span directly from a C string literal, the span
+// will include the \0 null byte. This can be avoided by constructing the span
+// indirectly through std::string_view.
 template <typename V>
 Span<const std::byte> MakeByteSpan(V&& v) noexcept
 {
-    return ByteSpanCast(Span{std::forward<V>(v)});
+    Span data{std::forward<V>(v)};
+    return AsBytes(Span{UCharCast(data.data()), data.size_bytes()});
 }
 template <typename V>
 Span<std::byte> MakeWritableByteSpan(V&& v) noexcept
 {
-    return ByteSpanCast(Span{std::forward<V>(v)});
+    Span data{std::forward<V>(v)};
+    return AsWritableBytes(Span{UCharCast(data.data()), data.size_bytes()});
 }
 
 #endif // BITCOIN_SPAN_H

@DrahtBot
Copy link
Contributor

🐙 This pull request conflicts with the target branch and needs rebase.

@maflcko
Copy link
Member Author

maflcko commented Jun 27, 2023

Maybe MakeByteSpan should just be removed?

  • Currently it (as well as Span) accepts a range that is not borrowed (see https://en.cppreference.com/w/cpp/ranges/borrowed_range), leading to potential lifetime issues. Someone should check if making the Span deduction guidelines safer will automatically make the MakeByteSpan function safer.
  • Given that std::as_bytes is in the standard library and we can't prevent devs from using it (or AsBytes) at compile time, it seems pointless trying to offer a safe MakeByteSpan wrapper with only a docstring suggesting to use it. It may be better to just write a clang-tidy plugin to review the code to disallow calling AsBytes on a span of uint16_t (or similar)?

@maflcko maflcko marked this pull request as draft June 27, 2023 07:43
@ryanofsky
Copy link
Contributor

ryanofsky commented Jun 27, 2023

Maybe MakeByteSpan should just be removed?

I don't think I agree. MakeByteSpan(x) is just equivalent to AsBytes(Span{x}) currently, so right now it is pointless. But if we go ahead with this PR and restrict it to only work on character arrays, that will make it safer and actually useful.

it seems pointless trying to offer a safe MakeByteSpan wrapper with only a docstring suggesting to use it.

I don't think it's pointless to have a safe function even though a dangerous alternative exists. If I'm writing or reviewing code that uses a safe implementation of MakeByteSpan I can be more confident it is doing the right thing than if I see code using AsBytes. Just like if I'm reviewing code that is using UCharCast I can be more confident in it than code using an (unsigned char) cast.

I think it would be a good thing to discourage use of AsBytes and provide some safer alternative where possible. That alternative doesn't have to be MakeByteSpan, but if we have a MakeByteSpan function already and can document it and make it safer, I think it'd be good to do that.

@maflcko
Copy link
Member Author

maflcko commented Jun 27, 2023

If I'm writing or reviewing code that uses a safe implementation of MakeByteSpan I can be more confident it is doing the right thing than if I see code using AsBytes.

Would you be less confident if there was a clang-tidy plugin doing the check? For example, instead of having two functions that do a similar thing, but it is unclear which one is safer/preferred from just looking at the name, we could have an additional UnsafeAsBytes (or similar), where it is clear that using it without looking up what it does is unsafe.

@ryanofsky
Copy link
Contributor

If I'm writing or reviewing code that uses a safe implementation of MakeByteSpan I can be more confident it is doing the right thing than if I see code using AsBytes.

Would you be less confident if there was a clang-tidy plugin doing the check? For example, instead of having two functions that do a similar thing, but it is unclear which one is safer/preferred from just looking at the name, we could have an additional UnsafeAsBytes (or similar), where it is clear that using it without looking up what it does is unsafe.

I think that sounds fine, though I'm not sure what the clang plugin adds exactly. If the goal is just to forbid AsBytes from being called on non-char arrays, we could change the definition of AsBytes to do that (probably using UCharCast internally), and provide an additional UnsafeAsBytes escape hatch for the cases where that doesn't work. But maybe a clang plugin could be useful to prevent std::as_bytes calls?.

I just think as long as we have a MakeByteSpan function and can make it safer, it seems good to do that. Or at least that doing that would be better than getting rid of MakeByteSpan and replacing it with calls to AsBytes when AsBytes is not very safe right now. But if there's something that can we can do to make AsBytes safer, I'd agree there be little use for a safer MakeByteSpan

@sipa
Copy link
Member

sipa commented Jun 27, 2023

FWIW, I don't believe AsBytes (or std::as_bytes) are ever undefined behavior; every object has a memory representation, which you're allowed to observe. It is of course always implementation-defined behavior when anything but byte and byte-like arrays are involved, but the point of these functions is allowing access to that.

@ryanofsky
Copy link
Contributor

FWIW, I don't believe AsBytes (or std::as_bytes) are ever undefined behavior

Yes that should be true as long as span is pointing to live memory. I should replace "platform specific or undefined" with "implementation specific" in the #27978 change description

Copy link

@PRADACANDI18 PRADACANDI18 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General feed back have I have not given approval just yet on a merge. I have to review some more

MarcoFalke added 2 commits June 29, 2023 10:19
This removes bloat that is not needed.
The AsBytes helpers (or std::as_bytes helpers) are architecture
dependent. For example, the below test case diff [0], applied before
this commit will pass on x86_64, but fail on s390x [1].

Fix this by replacing AsBytes with a new ByteSpanCast where possible.

[0] test case diff:

```
diff --git a/src/test/serialize_tests.cpp b/src/test/serialize_tests.cpp
index 09f77d2..4e56ffa351 100644
--- a/src/test/serialize_tests.cpp
+++ b/src/test/serialize_tests.cpp
@@ -241,6 +241,15 @@ BOOST_AUTO_TEST_CASE(class_methods)
     ss2 << intval << boolval << stringval << charstrval << txval;
     ss2 >> methodtest3;
     BOOST_CHECK(methodtest3 == methodtest4);
+
+    {
+        DataStream out;
+        std::vector<uint8_t> in_8{0x00, 0x11, 0x22, 0x33};
+        std::vector<uint16_t> in_16{0xaabb, 0xccdd};
+        out.write(MakeByteSpan(in_8));
+        out.write(MakeByteSpan(in_16));
+        BOOST_CHECK_EQUAL(HexStr(out), "00112233bbaaddcc");
+    }
 }

 BOOST_AUTO_TEST_SUITE_END()
```

[1] test case failure on s390x:
test/serialize_tests.cpp(251): error: in "serialize_tests/class_methods": check HexStr(out) == "00112233bbaaddcc" has failed [00112233aabbccdd != 00112233bbaaddcc]
@maflcko
Copy link
Member Author

maflcko commented Jun 29, 2023

Closing for now to allow for more brainstorming

@maflcko maflcko closed this Jun 29, 2023
@maflcko maflcko deleted the 2306-byte-span- branch June 29, 2023 08:27
achow101 added a commit that referenced this pull request Jun 29, 2023
7c85361 refactor: Drop unsafe AsBytePtr function (Ryan Ofsky)

Pull request description:

  Replace calls to `AsBytePtr` with calls to `AsBytes` or `reinterpret_cast`. `AsBytePtr` is just a wrapper around `reinterpret_cast`. It accepts any type of pointer as an argument and uses `reinterpret_cast` to cast the argument to a `std::byte` pointer.

  Despite taking any type of pointer as an argument, it is not useful to call `AsBytePtr` on most types of pointers, because byte representations of most types will be platform specific or undefined. Also, because it is named similarly to the `AsBytes` function, `AsBytePtr` looks safer than it actually is. Both `AsBytes` and `AsBytePtr` call reinterpret_cast internally and may be unsafe to use with certain types, but AsBytes at least has some type checking and can only be called on `Span` objects, while `AsBytePtr` can be called on any pointer argument.

  The change was motivated by discussion on #27973 and #27927 and is compatible with those PRs

ACKs for top commit:
  jonatack:
    re-ACK 7c85361
  sipa:
    utACK 7c85361
  achow101:
    ACK 7c85361

Tree-SHA512: 200d858b1d4d579f081a7f9a14d488a99713b4918b4564ac3dd5c18578d927dbd6426e62e02f49f04a3fa73ca02ff7109c495cb0b92bec43c27d9b74e2f95757
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jun 30, 2023
7c85361 refactor: Drop unsafe AsBytePtr function (Ryan Ofsky)

Pull request description:

  Replace calls to `AsBytePtr` with calls to `AsBytes` or `reinterpret_cast`. `AsBytePtr` is just a wrapper around `reinterpret_cast`. It accepts any type of pointer as an argument and uses `reinterpret_cast` to cast the argument to a `std::byte` pointer.

  Despite taking any type of pointer as an argument, it is not useful to call `AsBytePtr` on most types of pointers, because byte representations of most types will be platform specific or undefined. Also, because it is named similarly to the `AsBytes` function, `AsBytePtr` looks safer than it actually is. Both `AsBytes` and `AsBytePtr` call reinterpret_cast internally and may be unsafe to use with certain types, but AsBytes at least has some type checking and can only be called on `Span` objects, while `AsBytePtr` can be called on any pointer argument.

  The change was motivated by discussion on bitcoin#27973 and bitcoin#27927 and is compatible with those PRs

ACKs for top commit:
  jonatack:
    re-ACK 7c85361
  sipa:
    utACK 7c85361
  achow101:
    ACK 7c85361

Tree-SHA512: 200d858b1d4d579f081a7f9a14d488a99713b4918b4564ac3dd5c18578d927dbd6426e62e02f49f04a3fa73ca02ff7109c495cb0b92bec43c27d9b74e2f95757
@bitcoin bitcoin locked and limited conversation to collaborators Jun 28, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants