Skip to content

Conversation

maflcko
Copy link
Member

@maflcko maflcko commented Apr 4, 2022

This is a refactor (with added test), needed for:

MarcoFalke added 2 commits December 15, 2021 11:23
Also, fix style in the corresponding function. The style change can be
reviewed with "--word-diff-regex=."
Copy link
Contributor

@shaavan shaavan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review ACK fad1b61

  • This PR:
    1. Add test for ParseHex function. Specifically, it adds tests for an embedded null (\0).
    2. Removes a redundant version of the ParseHex function cleaning the code.

These two commits are separated from #23595. While both the commits are refactoring changes, the changes done in the second commit are commonly shared between the PRs #23595 and #18061.

I agree with separating these two commits in a separate PR because:

  1. It removes the need of repeated efforts on a similar change in two different PRs.
  2. It separates refactoring changes from other subsequent changes done in the PRs.
  3. It makes the PRs more focused on a specific change, which aligns with the philosophy of atomic transitions.

The updated code is clean and easy to understand and reason with, and I think it can be merged.

@@ -54,7 +54,7 @@ enum class ByteUnit : uint64_t {
* @return A new string without unsafe chars
*/
std::string SanitizeString(const std::string& str, int rule = SAFE_CHARS_DEFAULT);
std::vector<unsigned char> ParseHex(const char* psz);
/** Parse the hex string into bytes. Ignores whitespace. */
std::vector<unsigned char> ParseHex(const std::string& str);
Copy link
Member

@laanwj laanwj Apr 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pessimization (causes a allocation and copy) if the passed-in parameter is actually a const char *. Maybe switch to a span or string_view? Or is this never the case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few places that use the pointer interface. However, they are not performance critical:

  • Unit tests
  • Calls, that are only done once (genesis block, other chain params, ...)
  • In net processing to generate the alert message

I think for none of these performance matter, because they are either (1) small strings, in which case a copy doesn't matter, (2) used in tests or otherwise places where performance isn't needed.

The places that are affected (RPC mostly), already use std::string to pass around hex stuff. For them this change has no effect, but likely ##18061 helps here. For example, when hex-decoding a 1-MB block or so.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think switching to string_view could still make sense, but this requires a re-write of the parse-algorithm, as string view lacks the trailing null-char, which the current algorithm relies on. (There is no std::string_view::c_str() method)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I still think being able to pass in a memory area instead of a full, allocated string is a conceptually useful interface. const char* is just the wrong way to do it.

Copy link
Member

@laanwj laanwj Apr 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think switching to string_view could still make sense, but this requires a re-write of the parse-algorithm,

Fine, yes that may be too much for a refactor like this, let's not do it in this PR then (might as well solve that at the same time as the trailing null issue).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(unresolved conversation, to avoid hiding it)

@laanwj
Copy link
Member

laanwj commented Apr 4, 2022

Concept ACK on getting rid of const char * string interfaces.

Code review ACK fad1b61

Copy link

@vincenzopalazzo vincenzopalazzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK fad1b61

@sipa
Copy link
Member

sipa commented Apr 4, 2022

Using std::string_view would be ideal here, as constructing one of those from either std::string or const char* is efficient, and without allocation overhead.

@maflcko
Copy link
Member Author

maflcko commented Apr 4, 2022

@sipa This is ideal, but non-trivial IIUC. See the previous code discussion.

@sipa
Copy link
Member

sipa commented Apr 4, 2022

diff --git a/src/util/strencodings.cpp b/src/util/strencodings.cpp
index 940fa90da2..691607caba 100644
--- a/src/util/strencodings.cpp
+++ b/src/util/strencodings.cpp
@@ -81,32 +81,24 @@ bool IsHexNumber(const std::string& str)
     return (str.size() > starting_location);
 }
 
-std::vector<unsigned char> ParseHex(const char* psz)
+std::vector<unsigned char> ParseHex(std::string_view str)
 {
     // convert hex dump to vector
     std::vector<unsigned char> vch;
-    while (true)
-    {
-        while (IsSpace(*psz))
-            psz++;
-        signed char c = HexDigit(*psz++);
-        if (c == (signed char)-1)
-            break;
-        auto n{uint8_t(c << 4)};
-        c = HexDigit(*psz++);
-        if (c == (signed char)-1)
-            break;
-        n |= c;
-        vch.push_back(n);
+    auto it = str.begin();
+    while (it != str.end() && it + 1 != str.end()) {
+        if (IsSpace(*it)) {
+            ++it;
+            continue;
+        }
+        auto c1 = HexDigit(*(it++));
+        auto c2 = HexDigit(*(it++));
+        if (c1 == (signed char)-1 || c2 == (signed char)-1) break;
+        vch.push_back(uint8_t(c1 << 4) | c2);
     }
     return vch;
 }
 
-std::vector<unsigned char> ParseHex(const std::string& str)
-{
-    return ParseHex(str.c_str());
-}
-
 void SplitHostPort(std::string in, uint16_t& portOut, std::string& hostOut)
 {
     size_t colon = in.find_last_of(':');
diff --git a/src/util/strencodings.h b/src/util/strencodings.h
index 1f83fa3ffa..0e72466fc3 100644
--- a/src/util/strencodings.h
+++ b/src/util/strencodings.h
@@ -55,8 +55,7 @@ enum class ByteUnit : uint64_t {
 * @return           A new string without unsafe chars
 */
 std::string SanitizeString(const std::string& str, int rule = SAFE_CHARS_DEFAULT);
-std::vector<unsigned char> ParseHex(const char* psz);
-std::vector<unsigned char> ParseHex(const std::string& str);
+std::vector<unsigned char> ParseHex(std::string_view str);
 signed char HexDigit(char c);
 /* Returns true if each character in str is a hex character, and has an even
  * number of hex digits.*/

?

@maflcko
Copy link
Member Author

maflcko commented Apr 4, 2022

Happy to review, if you create a pull. With "trivial" I meant that this is not a one-line fix, but requires reworking the algorithm.

@sipa
Copy link
Member

sipa commented Apr 4, 2022

On it.

@sipa
Copy link
Member

sipa commented Apr 4, 2022

See #24764.

@DrahtBot
Copy link
Contributor

DrahtBot commented Apr 5, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #24764 (Modernize util/strencodings and util/string: string_view and optional by sipa)
  • #22087 (Validate port-options by amadeuszpawlik)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@maflcko
Copy link
Member Author

maflcko commented Apr 5, 2022

Thanks, closing for now.

@maflcko maflcko closed this Apr 5, 2022
@maflcko maflcko deleted the 2204-no-pointer-💴 branch April 5, 2022 06:47
@DrahtBot DrahtBot mentioned this pull request Apr 6, 2022
@bitcoin bitcoin locked and limited conversation to collaborators Apr 5, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants