refactor: Remove ParseHex(const char*) from public interface #24751

maflcko · 2022-04-04T06:37:52Z

This is a refactor (with added test), needed for:

util: Add ParseHex<std::byte>() helper #23595
util: Pass size to ParseHex to assist preallocation #18061
Some bugfixes I am working on, but are blocked on this

Also, fix style in the corresponding function. The style change can be reviewed with "--word-diff-regex=."

shaavan

Code Review ACK fad1b61

This PR:
1. Add test for ParseHex function. Specifically, it adds tests for an embedded null (\0).
2. Removes a redundant version of the ParseHex function cleaning the code.

These two commits are separated from #23595. While both the commits are refactoring changes, the changes done in the second commit are commonly shared between the PRs #23595 and #18061.

I agree with separating these two commits in a separate PR because:

It removes the need of repeated efforts on a similar change in two different PRs.
It separates refactoring changes from other subsequent changes done in the PRs.
It makes the PRs more focused on a specific change, which aligns with the philosophy of atomic transitions.

The updated code is clean and easy to understand and reason with, and I think it can be merged.

src/test/util_tests.cpp

laanwj · 2022-04-04T09:30:37Z

src/util/strencodings.h

@@ -54,7 +54,7 @@ enum class ByteUnit : uint64_t {
 * @return           A new string without unsafe chars
 */
 std::string SanitizeString(const std::string& str, int rule = SAFE_CHARS_DEFAULT);
-std::vector<unsigned char> ParseHex(const char* psz);
+/** Parse the hex string into bytes. Ignores whitespace. */
 std::vector<unsigned char> ParseHex(const std::string& str);


This is a pessimization (causes a allocation and copy) if the passed-in parameter is actually a const char *. Maybe switch to a span or string_view? Or is this never the case?

There are a few places that use the pointer interface. However, they are not performance critical:

Unit tests

Calls, that are only done once (genesis block, other chain params, ...)

In net processing to generate the alert message

I think for none of these performance matter, because they are either (1) small strings, in which case a copy doesn't matter, (2) used in tests or otherwise places where performance isn't needed.

The places that are affected (RPC mostly), already use std::string to pass around hex stuff. For them this change has no effect, but likely ##18061 helps here. For example, when hex-decoding a 1-MB block or so.

I think switching to string_view could still make sense, but this requires a re-write of the parse-algorithm, as string view lacks the trailing null-char, which the current algorithm relies on. (There is no std::string_view::c_str() method)

OK. I still think being able to pass in a memory area instead of a full, allocated string is a conceptually useful interface. const char* is just the wrong way to do it.

I think switching to string_view could still make sense, but this requires a re-write of the parse-algorithm,

Fine, yes that may be too much for a refactor like this, let's not do it in this PR then (might as well solve that at the same time as the trailing null issue).

(unresolved conversation, to avoid hiding it)

laanwj · 2022-04-04T09:37:13Z

Concept ACK on getting rid of const char * string interfaces.

Code review ACK fad1b61

vincenzopalazzo

ACK fad1b61

sipa · 2022-04-04T14:37:37Z

Using std::string_view would be ideal here, as constructing one of those from either std::string or const char* is efficient, and without allocation overhead.

maflcko · 2022-04-04T14:54:07Z

@sipa This is ideal, but non-trivial IIUC. See the previous code discussion.

sipa · 2022-04-04T15:04:23Z

diff --git a/src/util/strencodings.cpp b/src/util/strencodings.cpp
index 940fa90da2..691607caba 100644
--- a/src/util/strencodings.cpp
+++ b/src/util/strencodings.cpp
@@ -81,32 +81,24 @@ bool IsHexNumber(const std::string& str)
     return (str.size() > starting_location);
 }
 
-std::vector<unsigned char> ParseHex(const char* psz)
+std::vector<unsigned char> ParseHex(std::string_view str)
 {
     // convert hex dump to vector
     std::vector<unsigned char> vch;
-    while (true)
-    {
-        while (IsSpace(*psz))
-            psz++;
-        signed char c = HexDigit(*psz++);
-        if (c == (signed char)-1)
-            break;
-        auto n{uint8_t(c << 4)};
-        c = HexDigit(*psz++);
-        if (c == (signed char)-1)
-            break;
-        n |= c;
-        vch.push_back(n);
+    auto it = str.begin();
+    while (it != str.end() && it + 1 != str.end()) {
+        if (IsSpace(*it)) {
+            ++it;
+            continue;
+        }
+        auto c1 = HexDigit(*(it++));
+        auto c2 = HexDigit(*(it++));
+        if (c1 == (signed char)-1 || c2 == (signed char)-1) break;
+        vch.push_back(uint8_t(c1 << 4) | c2);
     }
     return vch;
 }
 
-std::vector<unsigned char> ParseHex(const std::string& str)
-{
-    return ParseHex(str.c_str());
-}
-
 void SplitHostPort(std::string in, uint16_t& portOut, std::string& hostOut)
 {
     size_t colon = in.find_last_of(':');
diff --git a/src/util/strencodings.h b/src/util/strencodings.h
index 1f83fa3ffa..0e72466fc3 100644
--- a/src/util/strencodings.h
+++ b/src/util/strencodings.h
@@ -55,8 +55,7 @@ enum class ByteUnit : uint64_t {
 * @return           A new string without unsafe chars
 */
 std::string SanitizeString(const std::string& str, int rule = SAFE_CHARS_DEFAULT);
-std::vector<unsigned char> ParseHex(const char* psz);
-std::vector<unsigned char> ParseHex(const std::string& str);
+std::vector<unsigned char> ParseHex(std::string_view str);
 signed char HexDigit(char c);
 /* Returns true if each character in str is a hex character, and has an even
  * number of hex digits.*/

?

maflcko · 2022-04-04T15:07:28Z

Happy to review, if you create a pull. With "trivial" I meant that this is not a one-line fix, but requires reworking the algorithm.

sipa · 2022-04-04T15:08:25Z

On it.

sipa · 2022-04-04T19:21:19Z

See #24764.

DrahtBot · 2022-04-05T01:04:53Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#24764 (Modernize util/strencodings and util/string: string_view and optional by sipa)
#22087 (Validate port-options by amadeuszpawlik)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

maflcko · 2022-04-05T06:47:31Z

Thanks, closing for now.

MarcoFalke added 2 commits December 15, 2021 11:23

test: Add test for embedded null in hex string

ffffa0d

Also, fix style in the corresponding function. The style change can be reviewed with "--word-diff-regex=."

refactor: Remove ParseHex(const char*) from public interface

fad1b61

maflcko added the Refactoring label Apr 4, 2022

maflcko mentioned this pull request Apr 4, 2022

util: Pass size to ParseHex to assist preallocation #18061

Closed

shaavan approved these changes Apr 4, 2022

View reviewed changes

laanwj reviewed Apr 4, 2022

View reviewed changes

src/test/util_tests.cpp Show resolved Hide resolved

laanwj reviewed Apr 4, 2022

View reviewed changes

vincenzopalazzo approved these changes Apr 4, 2022

View reviewed changes

DrahtBot mentioned this pull request Apr 4, 2022

Modernize util/strencodings and util/string: string_view and optional #24764

Closed

maflcko closed this Apr 5, 2022

maflcko deleted the 2204-no-pointer-💴 branch April 5, 2022 06:47

DrahtBot mentioned this pull request Apr 6, 2022

Validate port-options #22087

Merged

bitcoin locked and limited conversation to collaborators Apr 5, 2023

refactor: Remove ParseHex(const char*) from public interface #24751

refactor: Remove ParseHex(const char*) from public interface #24751

Uh oh!

Conversation

maflcko commented Apr 4, 2022

Uh oh!

shaavan left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

laanwj Apr 4, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

maflcko Apr 4, 2022

Choose a reason for hiding this comment

Uh oh!

maflcko Apr 4, 2022

Choose a reason for hiding this comment

Uh oh!

laanwj Apr 4, 2022

Choose a reason for hiding this comment

Uh oh!

laanwj Apr 4, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

maflcko Apr 4, 2022

Choose a reason for hiding this comment

Uh oh!

laanwj commented Apr 4, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vincenzopalazzo left a comment

Choose a reason for hiding this comment

Uh oh!

sipa commented Apr 4, 2022

Uh oh!

maflcko commented Apr 4, 2022

Uh oh!

sipa commented Apr 4, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

maflcko commented Apr 4, 2022

Uh oh!

sipa commented Apr 4, 2022

Uh oh!

sipa commented Apr 4, 2022

Uh oh!

DrahtBot commented Apr 5, 2022

Conflicts

Uh oh!

maflcko commented Apr 5, 2022

Uh oh!

Uh oh!

laanwj Apr 4, 2022 •

edited

Loading

laanwj Apr 4, 2022 •

edited

Loading

laanwj commented Apr 4, 2022 •

edited

Loading

sipa commented Apr 4, 2022 •

edited

Loading