-
Notifications
You must be signed in to change notification settings - Fork 37.7k
refactor: Remove ParseHex(const char*) from public interface #24751
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The head ref may contain hidden characters: "2204-no-pointer-\u{1F4B4}"
Conversation
Also, fix style in the corresponding function. The style change can be reviewed with "--word-diff-regex=."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review ACK fad1b61
- This PR:
- Add test for ParseHex function. Specifically, it adds tests for an embedded null (
\0
). - Removes a redundant version of the ParseHex function cleaning the code.
- Add test for ParseHex function. Specifically, it adds tests for an embedded null (
These two commits are separated from #23595. While both the commits are refactoring changes, the changes done in the second commit are commonly shared between the PRs #23595 and #18061.
I agree with separating these two commits in a separate PR because:
- It removes the need of repeated efforts on a similar change in two different PRs.
- It separates refactoring changes from other subsequent changes done in the PRs.
- It makes the PRs more focused on a specific change, which aligns with the philosophy of atomic transitions.
The updated code is clean and easy to understand and reason with, and I think it can be merged.
@@ -54,7 +54,7 @@ enum class ByteUnit : uint64_t { | |||
* @return A new string without unsafe chars | |||
*/ | |||
std::string SanitizeString(const std::string& str, int rule = SAFE_CHARS_DEFAULT); | |||
std::vector<unsigned char> ParseHex(const char* psz); | |||
/** Parse the hex string into bytes. Ignores whitespace. */ | |||
std::vector<unsigned char> ParseHex(const std::string& str); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pessimization (causes a allocation and copy) if the passed-in parameter is actually a const char *
. Maybe switch to a span
or string_view
? Or is this never the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few places that use the pointer interface. However, they are not performance critical:
- Unit tests
- Calls, that are only done once (genesis block, other chain params, ...)
- In net processing to generate the alert message
I think for none of these performance matter, because they are either (1) small strings, in which case a copy doesn't matter, (2) used in tests or otherwise places where performance isn't needed.
The places that are affected (RPC mostly), already use std::string
to pass around hex stuff. For them this change has no effect, but likely ##18061 helps here. For example, when hex-decoding a 1-MB block or so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think switching to string_view
could still make sense, but this requires a re-write of the parse-algorithm, as string view lacks the trailing null-char, which the current algorithm relies on. (There is no std::string_view::c_str()
method)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I still think being able to pass in a memory area instead of a full, allocated string is a conceptually useful interface. const char*
is just the wrong way to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think switching to string_view could still make sense, but this requires a re-write of the parse-algorithm,
Fine, yes that may be too much for a refactor like this, let's not do it in this PR then (might as well solve that at the same time as the trailing null issue).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(unresolved conversation, to avoid hiding it)
Concept ACK on getting rid of Code review ACK fad1b61 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK fad1b61
Using |
@sipa This is ideal, but non-trivial IIUC. See the previous code discussion. |
diff --git a/src/util/strencodings.cpp b/src/util/strencodings.cpp
index 940fa90da2..691607caba 100644
--- a/src/util/strencodings.cpp
+++ b/src/util/strencodings.cpp
@@ -81,32 +81,24 @@ bool IsHexNumber(const std::string& str)
return (str.size() > starting_location);
}
-std::vector<unsigned char> ParseHex(const char* psz)
+std::vector<unsigned char> ParseHex(std::string_view str)
{
// convert hex dump to vector
std::vector<unsigned char> vch;
- while (true)
- {
- while (IsSpace(*psz))
- psz++;
- signed char c = HexDigit(*psz++);
- if (c == (signed char)-1)
- break;
- auto n{uint8_t(c << 4)};
- c = HexDigit(*psz++);
- if (c == (signed char)-1)
- break;
- n |= c;
- vch.push_back(n);
+ auto it = str.begin();
+ while (it != str.end() && it + 1 != str.end()) {
+ if (IsSpace(*it)) {
+ ++it;
+ continue;
+ }
+ auto c1 = HexDigit(*(it++));
+ auto c2 = HexDigit(*(it++));
+ if (c1 == (signed char)-1 || c2 == (signed char)-1) break;
+ vch.push_back(uint8_t(c1 << 4) | c2);
}
return vch;
}
-std::vector<unsigned char> ParseHex(const std::string& str)
-{
- return ParseHex(str.c_str());
-}
-
void SplitHostPort(std::string in, uint16_t& portOut, std::string& hostOut)
{
size_t colon = in.find_last_of(':');
diff --git a/src/util/strencodings.h b/src/util/strencodings.h
index 1f83fa3ffa..0e72466fc3 100644
--- a/src/util/strencodings.h
+++ b/src/util/strencodings.h
@@ -55,8 +55,7 @@ enum class ByteUnit : uint64_t {
* @return A new string without unsafe chars
*/
std::string SanitizeString(const std::string& str, int rule = SAFE_CHARS_DEFAULT);
-std::vector<unsigned char> ParseHex(const char* psz);
-std::vector<unsigned char> ParseHex(const std::string& str);
+std::vector<unsigned char> ParseHex(std::string_view str);
signed char HexDigit(char c);
/* Returns true if each character in str is a hex character, and has an even
* number of hex digits.*/ ? |
Happy to review, if you create a pull. With "trivial" I meant that this is not a one-line fix, but requires reworking the algorithm. |
On it. |
See #24764. |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Thanks, closing for now. |
This is a refactor (with added test), needed for: