-
Notifications
You must be signed in to change notification settings - Fork 37.7k
util: Pass size to ParseHex to assist preallocation #18061
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
strlen
strlen
twice
ut: i think the compiler would optimize strlen being called twice into once |
@sanjaykdragon good question. I'll check godbolt later. But even if you're right it's A. compiler dependent. B. still one time more than this. EDIT: even with simplification, and force inlining everything it still calls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But what about the check done in ValidAsCString
?
I think you need to show improvements otherwise it doesn't pay off. FWIW having ValidAsCString
avoids .reserve()
.
src/util/strencodings.cpp
Outdated
@@ -104,7 +106,7 @@ std::vector<unsigned char> ParseHex(const char* psz) | |||
|
|||
std::vector<unsigned char> ParseHex(const std::string& str) | |||
{ | |||
return ParseHex(str.c_str()); | |||
return ParseHex(str.c_str(), str.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NACK this change. Instead you can drop ParseHex(const char*)
and move the code here with the proper changes - I've done it and make check
ran fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well we have places where it's called with const char*
. probably by dropping that it just calls the std::string constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same suggestion: can't we remove the std::vector<unsigned char> ParseHex(const char* psz);
variant completely?
Does this cause any actual issues if we did?
I'd like to move away from C-strings where possible and it seems more of a simplification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only "problem" is a degregation in speed for places where we call this with C strings which I think is mostly(if not only?) tests(their raw strings usage)
It's probably worth the simplicity though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is copying a string actually relevant performance wise here? Even if it was, parsing hex should only happen on user or rpc inputs, so I don't think we need to over-optimize this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarcoFalke in my benchmarks if you have char*
and you convert it into std::string
it spends almost the same time as the code before this PR (X2 slower). I think it's not about copying, it's about heap allocation.
But if you prefer I'll be fine with removing the raw char*
support because we probably only ever benefit from it in tests.
I agree that there's no need to over-optimize this but this is basically free optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Echoing the request of simply dropping ParseHex(const char* psz)
.
Generally I think we should try move away from C-strings where possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elichai Is there a single call site that needs the raw string interface? If not, and you have performance concerns, you may explicitly delete the interface, and thus force all callers to explicitly call ParseHex(std::string{psz})
and be aware of the performance impact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked now, it seems like ParseHex(const char* psz)
is only used in tests except for 2 places:
https://github.com/bitcoin/bitcoin/blob/master/src/chainparams.cpp#L55
https://github.com/bitcoin/bitcoin/blob/master/src/net_processing.cpp#L2101
Both should almost never really run, so I'll just drop it
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
3fad849
to
349816a
Compare
So added some benchmarks, and it seems to be within the margin of noise except for
After:
|
349816a
to
42af035
Compare
strlen
twice
utACK: if these numbers are real and true, this is a nice improvement |
42af035
to
40aa036
Compare
🐙 This pull request conflicts with the target branch and needs rebase. Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft". |
There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
1 similar comment
There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
Added "up for grabs", see also https://github.com/bitcoin/bitcoin/pull/23595/commits |
I've split up the common changes (base refactoring work) into #24751 |
Closing this in favour of #24764 for now. |
I don't think this was fixed by #24764 |
I closed it because the PR has been inactive for a long time, the up for grabs label had already been applied, and it was just generating noise by conflicting with on-going changes. If someone wants to continue these changes they could rebase, and open a new PR, with the changes that are still relevant. |
Hi,
This is kinda chasing Concept ACK if it's worth the review effort.* (noticed it while reviewing a PR)Right now both DecodeBase32 and DecodeBase64 are only ever being called withstd::string
but still process viachar*
and so they runstrlen
(O(n)) twice, once to validate it doesn't contain random zeros(ValidAsCString
) another to get the length.Instead we can iterate it as a std::string and then if we see0
in the middle of the string we can fail the process.* if it is, there's also DecodeBase58 .And as a by-product saw 2 other places where we can pre-allocate memory.
EDIT: So the by-product turned out to be the only interesting part of this PR :) (see #18061 (comment))
According to the benchmarks this can cut the time it takes to parse a hex by half.