util: Pass size to ParseHex to assist preallocation #18061

elichai · 2020-02-03T19:02:48Z

Hi,
~~This is kinda chasing Concept ACK if it's worth the review effort.* (noticed it while reviewing a PR)~~

Right now both DecodeBase32 and DecodeBase64 are only ever being called with std::string but still process via char* and so they run strlen(O(n)) twice, once to validate it doesn't contain random zeros(ValidAsCString) another to get the length.

~~Instead we can iterate it as a std::string and then if we see 0 in the middle of the string we can fail the process.~~

* if it is, there's also DecodeBase58 .

And as a by-product saw 2 other places where we can pre-allocate memory.

EDIT: So the by-product turned out to be the only interesting part of this PR :) (see #18061 (comment))
According to the benchmarks this can cut the time it takes to parse a hex by half.

sanjaykdragon · 2020-02-03T19:13:02Z

ut: i think the compiler would optimize strlen being called twice into once

elichai · 2020-02-03T19:15:26Z

@sanjaykdragon good question. I'll check godbolt later. But even if you're right it's A. compiler dependent. B. still one time more than this.

EDIT: even with simplification, and force inlining everything it still calls strlen twice https://godbolt.org/z/ZBpdWa

promag

~~But what about the check done in ValidAsCString?~~

I think you need to show improvements otherwise it doesn't pay off. FWIW having ValidAsCString avoids .reserve().

promag · 2020-02-03T22:22:04Z

src/util/strencodings.cpp

@@ -104,7 +106,7 @@ std::vector<unsigned char> ParseHex(const char* psz)

 std::vector<unsigned char> ParseHex(const std::string& str)
 {
-    return ParseHex(str.c_str());
+    return ParseHex(str.c_str(), str.size());


3fad849

NACK this change. Instead you can drop ParseHex(const char*) and move the code here with the proper changes - I've done it and make check ran fine.

Well we have places where it's called with const char*. probably by dropping that it just calls the std::string constructor

Same suggestion: can't we remove the std::vector<unsigned char> ParseHex(const char* psz); variant completely?
Does this cause any actual issues if we did?
I'd like to move away from C-strings where possible and it seems more of a simplification.

The only "problem" is a degregation in speed for places where we call this with C strings which I think is mostly(if not only?) tests(their raw strings usage)
It's probably worth the simplicity though.

Is copying a string actually relevant performance wise here? Even if it was, parsing hex should only happen on user or rpc inputs, so I don't think we need to over-optimize this.

@MarcoFalke in my benchmarks if you have char* and you convert it into std::string it spends almost the same time as the code before this PR (X2 slower). I think it's not about copying, it's about heap allocation.
But if you prefer I'll be fine with removing the raw char* support because we probably only ever benefit from it in tests.

I agree that there's no need to over-optimize this but this is basically free optimization.

Echoing the request of simply dropping ParseHex(const char* psz).

Generally I think we should try move away from C-strings where possible.

@elichai Is there a single call site that needs the raw string interface? If not, and you have performance concerns, you may explicitly delete the interface, and thus force all callers to explicitly call ParseHex(std::string{psz}) and be aware of the performance impact.

Checked now, it seems like ParseHex(const char* psz) is only used in tests except for 2 places:
https://github.com/bitcoin/bitcoin/blob/master/src/chainparams.cpp#L55
https://github.com/bitcoin/bitcoin/blob/master/src/net_processing.cpp#L2101

Both should almost never really run, so I'll just drop it

DrahtBot · 2020-02-03T23:43:41Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

net, refactor: pass uint16 CService::port as uint16 #21328 (net, refactor: pass uint16 CService::port as uint16 by jonatack)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

elichai · 2020-02-04T15:37:12Z

So added some benchmarks, and it seems to be within the margin of noise except for ParseHex.(which is twice as fast for the 32 byte case) so I think I'll rebase to only do the ParseHex pre-allocation
Before:

$ ./src/bench/bench_bitcoin -filter="Base64Decode|Base64DecodeVec|HexParse"
# Benchmark,    evals,   iterations,  total,      min,         max,       median
Base64Decode,    5,       1000000,   1.77748,  3.49936e-07, 3.60382e-07, 3.55597e-07
Base64DecodeVec, 5,       1000000,   1.69569,  3.35597e-07, 3.41911e-07, 3.39301e-07
HexParse,        5,       1000000,   0.599282, 1.1813e-07,  1.2082e-07,  1.19942e-07

$ ./src/bench/bench_bitcoin -filter="Base64Decode|Base64DecodeVec|HexParse"
# Benchmark,    evals,   iterations,  total,      min,         max,       median
Base64Decode,    5,       1000000,   1.84349,  3.6175e-07,  3.79702e-07, 3.65207e-07
Base64DecodeVec, 5,       1000000,   1.69212,  3.35303e-07, 3.43094e-07, 3.36332e-07
HexParse,        5,       1000000,   0.592378, 1.17877e-07, 1.19108e-07, 1.18541e-07

$ ./src/bench/bench_bitcoin -filter="Base64Decode|Base64DecodeVec|HexParse"
# Benchmark,    evals,   iterations,  total,      min,         max,       median
Base64Decode,    5,       1000000,   1.84158,  3.59822e-07, 3.83746e-07, 3.67486e-07
Base64DecodeVec, 5,       1000000,   1.7276,   3.37973e-07, 3.51833e-07, 3.46255e-07
HexParse,        5,       1000000,   0.614073, 1.18379e-07, 1.26827e-07, 1.22456e-07

After:

# Benchmark,    evals,   iterations,  total,      min,         max,       median
Base64Decode,    5,       1000000,   1.81012,  3.55877e-07, 3.68592e-07, 3.63113e-07
Base64DecodeVec, 5,       1000000,   1.84868,  3.66254e-07, 3.72895e-07, 3.69377e-07
HexParse,        5,       1000000,   0.281846, 5.55183e-08, 5.71883e-08, 5.63974e-08

$ ./src/bench/bench_bitcoin -filter="Base64Decode|Base64DecodeVec|HexParse"
# Benchmark,    evals,   iterations,  total,      min,         max,       median
Base64Decode,    5,       1000000,   1.82559, 3.58871e-07, 3.68576e-07, 3.65803e-07
Base64DecodeVec, 5,       1000000,   1.84521, 3.6735e-07,  3.71235e-07, 3.68715e-07
HexParse,        5,       1000000,   0.29672, 5.62958e-08, 6.46866e-08, 5.86538e-08

$ ./src/bench/bench_bitcoin -filter="Base64Decode|Base64DecodeVec|HexParse"
# Benchmark,    evals,   iterations,  total,      min,         max,       median
Base64Decode,    5,       1000000,   1.89398,  3.7271e-07,  3.83227e-07, 3.78387e-07
Base64DecodeVec, 5,       1000000,   1.90284,  3.73173e-07, 3.89818e-07, 3.81565e-07
HexParse,        5,       1000000,   0.288767, 5.70536e-08, 5.92714e-08, 5.73934e-08

sanjaykdragon · 2020-02-04T17:05:10Z

So added some benchmarks, and it seems to be within the margin of noise except for ParseHex.(which is twice as fast for the 32 byte case) so I think I'll rebase to only do the ParseHex pre-allocation
Before:

$ ./src/bench/bench_bitcoin -filter="Base64Decode|Base64DecodeVec|HexParse"
# Benchmark,    evals,   iterations,  total,      min,         max,       median
Base64Decode,    5,       1000000,   1.77748,  3.49936e-07, 3.60382e-07, 3.55597e-07
Base64DecodeVec, 5,       1000000,   1.69569,  3.35597e-07, 3.41911e-07, 3.39301e-07
HexParse,        5,       1000000,   0.599282, 1.1813e-07,  1.2082e-07,  1.19942e-07

$ ./src/bench/bench_bitcoin -filter="Base64Decode|Base64DecodeVec|HexParse"
# Benchmark,    evals,   iterations,  total,      min,         max,       median
Base64Decode,    5,       1000000,   1.84349,  3.6175e-07,  3.79702e-07, 3.65207e-07
Base64DecodeVec, 5,       1000000,   1.69212,  3.35303e-07, 3.43094e-07, 3.36332e-07
HexParse,        5,       1000000,   0.592378, 1.17877e-07, 1.19108e-07, 1.18541e-07

$ ./src/bench/bench_bitcoin -filter="Base64Decode|Base64DecodeVec|HexParse"
# Benchmark,    evals,   iterations,  total,      min,         max,       median
Base64Decode,    5,       1000000,   1.84158,  3.59822e-07, 3.83746e-07, 3.67486e-07
Base64DecodeVec, 5,       1000000,   1.7276,   3.37973e-07, 3.51833e-07, 3.46255e-07
HexParse,        5,       1000000,   0.614073, 1.18379e-07, 1.26827e-07, 1.22456e-07

After:

# Benchmark,    evals,   iterations,  total,      min,         max,       median
Base64Decode,    5,       1000000,   1.81012,  3.55877e-07, 3.68592e-07, 3.63113e-07
Base64DecodeVec, 5,       1000000,   1.84868,  3.66254e-07, 3.72895e-07, 3.69377e-07
HexParse,        5,       1000000,   0.281846, 5.55183e-08, 5.71883e-08, 5.63974e-08

$ ./src/bench/bench_bitcoin -filter="Base64Decode|Base64DecodeVec|HexParse"
# Benchmark,    evals,   iterations,  total,      min,         max,       median
Base64Decode,    5,       1000000,   1.82559, 3.58871e-07, 3.68576e-07, 3.65803e-07
Base64DecodeVec, 5,       1000000,   1.84521, 3.6735e-07,  3.71235e-07, 3.68715e-07
HexParse,        5,       1000000,   0.29672, 5.62958e-08, 6.46866e-08, 5.86538e-08

$ ./src/bench/bench_bitcoin -filter="Base64Decode|Base64DecodeVec|HexParse"
# Benchmark,    evals,   iterations,  total,      min,         max,       median
Base64Decode,    5,       1000000,   1.89398,  3.7271e-07,  3.83227e-07, 3.78387e-07
Base64DecodeVec, 5,       1000000,   1.90284,  3.73173e-07, 3.89818e-07, 3.81565e-07
HexParse,        5,       1000000,   0.288767, 5.70536e-08, 5.92714e-08, 5.73934e-08

utACK: if these numbers are real and true, this is a nice improvement

DrahtBot · 2021-03-19T21:15:48Z

🐙 This pull request conflicts with the target branch and needs rebase.

_{Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft".}

DrahtBot · 2021-12-15T11:22:11Z

There hasn't been much activity lately and the patch still needs rebase. What is the status here?

Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
Is it no longer relevant? ➡️ Please close.
Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

DrahtBot · 2022-03-21T13:07:36Z

There hasn't been much activity lately and the patch still needs rebase. What is the status here?

Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
Is it no longer relevant? ➡️ Please close.
Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

maflcko · 2022-03-21T13:55:32Z

Added "up for grabs", see also https://github.com/bitcoin/bitcoin/pull/23595/commits

maflcko · 2022-04-04T06:41:27Z

I've split up the common changes (base refactoring work) into #24751

fanquake · 2022-04-26T12:21:19Z

Closing this in favour of #24764 for now.

maflcko · 2022-04-29T09:05:28Z

I don't think this was fixed by #24764

fanquake · 2022-04-29T10:23:31Z

I don't think this was fixed by #24764

I closed it because the PR has been inactive for a long time, the up for grabs label had already been applied, and it was just generating noise by conflicting with on-going changes. If someone wants to continue these changes they could rebase, and open a new PR, with the changes that are still relevant.

elichai changed the title ~~util: Reimplement DecodeBase32/64 with std::string to remove strlen~~ util: Change DecodeBase32/64 to use std::string to not call strlen twice Feb 3, 2020

DrahtBot added GUI P2P RPC/REST/ZMQ Utils/log/libs labels Feb 3, 2020

promag reviewed Feb 3, 2020

View reviewed changes

maflcko added Refactoring and removed GUI P2P RPC/REST/ZMQ labels Feb 4, 2020

elichai force-pushed the 2020-02-remove-strlen branch from 3fad849 to 349816a Compare February 4, 2020 15:13

Add benchmarks for ParseHex and Base64

a7d2c63

elichai force-pushed the 2020-02-remove-strlen branch from 349816a to 42af035 Compare February 4, 2020 15:56

elichai changed the title ~~util: Change DecodeBase32/64 to use std::string to not call strlen twice~~ util: Pass size to ParseHex to assist preallocation Feb 4, 2020

Add memory reserving in ParseHex and SanitizeString

40aa036

elichai force-pushed the 2020-02-remove-strlen branch from 42af035 to 40aa036 Compare February 12, 2020 16:01

DrahtBot mentioned this pull request Apr 29, 2020

bench: Add logging benchmark #18815

Merged

DrahtBot mentioned this pull request Mar 2, 2021

net, refactor: pass uint16 CService::port as uint16 #21328

Merged

DrahtBot added the Needs rebase label Mar 19, 2021

maflcko added the Up for grabs label Mar 21, 2022

maflcko mentioned this pull request Apr 4, 2022

refactor: Remove ParseHex(const char*) from public interface #24751

Closed

fanquake closed this Apr 26, 2022

fanquake removed the Up for grabs label Apr 26, 2022

maflcko reopened this Apr 29, 2022

fanquake closed this Apr 29, 2022

bitcoin locked and limited conversation to collaborators Apr 29, 2023

util: Pass size to ParseHex to assist preallocation #18061

util: Pass size to ParseHex to assist preallocation #18061

Uh oh!

Conversation

elichai commented Feb 3, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sanjaykdragon commented Feb 3, 2020

Uh oh!

elichai commented Feb 3, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

promag left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

DrahtBot commented Feb 3, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Conflicts

Uh oh!

elichai commented Feb 4, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sanjaykdragon commented Feb 4, 2020

Uh oh!

DrahtBot commented Mar 19, 2021

Uh oh!

DrahtBot commented Dec 15, 2021

Uh oh!

DrahtBot commented Mar 21, 2022

Uh oh!

maflcko commented Mar 21, 2022

Uh oh!

maflcko commented Apr 4, 2022

Uh oh!

fanquake commented Apr 26, 2022

Uh oh!

maflcko commented Apr 29, 2022

Uh oh!

fanquake commented Apr 29, 2022

Uh oh!

Uh oh!

elichai commented Feb 3, 2020 •

edited

Loading

elichai commented Feb 3, 2020 •

edited

Loading

promag left a comment •

edited

Loading

DrahtBot commented Feb 3, 2020 •

edited

Loading

elichai commented Feb 4, 2020 •

edited

Loading