-
Notifications
You must be signed in to change notification settings - Fork 37.7k
optimization: Optimize IsSpace function for common non-whitespace characters #29602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. |
c15b5e2
to
210a280
Compare
Is this used in a hot path that is relevant, to warrant the changes? |
It's used in base 58 and 16 conversions while iterating over each character of the input. |
Yes, but base58 isn't used in a hot path, where optimizing it would result in a noticeable difference. (When sending an address over RPC, the remaining overhead is too large to notice a speedup of a few nanoseconds) |
It's a small improvement, indeed, but it eliminates useless work, done frequently. |
Is this really a bottleneck? A benchmark would be useful to see if this isn't something the compiler already does. ACK on the test. |
Thanks for the review @laanwj. The gain is small, but measurable:
which results in a speedup of ~2% for 130 non-whitespace hexadecimal characters:
|
210a280
to
3a47a61
Compare
3a47a61
to
e4e4c36
Compare
e0435d3
to
68829a4
Compare
…coding 07f6417 Reduce memory copying operations in bech32 encode (Lőrinc) d5ece3c Reserve hrp memory in Decode and LocateErrors (Lőrinc) Pull request description: Started optimizing the base conversions in [TryParseHex](#29458), [Base58](#29473) and [IsSpace](#29602) - this is the next step. Part of this change was already merged in #30047, which made decoding `~26%` faster. Here I've reduced the memory reallocations and copying operations in bech32 encode, making it `~15%` faster. > make && ./src/bench/bench_bitcoin --filter='Bech32Encode' --min-time=1000 Before: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 19.97 | 50,074,562.72 | 0.1% | 1.06 | `Bech32Encode` ``` After: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 17.33 | 57,687,668.20 | 0.1% | 1.10 | `Bech32Encode` ``` ACKs for top commit: josibake: ACK 07f6417 sipa: utACK 07f6417 achow101: ACK 07f6417 Tree-SHA512: 511885217d044ad7ef2bdf9203b8e0b94eec8b279bc193bb7e63e29ab868df6d21e9e4c7a24390358e1f9c131447ee42039df72edcf1e2b11e1856eb2b3e10dd
The IsSpace function has been optimized for the more common case where a character is not whitespace. Previously, the function checked each whitespace character individually, which was not efficient for non-whitespace inputs as it required multiple comparisons. The updated IsSpace function now first checks if the character is less than or equal to ' ' (the space character, which has the highest ASCII value among the whitespace characters). This single condition can quickly determine that most characters (those with ASCII values greater than ' ') are not whitespace, thus short-circuiting the evaluation for the most common case. If the character is less than or equal to ' ', the function performs additional checks to see if it is either a space character or within the range of horizontal tab to carriage return ('\t' to '\r'), which are the remaining whitespace characters in the ASCII table. This change assumes that most calls to IsSpace are for non-whitespace characters, as can be inferred from the usage patterns where IsSpace is often used in loops that parse strings until a non-whitespace character is encountered.
68829a4
to
f11c063
Compare
Github-Pull: bitcoin#29602 Rebased-From: 783e839
The IsSpace function has been optimized for the more common case where a character is not whitespace. Previously, the function checked each whitespace character individually, which was not efficient for non-whitespace inputs as it required multiple comparisons. The updated IsSpace function now first checks if the character is less than or equal to ' ' (the space character, which has the highest ASCII value among the whitespace characters). This single condition can quickly determine that most characters (those with ASCII values greater than ' ') are not whitespace, thus short-circuiting the evaluation for the most common case. If the character is less than or equal to ' ', the function performs additional checks to see if it is either a space character or within the range of horizontal tab to carriage return ('\t' to '\r'), which are the remaining whitespace characters in the ASCII table. This change assumes that most calls to IsSpace are for non-whitespace characters, as can be inferred from the usage patterns where IsSpace is often used in loops that parse strings until a non-whitespace character is encountered. Github-Pull: bitcoin#29602 Rebased-From: f11c063
The
IsSpace
function has been optimized for the more common case where a character is not whitespace. Previously, the function checked each whitespace character individually, which was not efficient for non-whitespace inputs as it required multiple comparisons.This method is often used for parsing various inputs more flexibly, see usages.
This results in an additional ~2% performance improvement for base conversions (see related hexadecimal and base58 optimizations).
The updated
IsSpace
function now first checks if the character is less than or equal to ' ' (the space character, which has the highest ASCII value among the whitespace characters). This single condition can quickly determine that most characters (i.e. letters, numbers) are not whitespace, thus short-circuiting the evaluation for the most common case.Otherwise the function performs additional checks to see if it is either a space character or within the range of horizontal tab to carriage return ('\t' to '\r'), which are the remaining whitespace characters in the ASCII table.
This change assumes that most calls to IsSpace are for non-whitespace characters, as can be inferred from the usage patterns where
IsSpace
is often used in loops that parse strings until a non-whitespace character is encountered.To make sure the changes keep the previous functionality, I've added an exhaustive unit test for it.