-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Remove the boost/algorithm/string/case_conv.hpp dependency #13671
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove the boost/algorithm/string/case_conv.hpp dependency #13671
Conversation
c125aee
to
e5085d3
Compare
Concept ACK modulo …
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to include <algorithm>
/<ctype.h>
where appropriate.
src/rpc/server.cpp
Outdated
std::string firstLetter = category.substr(0,1); | ||
boost::to_upper(firstLetter); | ||
strRet += "== " + firstLetter + category.substr(1) + " ==\n"; | ||
unsigned char firstLetter = std::toupper(static_cast<unsigned char>(category[0])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::toupper(category.front())
should work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Empact, I have addressed your feedback. The code to capitalize the first letter of a string has been isolated in void Capitalize(std::string& str)
in utilstrencodings.cpp
.
src/netbase.cpp
Outdated
@@ -38,7 +37,7 @@ static const int SOCKS5_RECV_TIMEOUT = 20 * 1000; | |||
static std::atomic<bool> interruptSocks5Recv(false); | |||
|
|||
enum Network ParseNetwork(std::string net) { | |||
boost::to_lower(net); | |||
std::transform(net.begin(), net.end(), net.begin(), [](unsigned char c) -> unsigned char {return std::tolower(c);}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can pass tolower
directly like so: std::transform(net.begin(), net.end(), net.begin(), tolower);
The same doesn't work with std::tolower
fyi.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could cast like so, I believe that has the same effect: std::transform(net.begin(), net.end(), net.begin(), (int (*)(int)) std::tolower);
Note tolower accepts an int
but handles it as an unsigned char
: https://en.cppreference.com/w/c/string/byte/tolower
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Empact!
I opted for the lambda expression, because it allowed me to pass unsigned char
characters to the tolower
function.
I understood from the referenced documentation that it doesn't handle the int
as an unsigned char
(character to be converted. If the value of ch is not representable as unsigned char and does not equal EOF, the behavior is undefined.).
I hope you like the new implementation better.
Concept ACK |
|
Note to reviewers: This pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Many thanks for the review comments!
As used in this project at the moment both
bitcoin/src/rpc/blockchain.cpp Lines 1923 to 1929 in 171b03d
In this particular case I have also raised the suggestion to drop if (net == "ipv4" || net == "IPv4") return NET_IPV4;
if (net == "ipv6" || net == "IPv6") return NET_IPV6; I would even prefer this alternative if the modified external behaviour is acceptable.
I had definitely considered this but felt that it would be a potential rabbit hole without guidance. Implementing utility functions Is this good enough for our purposes? If the objective is to write custom functions that accept Either way, I am happy to give this another shot by addressing @Empact 's comments; or pursuing alternatives proposed by @practicalswift and @laanwj if I can get further guidance in which direction to take this. |
@251labs For say the custom locale independent
More generally: as long as your custom functions return the same value as the standard library equivalents would have done under the default "C" locale everything is fine. |
afd6b5e
to
4388ad5
Compare
Thanks again for the feedback! Per the review comments, I have implemented custom I have tried to make the |
src/test/util_tests.cpp
Outdated
@@ -1202,4 +1202,63 @@ BOOST_AUTO_TEST_CASE(test_DirIsWritable) | |||
fs::remove(tmpdirname); | |||
} | |||
|
|||
BOOST_AUTO_TEST_CASE(test_ToLower) | |||
{ | |||
BOOST_CHECK(ToLower('@') == '@'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use BOOST_CHECK_EQUAL(ToLower('@'), '@')
(it gives a clearer diagnostic message in case of failure).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, addressed in 0234af5
src/utilstrencodings.h
Outdated
*/ | ||
inline unsigned char ToLower(unsigned char c) | ||
{ | ||
static const unsigned char mappings[256] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a lot of code!
What about return (c >= 'A' && c <= 'Z' ? (c - 'A') + 'a' : c);
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sipa,
I asked myself the same question and I am happy to change it!
Just to share why I decided to use a mappings array:
Support for extended ASCII
Should there be a time where it becomes important to have support for characters in the extended ASCII range then it will be trivial to implement it. It will then just be a matter of adjusting entries in mappings array vs extending the implementation with control statements like if (c == 'Ü') return 'ü';
or alternative logic.
Performance
I wanted our locale independent functions to be a good citizen in terms of performance for code that requires it.
I did some naive tests to get an impression what the code would compile to and it seemed that the compiler was able to generate more efficient code in the case return mappings[c];
At all optimization levels I would get a pointer to the mappings array where mappings are performed on:
lea rax, [rip + __ZZ7ToLowerhE15tolowermappings]
movzx eax, byte ptr [rdi + rax]
Alternatively in the case of (c >= 'A' && c <= 'Z' ? (c - 'A') + 'a' : c)
the compiler would generate something like this at O3
optimization:
mov eax, edi
add al, -65
cmp al, 26
jae LBB3_2
add dil, 32
LBB3_2:
movzx eax, dil
Particularly in the case of iterations I felt the compiler was able to do better optimizations, for example minimizing iterations by operating in sequences. In the case of the former it could do 4 mapping operations per sequence; while in the latter of at most two mappings per iteration.
On this basis of this I assumed the code would also perform better, but I realize I can be very wrong because I am no expert at this.
I hope this feedback is useful. It it isn't, I will gladly change the code!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly prefer @sipa:s more readable shorter version. I'd bet any potential performance difference is insignificant in the grand scheme of things :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @practicalswift, will update the code per @sipa's feedback!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @sipa. If we need to add extended ascii support, we can address that then, and this is not a performance-critical operation that calls for extra code to optimize it. Simple, repetitive code adds visual noise and can hide bugs as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Empact, I will update the pull request!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for the feedback. I have updated the pull request and addressed your comments in 53b3bfa.
src/utilstrencodings.h
Outdated
* characters in the standard 7-bit ASCII range. | ||
* @param[in,out] str the string to convert to uppercase. | ||
*/ | ||
void ToUpper(std::string& str); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel it's slightly confusing to have a version of ToUpper
that operates on chars (and does not modify the argument) while the version operating on vectors does modify. Perhaps call this ConvertToUpper
or so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, addressed in 0234af5
4388ad5
to
f296a93
Compare
src/rpc/server.cpp
Outdated
strRet += "== " + firstLetter + category.substr(1) + " ==\n"; | ||
std::string heading = category; | ||
Capitalize(heading); | ||
strRet += "== " + heading + " ==\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would return the string from Capitalize
and inline that into the string construction.
src/utilstrencodings.cpp
Outdated
std::transform(str.begin(), str.end(), str.begin(), [](unsigned char c){return ToLower(c);}); | ||
} | ||
|
||
void ConvertToUpper(std::string& str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mu-nit: Ruby uses the terms Downcase
and Upcase
for these operations, which I like. Just a suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should define ConvertToUpper
if we're not using it. The construction can be trivially re-created from ConvertToLower
if we need it, and in the meantime it's just dead code.
src/utilstrencodings.h
Outdated
* @return the uppercase equivalent of c; or the argument | ||
* if no conversion is possible. | ||
*/ | ||
inline unsigned char ToUpper(unsigned char c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: constexpr
src/utilstrencodings.h
Outdated
* @return the lowercase equivalent of c; or the argument | ||
* if no conversion is possible. | ||
*/ | ||
inline unsigned char ToLower(unsigned char c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: constexpr
f296a93
to
400dadf
Compare
BOOST_CHECK_EQUAL(ToLower('Z'), 'z'); | ||
BOOST_CHECK_EQUAL(ToLower('['), '['); | ||
BOOST_CHECK_EQUAL(ToLower(0), 0); | ||
BOOST_CHECK_EQUAL(ToLower(255), 255); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some ToUpper(...)
tests as well :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @practicalswift! I was a bit too eager when I removed ToUpper(std::string&)
and its tests (facepalm). Fixed in dff017f.
400dadf
to
76d70e5
Compare
std::string firstLetter = category.substr(0,1); | ||
boost::to_upper(firstLetter); | ||
strRet += "== " + firstLetter + category.substr(1) + " ==\n"; | ||
strRet += "== " + Capitalize(category) + " ==\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this modifies the category
in-place, I think this breaks the comparison logic, e.g. on line 189.
Sorry, I think my original nit was misguided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capitalize(…)
(std::string Capitalize(std::string str)
) does not modify category
in-place :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Empact, we should be good. When I addressed your feedback, I did change the method signature from void Capitalize(std::string&)
to std::string Capitalize(std::string)
, which passes the argument by value (category
is copied into str
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I misread - agree the argument is passed by value, so it's fine. 👍
utACK 76d70e5fb1923cdbc4771f438f10ae0039beccaa |
76d70e5
to
aa32fe3
Compare
This commit implements a unit test that validates the `ParseNetwork(std::string)` implementation in `netbase.cpp`.
This commit implements custom equivalents for the C and C++ `tolower` and `toupper` Standard Library functions. In addition it implements a utility function to capitalize the first letter of a string.
This commit removes the `boost/algorithm/string/case_conv.hpp` dependency from the project. It replaces the `boost::to_lower` and `boost::to_upper` functions with custom functions that are locale independent and ASCII deterministic.
aa32fe3
to
b193d5a
Compare
BOOST_CHECK_EQUAL(ParseNetwork("TOR"), NET_ONION); | ||
|
||
BOOST_CHECK_EQUAL(ParseNetwork(":)"), NET_UNROUTABLE); | ||
BOOST_CHECK_EQUAL(ParseNetwork("tÖr"), NET_UNROUTABLE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol, tÖr, that's like the Swedish version of Tor right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lol @laanwj, best review comment ever! I do hidden messages or easter eggs from time to time, but this isn't one of them 😂😂
utACK b193d5a |
…dependency b193d5a Removes the Boost case_conv.hpp dependency. (251) 7a208d9 Implements custom tolower and toupper functions. (251) e2ba043 Implements ParseNetwork unit test. (251) Pull request description: This pull request removes the `boost/algorithm/string/case_conv.hpp` dependency from the project. `boost/algorithm/string/case_conv.hpp` is included for the `boost::to_lower` and `boost::to_upper` template functions. We can replace the calls to these functions with straightforward alternative implementations that use the C++ Standard Library, because the functions are called with `std::string` objects that use standard 7-bit ASCII characters as argument. The refactored implementation should work without the explicit `static_cast<unsigned char>` cast and `unsigned char` lambda return type. Both have been added defensively and to be explicit. Especially in case of the former, behaviour is undefined (potentially result in a crash) if the `std::toupper` argument is not an `unsigned char`. A potential alternative, maybe even preferred, implementation to address the `boost::to_lower` function call in `ParseNetwork(std::string)` could have been: ```c++ if (net == "ipv4" || net == "IPv4") return NET_IPV4; if (net == "ipv6" || net == "IPv6") return NET_IPV6; ``` This alternative implementation would however change the external behaviour of `ParseNetwork(std::string)`. This pull requests includes a unit test to validate the implementation of `ParseNetwork(std::string)` prior and after the removal of the `case_conv.hpp` dependency. `boost/algorithm/string/case_conv.hpp` has been removed from the `EXPECTED_BOOST_INCLUDES` in `test/lint/lint-includes.sh` because it is no longer required. Tree-SHA512: d803ae709f2368a3efb223097384a722436955bce0c44a1a5cffd0abb3164be0cce85ba0e9ebd9408166df3f1a95ea0c0d29e3a2534af2fae206c0419d67fde9 # Conflicts: # src/netbase.cpp # src/test/util_tests.cpp # src/util/strencodings.h # test/lint/lint-locale-dependence.sh
merge bitcoin#13671, bitcoin#17405, bitcoin#18786, bitcoin#18792, bitcoin#20067: Deboostification
This pull request removes the
boost/algorithm/string/case_conv.hpp
dependency from the project.boost/algorithm/string/case_conv.hpp
is included for theboost::to_lower
andboost::to_upper
template functions.We can replace the calls to these functions with straightforward alternative implementations that use the C++ Standard Library, because the functions are called with
std::string
objects that use standard 7-bit ASCII characters as argument.The refactored implementation should work without the explicit
static_cast<unsigned char>
cast andunsigned char
lambda return type. Both have been added defensively and to be explicit. Especially in case of the former, behaviour is undefined (potentially result in a crash) if thestd::toupper
argument is not anunsigned char
.A potential alternative, maybe even preferred, implementation to address the
boost::to_lower
function call inParseNetwork(std::string)
could have been:This alternative implementation would however change the external behaviour of
ParseNetwork(std::string)
.This pull requests includes a unit test to validate the implementation of
ParseNetwork(std::string)
prior and after the removal of thecase_conv.hpp
dependency.boost/algorithm/string/case_conv.hpp
has been removed from theEXPECTED_BOOST_INCLUDES
intest/lint/lint-includes.sh
because it is no longer required.