-
Notifications
You must be signed in to change notification settings - Fork 37.7k
init: Clarify C and C++ locale assumptions. Add locale sanity checks to verify assumptions at run-time. #18124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…d C and C++ locales are set to the classic "C" locale (the minimal locale)
4fa3195
to
9af63aa
Compare
…tcoin-qt C++ locale is set to the classic "C" locale (the minimal locale)
9af63aa
to
e489fb9
Compare
I'm not sure whether it's better to crash than start with a "wrong" locale. We're already avoiding using locale-dependent functions in pretty much all important places. Edit: also, why are our assumptions different for bitcoin-qt and bitcoind? |
@laanwj I think you misunderstand the patch. Note that currently there is no way for a user to make This is just a sanity check that makes sure that the assumption we know to be true today remain true also in the future. Like |
This is just my personal opinion but I think this adds some quite ugly code into our already convoluted initialization process, using weird functions such as If we decide to do this, I'd prefer to add the common checks to the other sanity checks and not distinguish between |
@laanwj How would you suggest checking for the assumed locales in a less ugly way? How do you suggest reading the C locale without using |
But we're making different assumptions for
As the code is written we apparently want the C locale to be user-defined in |
Perhaps I should have included a few examples in the PR title. The gotcha list could be made very long, but this is a C locale gotcha example :)
Perhaps surprisingly |
And a C++ locale gotcha:
|
More C locale WTF:
Let me know when I've made my case sufficiently clear :) |
@practicalswift You're making the case that locales are annoying, but we already knew that. I think what @laanwj means is that he'd rather just avoid any locale-dependent functionality in the code (to make the locale settings irrelevant) as opposed to forcing the locale to a known value (correct me if I'm wrong of course). |
-0 I agree with @laanwj let's remove locale dependence rather than reduce the set of configurations we can run against. |
Note that this PR simply documents and asserts restrictions we already have in place. More specifically it does not disallow any configuration that is currently allowed. (Please correct me if you think I'm wrong.) I've worked a lot on removing the use of locale-dependent functions and I fail to see how that effort would be in conflict with documenting and asserting the currently made locale assumptions. It is my understanding the following are the only possible effective locale combinations before and after this PR:
Is that your understanding as well? Perhaps the suggestion is that we might want to change the current locale logic and allow for Again, please correct me if my reasoning is wrong. I could be missing something :) |
Recommended reading to better understand the motivation behind this PR: "Differences between the C Locale and the C++ Locales" |
No, what I'm trying to say is simply that if we don't have locale-dependent functions, this should be unnecessary complexity. |
It is my understanding the following are the only possible effective locale combinations before and after this PR:
Is that your understanding as well? |
I don't understand what you mean by "effective locale". The locale shouldn't matter, so for all we care they're both always effectively equal to "foobar". |
@sipa Sorry for my sloppy wording: with "effective locale" I simply meant the current locale in use (the C locale and the global C++ locale). More specifically the following:
I claim the following is guaranteed to hold true no matter user configuration:
To make it even more explicit - it is my understanding that the following assertions are guaranteed to hold for the entire lifetime of the processes regardless of user configuration and/or what the user has in his/hers For
For
Is that your understanding as well? :) |
Before this PR, those assertions obviously don't hold? I'm very confused. |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Then I'm confused by your confusion :) What assertion are you claiming not to hold before this PR? The Very interesting: it seems like one of us will walk away from here with a "TIL moment" :) Very exciting regardless of outcome! :) FWIW: $ git diff
diff --git a/src/logging.h b/src/logging.h
index b2fde1b9e..6b3bc5032 100644
--- a/src/logging.h
+++ b/src/logging.h
@@ -10,9 +10,12 @@
#include <tinyformat.h>
#include <atomic>
+#include <clocale>
#include <cstdint>
#include <list>
+#include <locale>
#include <mutex>
+#include <sstream>
#include <string>
#include <vector>
@@ -162,6 +165,15 @@ bool GetLogCategory(BCLog::LogFlags& flag, const std::string& str);
template <typename... Args>
static inline void LogPrintf(const char* fmt, const Args&... args)
{
+ std::ostringstream oss1;
+ oss1 << 1000000;
+ assert(oss1.str() == "1000000");
+ assert(std::to_string(1.23) == "1.230000");
+
+ assert(std::locale{}.name() == "C");
+ std::string current_locale = {setlocale(LC_ALL, nullptr)};
+ assert(current_locale == "C");
+
if (LogInstance().Enabled()) {
std::string log_msg;
try {
$ make
$ LC_ALL=de_DE src/bitcoind
… note the absence of assertion failures … |
|
As you probably know the equivalent of I think you're incorrectly assuming that we are doing Could that be the case? |
No, I'm not assuming any |
Sounds like you're incorrectly assuming that I claim that If you still don't think that is the case then I kindly ask you to provide a counter-example in the form of a configuration and/or diff. I would love to be proved wrong and walk away with a TIL experience :) FWIW:
|
Aha, I was indeed making the assumption that In that case, I have another concern: doesn't this break running |
It sounds like you're incorrectly assuming that any of the As you probably know the equivalent of Thus when the execution of Without any explicit call to Let's demonstrate this with a small test program: #include <clocale>
#include <cstdlib>
#include <iostream>
#include <locale>
#include <sstream>
#include <string>
int main(void)
{
std::cout << "C locale at beginning of main: " << std::string{setlocale(LC_ALL, nullptr)} << "\n";
std::cout << "C++ locale at beginning of main: " << std::locale{}.name() << "\n\n";
#if OPT_IN_TO_LOCALIZATION_USING_SET_LOCALE
setlocale(LC_ALL, "");
#endif
#if OPT_IN_TO_LOCALIZATION_USING_STD_LOCALE_GLOBAL
std::locale::global(std::locale(""));
#endif
std::ostringstream oss;
oss << 1000000;
std::cout << "std::ostringstream oss; oss << 1000000; oss.str() equals " << oss.str() << "\n";
std::cout << "std::to_string(1.23) equals " << std::to_string(1.23) << "\n\n";
std::cout << "C locale at end of main: " << std::string{setlocale(LC_ALL, nullptr)} << "\n";
std::cout << "C++ locale at end of main: " << std::locale{}.name() << "\n\n";
std::cout << "setlocale(LC_COLLATE, nullptr) (read-only operation) = " << std::string{setlocale(LC_COLLATE, nullptr)} << "\n";
std::cout << "setlocale(LC_CTYPE, nullptr) (read-only operation) = " << std::string{setlocale(LC_CTYPE, nullptr)} << "\n";
std::cout << "setlocale(LC_MESSAGES, nullptr) (read-only operation) = " << std::string{setlocale(LC_MESSAGES, nullptr)} << "\n";
std::cout << "setlocale(LC_MONETARY, nullptr) (read-only operation) = " << std::string{setlocale(LC_MONETARY, nullptr)} << "\n";
std::cout << "setlocale(LC_NUMERIC, nullptr) (read-only operation) = " << std::string{setlocale(LC_NUMERIC, nullptr)} << "\n";
std::cout << "setlocale(LC_TIME, nullptr) (read-only operation) = " << std::string{setlocale(LC_TIME, nullptr)} << "\n";
} Let's try with
Let's try with
Let's try
Let's try
QED? :) |
Oh, I'm indeed learning. I assumed that the LC_ environment variables automatically affect all programs. So what is this testing? It seems to verify things that are guaranteed by the language? |
@sipa The only thing guaranteed by the language is that a C++ program has the C and C++ locales set to the classic locale when entering There are quite a few gotchas that are worth guarding against in my opinion: A common gotcha is that Another common gotcha is libraries calling I think we have that problem today in the form of Qt which runs |
To make things even more unpredictable some versions of Qt messes with all locale categories whereas other versions only messes with a subset of locale categories. I believe a major locale handling change was made in Qt 4.4 for example. |
Some of our stuff, like |
We produce the following five binaries:
Which of them should be localized and which should not be localized? |
@luke-jr Could you clarify your comment by answering the last question? :) |
@practicalswift Purely user-facing graphical displays should be localized |
@MarcoFalke |
I think it's wrong to think about binaries being localized or not. The user-interfacing code in |
I think our statements can be combined to a meaningful policy:
Or alternatively:
Sounds good? :) |
…nce of the strencodings.h functions 259e290 tests: Add fuzzing harness for locale independence testing (practicalswift) Pull request description: Context: [C and C++ locale assumptions in bitcoind and bitcoin-qt](#18124) Add fuzzing harness for locale independence testing of functions in `strencodings.h` and `tinyformat.h`. Test this PR using: ``` $ make distclean $ ./autogen.sh $ CC=clang CXX=clang++ ./configure --enable-fuzz \ --with-sanitizers=address,fuzzer,undefined $ make $ src/test/fuzz/locale … ``` The tested functions (`ParseInt32(…)`, `ParseInt64(…)`, `atoi(const std::string&)`, `atoi64(const std::string& str)`, `i64tostr(const char*)`, `itostr(…)`, `strprintf(…)`) all call locale dependent functions (such as `strtol(…)`, `strtoll(…)`, `atoi(const char*)`, etc.) but are assumed to do so in a way that the tested functions return same results regardless of the chosen C locale (`setlocale`). This fuzzer aims to test that those assumptions hold up also in practice now and over time. Top commit has no ACKs. Tree-SHA512: d108d2f85aa6f482839dafbc7579465ffd4bacf7bc52835ad0fbaa1c71aed9b3870c83447b3d453a03b9ce307e76a3cfdd350a0c77024ab094c93c7d62c8a527
@practicalswift That sounds right; I don't think any of that is related to the changes in this PR. |
Closing this PR: consensus opinion seems to be that the locale assumptions we're are making implicitly are not worth asserting explicitly :) |
Clarify locale assumptions.
Add locale sanity check making sure the global
bitcoind
C and C++ locales are set to the classic"C"
locale (the minimal locale).Add locale sanity check making sure the global
bitcoin-qt
C++ locale is set to the classic"C"
locale (the minimal locale). (We do not assume any specific C locale inbitcoin-qt
.)To summarize - assumed locales:
setlocale
)std::locale
)bitcoind
C
(classic)C
(classic)bitcoin-qt
C
(classic)