-
Notifications
You must be signed in to change notification settings - Fork 37.7k
[bugfix] Fix encoding issue for Windows #13426
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Can we use |
We still need to use ansi string when we call bdb and leveldb api, so I think that adding a new function is needed. Also, it needs to add |
I'm not really comfortable patching boost to make this work. Firstly because it would mean that only depends-builds can build for Windows, but also because I'm just not familiar with it. Maybe start by upstreaming it and see where the discussion goes? |
Is there any way to build Bitcoin Core for Windows without depends-builds? |
Supposedly there is a way to build with MSVC and Visual Studio which doesn't use the depends system. |
I found a way not to patch boost library, it should work with msvc now. |
Note to reviewers: This pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
@@ -85,6 +85,8 @@ const int64_t nStartupTime = GetTime(); | |||
const char * const BITCOIN_CONF_FILENAME = "bitcoin.conf"; | |||
const char * const BITCOIN_PID_FILENAME = "bitcoind.pid"; | |||
|
|||
fs::detail::utf8_codecvt_facet g_utf8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
g_utf8
should be accessible globally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the only use is in util.h, if you implement those as methods in util.cpp, declared in util.h, you can limit variable access to here util.cpp. I may be wrong though! Reviewing on a cellphone. :P
extern fs::detail::utf8_codecvt_facet g_utf8; | ||
|
||
#define u8string() string(g_utf8) | ||
#define u8path(str) path(str, g_utf8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it better to do these as methods in the namespace? Preprocessor use should be minimized IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just thought that pass the second parameter at every .string()
and path()
is pretty annoying and easy to forget. If most developers don't want to use macro function, I could do a scripted-diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean define these macros as functions in the fs namespace that call through as defined here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could include u8path
into fs namespace, however for u8string
we couldn't since it's a member function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll defer to others as to whether this is worthwhile, but absent other options, I would make a function ala u8path
for string as well, and hide g_utf8
Rebased |
src/bitcoin-cli.cpp
Outdated
#ifdef WIN32 | ||
for (std::string& arg : args) | ||
arg = AnsiToUtf8(arg); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to solve this problem, there are three ways to do.
- The first one is above, which might get
?
if the user enter any character that can't present by user's code page. - The second one is using
GetCommandLineW
andCommandLineToArgvW
to get command line options in typewchar_t
, which can present more Unicode characters. - The third one is adding
wmain
to get command line in typewchar_t
directly by "(w)main" function.
Which one is prefered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I choose the second way
@theuni We probably want your thoughts here again. For reference nothing has happened at the upstream PR, however that is less relevant now that we aren't patching Boost. |
If you call |
It would break the |
I'll seperate these into many PRs. |
1661a47 add unicode compatible file_lock for Windows (Chun Kuan Lee) Pull request description: boost::interprocess::file_lock cannot open the files that contain characters which cannot be parsed by the user's code page on Windows. This PR is seperated from #13426 for easier review. Tree-SHA512: e240479cda65958bf6e1319840b83928b2b50da81d99f4f002fb3b62621370bcd4bcfacd2b8c0678c443a650d6ba53d9d12618b591e5bfd67ac14388a18fd822
Fix #13103
Fix #13754
From #13107 (comment), @laanwj suggested that we should keep our internal strings to be always utf-8 encoded.
This PR adds two C++17 compatible macros:
u8string
andu8path
. The reason that I use macros is that developers might not want to pass second parameterutf8
everytime when they callstring
or createpath
objects.I tried to find all
string
methods and convert them tou8string
except for external api calls. Also tried to convert explicit or implicitpath
tou8path
. Tell me if you find something that I missed.Required: bitcoin-core/leveldb-subtree#18
See #13787 for travis run result
Caution: The user must change their config file to be utf-8 encoded if the file contains any non-ascii characters in it. Before 0.18, they are read as ansi-encoded characters on Windows.