Skip to content

Conversation

ken2812221
Copy link
Contributor

@ken2812221 ken2812221 commented Jun 10, 2018

Fix #13103
Fix #13754

From #13107 (comment), @laanwj suggested that we should keep our internal strings to be always utf-8 encoded.

This PR adds two C++17 compatible macros: u8string and u8path. The reason that I use macros is that developers might not want to pass second parameter utf8 everytime when they call string or create path objects.

I tried to find all string methods and convert them to u8string except for external api calls. Also tried to convert explicit or implicit path to u8path. Tell me if you find something that I missed.

Required: bitcoin-core/leveldb-subtree#18

See #13787 for travis run result

Caution: The user must change their config file to be utf-8 encoded if the file contains any non-ascii characters in it. Before 0.18, they are read as ansi-encoded characters on Windows.

@Empact
Copy link
Contributor

Empact commented Jun 10, 2018

@ken2812221
Copy link
Contributor Author

Can we use path.imbued.locale instead for this?

We still need to use ansi string when we call bdb and leveldb api, so I think that adding a new function is needed. Also, it needs to add boost::locale dependency in order to use boost locale generator.

@laanwj laanwj added the Bug label Jun 11, 2018
@ken2812221 ken2812221 changed the title [WIP, bugfix] Add u8path and u8string to boost to fix #13103 [bugfix] Add u8path and u8string to boost to fix #13103 Jun 12, 2018
@fanquake fanquake requested a review from theuni June 12, 2018 13:29
@theuni
Copy link
Member

theuni commented Jun 12, 2018

I'm not really comfortable patching boost to make this work. Firstly because it would mean that only depends-builds can build for Windows, but also because I'm just not familiar with it.

Maybe start by upstreaming it and see where the discussion goes?

@ken2812221
Copy link
Contributor Author

ken2812221 commented Jun 12, 2018

Firstly because it would mean that only depends-builds can build for Windows

Is there any way to build Bitcoin Core for Windows without depends-builds?
Also, I created a PR boostorg/filesystem#75

@achow101
Copy link
Member

Is there any way to build Bitcoin Core for Windows without depends-builds?

Supposedly there is a way to build with MSVC and Visual Studio which doesn't use the depends system.

@ken2812221
Copy link
Contributor Author

I found a way not to patch boost library, it should work with msvc now.

@ken2812221 ken2812221 changed the title [bugfix] Add u8path and u8string to boost to fix #13103 [bugfix] Add u8path and u8string to fix #13103 Jun 13, 2018
@DrahtBot
Copy link
Contributor

DrahtBot commented Jun 14, 2018

Note to reviewers: This pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@@ -85,6 +85,8 @@ const int64_t nStartupTime = GetTime();
const char * const BITCOIN_CONF_FILENAME = "bitcoin.conf";
const char * const BITCOIN_PID_FILENAME = "bitcoind.pid";

fs::detail::utf8_codecvt_facet g_utf8;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

g_utf8 should be accessible globally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the only use is in util.h, if you implement those as methods in util.cpp, declared in util.h, you can limit variable access to here util.cpp. I may be wrong though! Reviewing on a cellphone. :P

extern fs::detail::utf8_codecvt_facet g_utf8;

#define u8string() string(g_utf8)
#define u8path(str) path(str, g_utf8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to do these as methods in the namespace? Preprocessor use should be minimized IMO.

Copy link
Contributor Author

@ken2812221 ken2812221 Jun 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just thought that pass the second parameter at every .string() and path() is pretty annoying and easy to forget. If most developers don't want to use macro function, I could do a scripted-diff.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean define these macros as functions in the fs namespace that call through as defined here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could include u8path into fs namespace, however for u8string we couldn't since it's a member function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll defer to others as to whether this is worthwhile, but absent other options, I would make a function ala u8path for string as well, and hide g_utf8

@ken2812221
Copy link
Contributor Author

Rebased

#ifdef WIN32
for (std::string& arg : args)
arg = AnsiToUtf8(arg);
#endif
Copy link
Contributor Author

@ken2812221 ken2812221 Jul 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to solve this problem, there are three ways to do.

  • The first one is above, which might get ? if the user enter any character that can't present by user's code page.
  • The second one is using GetCommandLineW and CommandLineToArgvW to get command line options in type wchar_t, which can present more Unicode characters.
  • The third one is adding wmain to get command line in type wchar_t directly by "(w)main" function.

Which one is prefered?

Copy link
Contributor Author

@ken2812221 ken2812221 Jul 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I choose the second way

@ken2812221 ken2812221 changed the title [bugfix] Add u8path and u8string to fix #13103 [bugfix] Add u8path and u8string to fix encoding issue for Windows Jul 17, 2018
@laanwj laanwj added this to the 0.17.0 milestone Jul 19, 2018
@fanquake
Copy link
Member

@theuni We probably want your thoughts here again.

For reference nothing has happened at the upstream PR, however that is less relevant now that we aren't patching Boost.

@Empact
Copy link
Contributor

Empact commented Aug 3, 2018

If you call CommandLineToArgvW within gArgs.ParseParameters instead, you can avoid touching src/bitcoind.cpp and src/bench/bench_bitcoin.cpp, at least. There may be other opportunities for minimization.

@ken2812221
Copy link
Contributor Author

ken2812221 commented Aug 3, 2018

If you call CommandLineToArgvW within gArgs.ParseParameters

It would break the ParseParameters in tests if we do that.

@ken2812221
Copy link
Contributor Author

I'll seperate these into many PRs.

@ken2812221 ken2812221 closed this Aug 5, 2018
@ken2812221 ken2812221 deleted the u8path_u8string branch August 6, 2018 02:49
laanwj added a commit that referenced this pull request Aug 29, 2018
1661a47 add unicode compatible file_lock for Windows (Chun Kuan Lee)

Pull request description:

  boost::interprocess::file_lock cannot open the files that contain characters which cannot be parsed by the user's code page on Windows.

  This PR is seperated from #13426 for easier review.

Tree-SHA512: e240479cda65958bf6e1319840b83928b2b50da81d99f4f002fb3b62621370bcd4bcfacd2b8c0678c443a650d6ba53d9d12618b591e5bfd67ac14388a18fd822
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Windows crashes for -wallet=你好 Invalid wallet path with Chinese characters in windows
10 participants