-
Notifications
You must be signed in to change notification settings - Fork 37.7k
utils: Add fstream wrapper to allow to pass unicode filename on Windows #13878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
but there's override: #include <fstream>
int main(int argc, char* argv[])
{
std::fstream tr(L"hk");
return 0;
} compiles fine.
|
This is for mingw |
|
@alexeyneu To be clear, this is for the mingw on Ubuntu 18.04 which is what we use for gitian to build release binaries. If you can test this on Ubuntu, that would be great. |
msys2 |
Which is to say, i dont see much sense in testing it coz it will work . |
I don't think I understand how this PR works (does it depend on #13866?), or why reimplementing fstream classes is a good solution compared to alternatives. If this is a bug in boost, wouldn't it be better to fix the bug upstream and maybe patch it locally? Or could we drop the use of boost here and instead use It would be helpful to have a clear description of the bug and possible workarounds. |
src/fs.h
Outdated
@@ -38,6 +42,41 @@ namespace fsbridge { | |||
void* hFile = (void*)-1; // INVALID_HANDLE_VALUE | |||
#endif | |||
}; | |||
|
|||
|
|||
#if defined WIN32 && defined __GNUC__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be checking for __GLIBCXX__
instead of __GNUC__
? It looks like __gnu_cxx::stdio_filebuf
might be present in libstdc++ but not libc++ (https://stackoverflow.com/questions/22624503/how-to-get-a-file-descriptor-from-a-stdbasic-ios-for-clang-on-os-x).
Will different workarounds be needed for MSVC and clang?
If building with libstdc++, we can use its extension stdio_filebuf by passing
I don't know about clang for Windows. But for MSVC, we can use
Without #13866, it would work. But to solve the encoding issue, it requires #13866. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fclose(file); | ||
} | ||
file = nullptr; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to add a static assert here to detect if this problem happens with other libraries or versions of libraries:
#if defined WIN32 && defined __GLIBCXX__
...
#elif WIN32
static_assert(sizeof(*fs::path().BOOST_FILESYSTEM_C_STR) == sizeof(wchar_t),
"Warning: This build is using boost::filesystem ofstream and ifstream "
"implementations which will fail to open paths containing multibyte "
"characters. You should delete this static_assert to ignore this warning, "
"or switch to a different C++ standard library like the Microsoft C++ "
"Standard Library (where boost uses non-standard extensions to construct "
"stream objects with wide filenames), or the GNU libstdc++ library (where "
"a more complicated workaround has been implemented above).");
#endif
src/fs.cpp
Outdated
/** | ||
* reference: https://github.com/gcc-mirror/gcc/blob/571ee70a6d4226145f77b885937a8eda678e0fb2/libstdc%2B%2B-v3/include/std/fstream#L283 | ||
* | ||
* +---------------------------------------------------------+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I would drop this table but keep the URL link, since the table just duplicates information provided by the code below, and seems actually less readable than the code.
For reference, https://en.cppreference.com/w/cpp/io/basic_filebuf/open also has a similar table.
src/fs.cpp
Outdated
void ifstream::open(const fs::path& p, std::ios_base::openmode mode) | ||
{ | ||
close(); | ||
file = fopen(p, openmodeToStr(mode).c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's obvious that this is intended to call the fsbridge::fopen
function above rather than ::fopen
. Would suggesting adding a comment here or writing the namespace.
Could also do the same in ofstream::open
below.
src/fs.cpp
Outdated
if (file == nullptr) { | ||
return; | ||
} | ||
filebuf = std::move(__gnu_cxx::stdio_filebuf<char>(file, mode)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::move
can be dropped here. Argument is already an rvalue so it has no effect.
Same applies in ofstream::open
below.
src/fs.h
Outdated
void close(); | ||
|
||
private: | ||
__gnu_cxx::stdio_filebuf<char> filebuf; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you prefix these members with m_
to match recommended style?
@ryanofsky requested change included and unit tests added. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 6b6c598ab55c60ed5050c0ba0a817dba899384c0. Changes since previous review were various suggestions like adding static_assert
, m_
prefixes, comments, and new tests. Thanks especially for the new tests.
src/fs.h
Outdated
// ifstream/ofstream `wchar_t` constructors, and the GNU library doesn't | ||
// provide them (in contrast to the Microsoft C++ library, see | ||
// https://stackoverflow.com/questions/821873/how-to-open-an-stdfstream-ofstream-or-ifstream-with-a-unicode-filename/822032#822032), | ||
// Boost is forced to fall back to `char` APIs which may not work properly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "utils: Add fsbridge fstream function wrapper" (0548f48f19d33cb57b8279d7a647e91d375a916e)
I know this is my own comment, maybe replace "APIs" with "constructors" on this to make clearer this is referring to the same constructors previously mentioned above.
|
||
BOOST_FIXTURE_TEST_SUITE(fs_tests, BasicTestingSetup) | ||
|
||
BOOST_AUTO_TEST_CASE(fsbridge_fstream) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "tests: Add test case for std::ios_base::ate" (6b6c598ab55c60ed5050c0ba0a817dba899384c0)
Would it be possible to add a test accessing a utf8 filesystem path now that #13866 is merged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. But this seems only work if #13877 merged, so change this PR to based on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK b11cf20f8255bf20fc3c80117308fcf77424545a. Only changes since last review were tweaking a comment and updating the tests to use utf8 paths for better coverage. Making the test work required #13877, so that PR is now included here (and hopefully can be merged first).
src/fs.h
Outdated
// GNU libstdc++ specific workaround for opening UTF-8 paths on Windows. | ||
// | ||
// On Windows, it is only possible to reliably access multibyte file paths through | ||
// `wchar_t` constructors, not `char` APIs. But because the C++ standard doesn't |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Word "constructors" on this line should be changed back to "APIs", because this is just referring in general to how multibyte paths need to be accessed on windows, not to stream object constructors in particular.
src/fs.h
Outdated
{ | ||
public: | ||
ifstream() = default; | ||
ifstream(const fs::path& p, std::ios_base::openmode mode = std::ios_base::in) { open(p, mode); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2018-09-22 21:43:16 cpplint(pr=13878): src/fs.h:65: Constructors callable with one argument should be marked explicit. [runtime/explicit] [5]
src/fs.h
Outdated
{ | ||
public: | ||
ofstream() = default; | ||
ofstream(const fs::path& p, std::ios_base::openmode mode = std::ios_base::out) { open(p, mode); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2018-09-22 21:43:16 cpplint(pr=13878): src/fs.h:79: Constructors callable with one argument should be marked explicit. [runtime/explicit] [5]
Rebased and fix all nits from the comment |
<ClCompile> | ||
<AdditionalOptions>/utf-8 %(AdditionalOptions)</AdditionalOptions> | ||
</ClCompile> | ||
</ItemDefinitionGroup> | ||
</Project> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add missing newline at end of file :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems unnecessary for MSVC project file, they can work without the extra newline. But you can add it in your PR. I don't have strong opinion.
utACK 86eb3b3 |
…ename on Windows 43c7fbb Make MSVC compiler read the source code using utf-8 (Chun Kuan Lee) f86a571 tests: Add test case for std::ios_base::ate (Chun Kuan Lee) a554cc9 Move boost/std fstream to fsbridge (Chun Kuan Lee) 86eb3b3 utils: Add fsbridge fstream function wrapper (Chun Kuan Lee) Pull request description: If compiled with mingw, use glibc++ extension `stdio_filebuf` to open the file by `FILE*` instead of filename. In other condition, we can use boost::fstream. Tree-SHA512: b5dbd83e347fb9b2a0c8b1c2c7bd71a272e839ec0617883b2a0ec12506ae9e825373cf6e95b9bcc91d7edc85bf51580a7716b56a9ecaad776bc3ae61638cb3da
Summary: ``` If compiled with mingw, use glibc++ extension stdio_filebuf to open the file by FILE* instead of filename. In other condition, we can use boost::fstream. ``` Backport of core [[bitcoin/bitcoin#13878 | PR13878]] and [[bitcoin/bitcoin#15468 | PR15468]] (related bug fix). Depends on D5539. Test Plan: ninja check make check Run the Gitian builds. Build and run for Windows. Run `test_bitcoin` on Windows. Reviewers: #bitcoin_abc, jasonbcox Reviewed By: #bitcoin_abc, jasonbcox Differential Revision: https://reviews.bitcoinabc.org/D5539
merge bitcoin#14209, bitcoin#13878, bitcoin#14192, bitcoin#17086: logging and filesystem updates
63e0be6 [Remove] By-pass logprint-scanner restriction. (furszy) 280ced3 utils: Fix broken Windows filelock (Chun Kuan Lee) be89860 utils: Convert Windows args to utf-8 string (Chun Kuan Lee) e8cfa6e Call unicode API on Windows (Chun Kuan Lee) 1a02a8a tests: Add test case for std::ios_base::ate (Chun Kuan Lee) 2e57cd4 Move boost/std fstream to fsbridge (furszy) 9d8bcd4 utils: Add fsbridge fstream function wrapper (Chun Kuan Lee) d59d48d utils: Convert fs error messages from multibyte to utf-8 (ken2812221) 9ef58cc Logging: use "fmterr" variable name for errors instead of general "e" that can be used by any other function. (furszy) dd94241 utils: Use _wfopen and _wreopen on Windows (Chun Kuan Lee) 3993641 add unicode compatible file_lock for Windows (Chun Kuan Lee) 48349f8 Provide relevant error message if datadir is not writable. (murrayn) Pull request description: As the software is currently using the ANSI encoding on Windows, the user's language settings could affect the proper functioning of the node/wallet, to the point of not be able to open some non-ASCII name files and directories. This solves the Windows encoding issues, completing the entire bitcoin#13869 work path (and some other required backports). Enabling for example users that use non-english characters in directories and file names to be accepted. Backported PRs: * bitcoin#12422. * bitcoin#12630. * bitcoin#13862. * bitcoin#13866. * bitcoin#13877. * bitcoin#13878. * bitcoin#13883. * bitcoin#13884. * bitcoin#13886. * bitcoin#13888. * bitcoin#14192. * bitcoin#13734. * bitcoin#14426. This is built on top of other two PRs that i have open #2423 and #2369. Solves old issues #940 and #2163. TODO: * Backport `assert_start_raises_init_error` and `ErrorMatch` in TestNode` (bitcoin#12718) ACKs for top commit: Fuzzbawls: ACK 63e0be6 random-zebra: ACK 63e0be6 and merging... Tree-SHA512: cb1f7c23abb5b7b3af50bba18652cc2cad93fd7c2fca9c16ffd3fee34c4c152a3b666dfa87fe6b44c430064dcdee4367144dcb4a41203c91b0173b805bdb3d7d
If compiled with mingw, use glibc++ extension
stdio_filebuf
to open the file byFILE*
instead of filename.In other condition, we can use boost::fstream.