Use filesystem to load filename to prevent encoding issues on Windows #4470
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
cc @onnx/sig-archinfra-approvers, @snnn: this PR is ready for review. Thanks!
Force-pushed from d514d3f to 2c429b3.
onnx/checker.cc (outdated)
@@ -127,6 +140,31 @@ void check_tensor(const TensorProto& tensor, const CheckerContext& ctx) {
  for (const StringStringEntryProto& entry : tensor.external_data()) {
    if (entry.has_key() && entry.has_value() && entry.key() == "location") {
      has_location = true;
#ifdef _WIN32
      const fs::path windows_path(entry.value());
This would not work.
Please call MultiByteToWideChar (https://docs.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-multibytetowidechar) manually to convert entry.value() to a wide-character string, and set the code page to CP_UTF8 for the conversion, because the ONNX standard expects all strings to be UTF-8.
You could check whether it works for paths with Chinese characters.
Good catch. I should further use utf8str_to_wstring (which calls MultiByteToWideChar) to convert the std::string into a std::wstring first.
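The conversion discussed in this thread can be sketched roughly as follows. This is not the PR's utf8str_to_wstring; the name utf8_to_wide is hypothetical, and the non-Windows branch is only an assumption-laden fallback (well-formed UTF-8, 32-bit wchar_t) so that the sketch builds on other platforms.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (not the PR's utf8str_to_wstring): UTF-8 bytes -> wide string.
inline std::wstring utf8_to_wide(const std::string& utf8) {
  if (utf8.empty()) {
    return std::wstring();
  }
#ifdef _WIN32
  const int in_len = static_cast<int>(utf8.size());
  // First call: ask how many wchar_t units the result needs.
  const int out_len = MultiByteToWideChar(
      CP_UTF8, MB_ERR_INVALID_CHARS, utf8.data(), in_len, nullptr, 0);
  if (out_len <= 0) {
    throw std::runtime_error("input is not valid UTF-8");
  }
  std::wstring wide(static_cast<std::size_t>(out_len), L'\0');
  // Second call: perform the conversion into the sized buffer.
  MultiByteToWideChar(
      CP_UTF8, MB_ERR_INVALID_CHARS, utf8.data(), in_len, &wide[0], out_len);
  return wide;
#else
  // Fallback for other platforms: a minimal decoder that assumes
  // well-formed UTF-8 and a 32-bit wchar_t. It does not validate input.
  std::wstring wide;
  for (std::size_t i = 0; i < utf8.size();) {
    const unsigned char lead = static_cast<unsigned char>(utf8[i]);
    const std::size_t extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
    if (i + extra >= utf8.size()) {
      throw std::runtime_error("truncated UTF-8 sequence");
    }
    // Keep only the payload bits of the lead byte.
    std::uint32_t cp = extra == 0 ? lead : (lead & (0x3F >> extra));
    for (std::size_t k = 1; k <= extra; ++k) {
      cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
    }
    wide.push_back(static_cast<wchar_t>(cp));
    i += extra + 1;
  }
  return wide;
#endif
}
```

With a helper like this, the location string would be converted first and the resulting std::wstring passed to fs::path, instead of constructing the path from the narrow string directly.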
onnx/common/path.h (outdated)
}
template <typename STRING>
STRING clean_relative_path(const STRING& path);
std::string clean_relative_path(const char* path);
This part looks very confusing. Initially I thought the second one was a specialization of the first one, then I realized it is missing the "template" keyword.
https://docs.microsoft.com/en-us/cpp/cpp/explicit-specialization-of-function-templates?view=msvc-170
So why do you use a template for std::string but overloading for C-style strings? Why not use the same approach for both?
Thanks for the pointer. I added it because we need to explicitly define a function for char[] input (a raw string literal passed directly in double quotes).
To prevent confusion, I have also templatized this function and added comments. It now reads:
// std::string in template cannot be recognized as char[]
// Therefore explicitly define char[] to handle char[] input
template <typename CHAR, typename std::size_t N>
std::basic_string<CHAR> clean_relative_path(const CHAR (&path)[N]) {
return clean_relative_path(std::basic_string<CHAR>(path));
}
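To illustrate why the array overload is needed, here is a minimal sketch. The generic function body is a placeholder that returns its input unchanged (an assumption for illustration only; the real function normalizes the path). When a string literal is passed, the generic template would deduce STRING as char[N], and a function cannot return an array, so that candidate drops out and the array overload is selected.

```cpp
#include <cstddef>
#include <string>

// Stand-in for the real function: the body is a placeholder that
// returns the input unchanged, just so overload resolution can be shown.
template <typename STRING>
STRING clean_relative_path(const STRING& path) {
  return path;
}

// Overload for string literals such as "abc" or L"abc".
// For a literal, the generic template above deduces STRING = char[N];
// its return type would be an array, so deduction fails and only this
// overload remains viable. It forwards to the basic_string version.
template <typename CHAR, std::size_t N>
std::basic_string<CHAR> clean_relative_path(const CHAR (&path)[N]) {
  return clean_relative_path(std::basic_string<CHAR>(path));
}
```

Both clean_relative_path("abc") and clean_relative_path(L"abc") then return a std::string and a std::wstring respectively, while a std::string argument still goes to the generic template.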
Force-pushed from d2dd672 to 62ad16b.
Force-pushed from 46e602b to 903b9ed.
@snnn Thank you for the reviews. I think I have addressed all of them. Please take another round of review. Meanwhile, I will try to validate this PR and #4400 in ORT to see in advance whether there is any issue. cc @postrational IIRC, you mentioned you are interested in this topic in our last operator-sig meeting. Feel free to review. Thank you!
onnx/test/cpp/common_path_test.cc (outdated)
// Remove leading slash
EXPECT_EQ(clean_relative_path("/abc"), fix_sep("abc"));
EXPECT_EQ(clean_relative_path("/"), fix_sep("."));
EXPECT_EQ(clean_relative_path("/abc"), "abc");
I don't understand this test case. It seems we expect this function to force-convert every absolute path to a relative path? But why does "/abc" become "abc"? If the current working directory is "/", it makes sense. But what if we are not in the root dir?
And on Windows an absolute path may start with things like "C:", not "/". In that case, what would clean_relative_path do?
Good question... I also found it confusing when absolute paths are involved. Actually, the ONNX IR only allows relative paths for the location of external tensors: https://github.com/onnx/onnx/blob/main/docs/IR.md#external-tensor-data.
Therefore, I have added an absolute-path check in the checker, added a comment on clean_relative_path stating that it does not work with absolute paths, and removed the absolute-path tests in common_path_test.cc to prevent confusion.
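A rough sketch of what such an absolute-path check could look like with std::filesystem is shown below. The function name is_relative_location is hypothetical and this is not the checker's actual code; it assumes the caller has already built an fs::path with the right encoding (on Windows, from a wide string converted from UTF-8).

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical check, not the PR's checker code: a location is accepted
// only if it is non-empty and has neither a root name (such as "C:")
// nor a root directory (a leading separator).
inline bool is_relative_location(const fs::path& location) {
  return !location.empty() && location.root_path().empty();
}
```

Testing root_path() rather than is_absolute() is a deliberate choice in this sketch: on Windows, a path such as "/abc" or "C:abc" is not absolute in the std::filesystem sense, yet neither is a plain path relative to the model directory.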
ORT is good with this update so I will forward this PR soon. Thanks for the reviews!
) * Do not allow to read tensor's external_data outside the model directory (#4400) * Not allow to read tensor external_data outside the model directory Signed-off-by: jnovikov <johnnovikov0@gmail.com> * Fix formatting errors Signed-off-by: jnovikov <johnnovikov0@gmail.com> * Disable segfaulty test Signed-off-by: jnovikov <johnnovikov0@gmail.com> * Fix cpp tests Signed-off-by: jnovikov <johnnovikov0@gmail.com> * Fix UB while removing ../ Signed-off-by: jnovikov <johnnovikov0@gmail.com> * Fix clang-format Signed-off-by: jnovikov <johnnovikov0@gmail.com> * Check for symlinks only on POSIX systems Signed-off-by: jnovikov <johnnovikov0@gmail.com> * Add specific to Windows external_data test Signed-off-by: jnovikov <johnnovikov0@gmail.com> * Change specific Windows external_data test decorator tofix mypy Signed-off-by: jnovikov <johnnovikov0@gmail.com> * Remove unused pathlib Signed-off-by: jnovikov <johnnovikov0@gmail.com> Signed-off-by: jnovikov <johnnovikov0@gmail.com> Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * cherry pick #4470 Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * resolve Windows CI issue: use fixed Python 3.10.5 Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * fix flake8 Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> Signed-off-by: jnovikov <johnnovikov0@gmail.com> Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> Co-authored-by: Ivan Novikov <johnnovikov0@gmail.com>
* Remove unnecessary import (#4484) * remove unnecessary import Signed-off-by: xadupre <xadupre@microsoft.com> * lint Signed-off-by: xadupre <xadupre@microsoft.com> * black Signed-off-by: xadupre <xadupre@microsoft.com> * black Signed-off-by: xadupre <xadupre@microsoft.com> * lint Signed-off-by: xadupre <xadupre@microsoft.com> * restore old import Signed-off-by: xadupre <xadupre@microsoft.com> Signed-off-by: xadupre <xadupre@microsoft.com> * fix mypy issues Signed-off-by: sdpython <xavier.dupre@gmail.com> Signed-off-by: xadupre <xadupre@microsoft.com> * fix mypy issues Signed-off-by: sdpython <xavier.dupre@gmail.com> Signed-off-by: xadupre <xadupre@microsoft.com> * update mypy Signed-off-by: sdpython <xavier.dupre@gmail.com> Signed-off-by: xadupre <xadupre@microsoft.com> * primary ops to function milestone 1 (#4458) * primary ops to function milestone 1 Signed-off-by: Liqun Fu <liqfu@microsoft.com> * float type Signed-off-by: Liqun Fu <liqfu@microsoft.com> * formatting Signed-off-by: Liqun Fu <liqfu@microsoft.com> * pass backend test Signed-off-by: Liqun Fu <liqfu@microsoft.com> * formatting Signed-off-by: Liqun Fu <liqfu@microsoft.com> * commit Signed-off-by: Liqun Fu <liqfu@microsoft.com> * format Signed-off-by: Liqun Fu <liqfu@microsoft.com> * format Signed-off-by: Liqun Fu <liqfu@microsoft.com> * layernorm Signed-off-by: Liqun Fu <liqfu@microsoft.com> * pass runtime check Signed-off-by: Liqun Fu <liqfu@microsoft.com> * format Signed-off-by: Liqun Fu <liqfu@microsoft.com> * pure function Signed-off-by: Liqun Fu <liqfu@microsoft.com> * format Signed-off-by: Liqun Fu <liqfu@microsoft.com> * reviewer's comments, clean up some Signed-off-by: Liqun Fu <liqfu@microsoft.com> * keep op original version Signed-off-by: Liqun Fu <liqfu@microsoft.com> * fix gtest.function_verify_test Signed-off-by: Liqun Fu <liqfu@microsoft.com> * formatting Signed-off-by: Liqun Fu <liqfu@microsoft.com> * update according to reviewer's comment Signed-off-by: Liqun Fu <liqfu@microsoft.com> 
Signed-off-by: Liqun Fu <liqfu@microsoft.com> Co-authored-by: G. Ramalingam <grama@microsoft.com> Signed-off-by: xadupre <xadupre@microsoft.com> * lint Signed-off-by: sdpython <xavier.dupre@gmail.com> Signed-off-by: xadupre <xadupre@microsoft.com> * remove # type: ignore for imports Signed-off-by: xadupre <xadupre@microsoft.com> * lint Signed-off-by: sdpython <xavier.dupre@gmail.com> Signed-off-by: xadupre <xadupre@microsoft.com> * Use filesystem to load filename to prevent encoding issues on Windows (#4470) * apply filesystem from C+17 to handle encoding on Windows Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * add comment in CMakeLists Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * precise msg if missing support Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * void normalize_sep for two types Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * remove typo const Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * remove template functions to header Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * apply clang-format Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * use _wstat Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * use define function to refactor code Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * add required C++ version in readme Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * void char* for std::string in test Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * fix clang-format Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * move wchar_t and wstring only for Windows Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * refactor template Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * typo Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * use char tempalte in template Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * add comments Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * add tests for wstring Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * typo Signed-off-by: 
Chun-Wei Chen <jacky82226@gmail.com> * nit comments Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * use existing functions from std::filesystem::path Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * honor != Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * constexpr const char Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * checker disallow absolute path in external tensors; remove related tests Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * fix format Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * add more checker tests Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * black Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * improve comments Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> Signed-off-by: xadupre <xadupre@microsoft.com> * remove unnecessary ignore-missing-type Signed-off-by: sdpython <xavier.dupre@gmail.com> Signed-off-by: xadupre <xadupre@microsoft.com> * remove two type ignore Signed-off-by: xadupre <xadupre@microsoft.com> * lint Signed-off-by: xadupre <xadupre@microsoft.com> * type Signed-off-by: xadupre <xadupre@microsoft.com> Signed-off-by: xadupre <xadupre@microsoft.com> Signed-off-by: sdpython <xavier.dupre@gmail.com> Signed-off-by: Liqun Fu <liqfu@microsoft.com> Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> Co-authored-by: sdpython <xavier.dupre@gmail.com> Co-authored-by: liqun Fu <liqfu@microsoft.com> Co-authored-by: G. Ramalingam <grama@microsoft.com> Co-authored-by: Chun-Wei Chen <jacky82226@gmail.com>
…onnx#4470)
* apply filesystem from C+17 to handle encoding on Windows
* add comment in CMakeLists
* precise msg if missing support
* void normalize_sep for two types
* remove typo const
* remove template functions to header
* apply clang-format
* use _wstat
* use define function to refactor code
* add required C++ version in readme
* void char* for std::string in test
* fix clang-format
* move wchar_t and wstring only for Windows
* refactor template
* typo
* use char tempalte in template
* add comments
* add tests for wstring
* typo
* nit comments
* use existing functions from std::filesystem::path
* honor !=
* constexpr const char
* checker disallow absolute path in external tensors; remove related tests
* fix format
* add more checker tests
* black
* improve comments
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
Description
Motivation and Context
Follow-up work of #4400. Here is another security issue when loading external tensors on Windows:
On Windows, a path string is encoded either as Unicode (UTF-16) or in a locale-specific encoding such as GB18030 or ISO-8859-1, and it is usually not UTF-8. Because there are so many encodings, we usually need to convert the string to UTF-16 first and process it as wchar_t. Otherwise, when you search for the char ‘\’, you may accidentally misclassify half of a Chinese character as ‘\’ (this can happen in other encodings too). On Linux the same thing could happen, but there we can claim that we don’t support locales other than xxx.UTF-8. On Windows, however, we need to handle other encodings, and using std::filesystem could be a solution.
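The misclassification described above can be reproduced with a small sketch. It uses Shift-JIS rather than a Chinese encoding because the byte values are well known: the character 表 (U+8868) is encoded as the two bytes 0x95 0x5C, and 0x5C is also the ASCII backslash. The names contains_backslash, sjis_name and wide_name are made up for this illustration.

```cpp
#include <string>

// Reports whether any code unit of the string equals the backslash.
template <typename CHAR>
bool contains_backslash(const std::basic_string<CHAR>& s) {
  return s.find(static_cast<CHAR>('\\')) != std::basic_string<CHAR>::npos;
}

// The file name "表.bin" as Shift-JIS bytes: the second byte of 表 is 0x5C.
const std::string sjis_name = std::string("\x95\x5C") + ".bin";

// The same file name as a wide string: 表 is the single unit U+8868.
const std::wstring wide_name = L"\u8868.bin";
```

A byte-wise search on sjis_name reports a path separator that is not there, while the same search on wide_name does not, which is why the text suggests converting to a wide string before processing the path.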