add path encoding #17
If I understand correctly, mecab should use UTF-8 character encoding. Line 189 in 60ea4df
Perhaps you built mecab yourself, or added config.h without running cmake through setup.py? |
I see, it is set to be processed in UTF-8.... What I did was Possibly, the charset of the path and the one mecab uses for input/output are different...? Lines 31 to 34 in 60ea4df
I would like to ask another person to verify this because I am losing confidence. |
Umm, maybe @oocytanb can help? I am afraid I don't have a windows machine to investigate the issue. |
Please excuse me for writing in Japanese. The MSVC compiler options were added so that the project can be built on Windows. |
Looking at the MeCab code, at the point where it opens files, |
Ah, how about adding a test that reads the mecab dictionary from a directory with a Japanese name? |
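Out of habit I write in English in all GitHub repositories, but Japanese is of course fine in Japanese-related repositories. I think it would be good if you added a test for directories whose names contain Japanese characters. |
I see. It would be a shame if this discussion became hard to find, so how about opening an issue for now and referencing this page from it? |
Yes, I think that is fine. |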
If the path contains Japanese characters on Windows, mecab_load raises an error.
After a lot of testing, it seems that mecab_load expects a Shift-JIS byte sequence.
I'm inferring this from the fact that it worked fine with `.encode` set to shift-jis. I haven't been able to follow the code closely, so I can't say exactly why.
I didn't know how to find out which charset the system expects for paths, but using `locale.getpreferredencoding()` seemed to be a good idea. https://docs.python.org/ja/3/library/locale.html#locale.getpreferredencoding
I have confirmed that it works fine on Windows.
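For illustration, the approach described above can be sketched roughly as follows. This is a minimal sketch and not the actual patch: `encode_path` is a hypothetical helper name, and it assumes the native loader takes the path as a byte string in the system's preferred encoding (Shift-JIS/cp932 on a Japanese Windows setup, typically UTF-8 elsewhere).

```python
import locale
from typing import Optional


def encode_path(path: str, encoding: Optional[str] = None) -> bytes:
    """Encode a path to bytes before handing it to a native (char*) API.

    Hypothetical helper: when no encoding is given, fall back to the
    locale's preferred encoding, as suggested in this pull request.
    """
    if encoding is None:
        # On a Japanese Windows system this is usually cp932 (Shift-JIS).
        encoding = locale.getpreferredencoding()
    # Raises UnicodeEncodeError if the path cannot be represented
    # in the chosen encoding.
    return path.encode(encoding)


if __name__ == "__main__":
    # Explicit encoding, mirroring the manual experiment with shift-jis.
    print(encode_path("C:/辞書/dic", "shift_jis"))
```

Passing the encoding explicitly reproduces the manual Shift-JIS experiment, while omitting it defers to whatever the running system reports, which may differ between machines.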