Description
Describe the bug
Sometimes the OpenCC conversion applies only the first dictionary in the JSON config file and skips the following ones.
To Reproduce
Steps to reproduce the bug:
- Create the following files in the Rime/opencc directory:
- t.json
  {
    "name": "Test Conversion",
    "segmentation": {
      "type": "mmseg",
      "dict": {
        "type": "text",
        "file": "t1.txt"
      }
    },
    "conversion_chain": [{
      "dict": {
        "type": "text",
        "file": "t1.txt"
      }
    }, {
      "dict": {
        "type": "text",
        "file": "t2.txt"
      }
    }]
  }
- t1.txt (tab-separated dictionary)
三 二
- t2.txt (tab-separated dictionary)
二 一
- Create a custom patch for Luna Pinyin:
- luna_pinyin.custom.yaml
  patch:
    test_conversion:
      opencc_config: t.json
      option_name: test
      tips: all
    engine/filters/@next: simplifier@test_conversion
    switches/@next: { name: test, reset: 1, states: [ "off", "on" ] }
- Deploy Rime, activate Luna Pinyin, and type 「三」 and 「三人」.
Expected behavior
Every 「三」 should ultimately be converted to 「一」. However, the single character 「三」 is converted to 「二」, which is only the intermediate result produced by the first dictionary. 「三人」 is correctly converted to 「一人」.
I also tested with the OpenCC command-line tool and got the correct results:
$ echo 三 | opencc -c ./t.json
一
$ echo 三人 | opencc -c ./t.json
一人
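The command-line results are consistent with a conversion chain in which each conversion is applied to the output of the previous one. The Python sketch below models that under simplifying assumptions: each conversion is a plain dictionary applied with greedy longest-match replacement, which stands in for OpenCC's mmseg segmentation and dictionary lookup and is not OpenCC's actual implementation. The names `apply_dict` and `apply_chain` are hypothetical.

```python
# Hypothetical model of an OpenCC conversion chain (not OpenCC's code):
# each conversion is a dict applied with greedy longest-match replacement,
# and the chain feeds each conversion's output into the next one.


def apply_dict(table: dict[str, str], text: str) -> str:
    out = []
    pos = 0
    longest = max((len(k) for k in table), default=0)
    while pos < len(text):
        # Try the longest possible key first so phrases win over characters.
        for size in range(min(longest, len(text) - pos), 0, -1):
            piece = text[pos:pos + size]
            if piece in table:
                out.append(table[piece])
                pos += size
                break
        else:
            # No entry starts at this position: keep the character as-is.
            out.append(text[pos])
            pos += 1
    return "".join(out)


def apply_chain(chain: list[dict[str, str]], text: str) -> str:
    for table in chain:
        text = apply_dict(table, text)
    return text


# The two dictionaries from the reproduction steps.
T1 = {"三": "二"}  # t1.txt
T2 = {"二": "一"}  # t2.txt
CHAIN = [T1, T2]

print(apply_chain(CHAIN, "三"))    # 一
print(apply_chain(CHAIN, "三人"))  # 一人
```

In this model both inputs end at 「一」, matching the `opencc -c ./t.json` output above.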
Flavor (please complete the following information):
Select your flavor:
- Squirrel
- Weasel
- Hamster
Package:
- OS: Windows 10
- Version: 21H2 (19044.2846)
- URI: Weasel 0.14.3
Additional context
I found the relevant logic in simplifier.cc:
If the original candidate as a whole can be converted by Opencc::ConvertWord, it is not converted any further. Otherwise, the candidate is converted with Opencc::ConvertText.
Opencc::ConvertWord looks up the value in Opencc::dict_, which holds only the dictionary of the first conversion in the chain. By contrast, Opencc::ConvertText calls the OpenCC converter's Convert method, which applies the whole chain.