OpenCC "conversion_chain" not fully working #652

@amorphobia

Describe the bug
Sometimes the OpenCC conversion applies only the first dict listed in the JSON config file and skips the ones that follow.

To Reproduce
Steps to reproduce the bug:

  1. Create the following files in the Rime/opencc directory:
  • t.json
{
  "name": "Test Conversion",
  "segmentation": {
    "type": "mmseg",
    "dict": {
      "type": "text",
      "file": "t1.txt"
    }
  },
  "conversion_chain": [{
    "dict": {
      "type": "text",
      "file": "t1.txt"
    }
  }, {
    "dict": {
      "type": "text",
      "file": "t2.txt"
    }
  }]
}
  • t1.txt (tab separated dictionary)
三	二
  • t2.txt (tab separated dictionary)
二	一
  2. Create a custom patch for luna pinyin:
  • luna_pinyin.custom.yaml
patch:
  test_conversion:
    opencc_config: t.json
    option_name: test
    tips: all
  engine/filters/@next: simplifier@test_conversion
  switches/@next: { name: test, reset: 1, states: [ "off", "on" ] }
  3. Deploy Rime, activate luna pinyin, and type the input for 「三」 and 「三人」

Expected behavior
Every occurrence of 「三」 should end up converted to 「一」. However, the single character 「三」 is converted only to 「二」, which is an intermediate result, while 「三人」 is correctly converted to 「一人」.

I also tested with the OpenCC command-line tool and got the correct results:

$ echo 三 | opencc -c ./t.json
一
$ echo 三人 | opencc -c ./t.json
一人

Screenshots
三 to 二
三人 to 一人

Flavor (please complete the following information):
Select your flavor:

  • Squirrel
  • Weasel
  • Hamster

Package:

Additional context

I found the logic in simplifier.cc:

When Opencc::ConvertWord can convert the original candidate as a whole, that result is used as-is and is not converted any further. Otherwise, the candidate is converted with Opencc::ConvertText.

Opencc::ConvertWord looks up the value in Opencc::dict_, which holds only the first conversion of the chain. By contrast, Opencc::ConvertText calls the OpenCC converter's Convert method, which uses the whole chain.
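
To make the two paths easier to compare, here is a rough toy model in Python. It is only a sketch of the behavior described above, not librime's actual code: the function names `convert_word`, `convert_text` and `simplify` are hypothetical stand-ins, and the character-by-character lookup in `convert_text` is a simplification of OpenCC's real segmentation.

```python
# Toy model of the behavior described in this issue; NOT librime's real code.
# All names here are hypothetical stand-ins for illustration.

T1 = {"三": "二"}  # contents of t1.txt
T2 = {"二": "一"}  # contents of t2.txt
CHAIN = [T1, T2]   # the "conversion_chain" from t.json


def convert_text(text, chain=CHAIN):
    """Apply every dictionary in the chain, in order.

    Simplification: looks up one character at a time instead of
    using OpenCC's mmseg segmentation.
    """
    for d in chain:
        text = "".join(d.get(ch, ch) for ch in text)
    return text


def convert_word(word, chain=CHAIN):
    """Whole-word lookup in the FIRST dictionary only; None if absent."""
    return chain[0].get(word)


def simplify(candidate):
    """Use the whole-word hit if there is one, else convert as text."""
    hit = convert_word(candidate)
    if hit is not None:
        return hit  # the rest of the chain is never applied
    return convert_text(candidate)


print(simplify("三"))      # whole-word hit in t1.txt, stops after the first dict
print(simplify("三人"))    # no whole-word hit, so the full chain runs
print(convert_text("三"))  # what the full chain gives for the single character
```

Under these assumptions, 「三」 hits the first dictionary as a whole word and stops at 「二」, while 「三人」 misses the whole-word lookup and goes through the full chain, which matches what I see in Rime.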
