feat(translator_commons): add dictionary_exclude to exclude words #1008

Merged
merged 3 commits into from
May 5, 2025

ksqsf
Member

@ksqsf ksqsf commented Apr 18, 2025

closes #883

This PR adds translator/dictionary_exclude:

translator:
  dictionary: luna_pinyin
  dictionary_exclude: 
    - 零零

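Its semantics are strictly equivalent to deleting the corresponding entries from the .dict.yaml, except that sentence composition and the user dictionary are unaffected, so the user can still produce the word by picking characters one at a time. In the example above, luna_pinyin cannot output 「零零」 while the user dictionary is empty, but the user can build the word by selecting 「零」 twice, after which 「零零」 can be output directly again.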
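
A minimal sketch of the semantics described above, not the code in this PR: WordList, Lookup and DemoExclusion are hypothetical names, and the static and user dictionaries are modelled as plain word lists. The exclusion list filters only the static dictionary; words from the user dictionary pass through untouched.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a list of candidate words; not a librime type.
using WordList = std::vector<std::string>;

// Returns the candidates for one lookup: static-dictionary words minus the
// excluded ones, followed by user-dictionary words, which are never filtered.
WordList Lookup(const WordList& static_dict,
                const WordList& user_dict,
                const std::unordered_set<std::string>& excluded) {
  WordList result;
  for (const auto& word : static_dict) {
    if (excluded.count(word) == 0) {
      result.push_back(word);
    }
  }
  for (const auto& word : user_dict) {
    result.push_back(word);
  }
  return result;
}

// Mirrors the 「零零」 example from the description.
void DemoExclusion() {
  const WordList static_dict = {"零", "零零"};
  const std::unordered_set<std::string> excluded = {"零零"};

  // Empty user dictionary: the excluded word is not offered.
  const WordList empty_user;
  const WordList expected_before = {"零"};
  assert(Lookup(static_dict, empty_user, excluded) == expected_before);

  // After the user has composed the word once, it comes back as a user word.
  const WordList learned_user = {"零零"};
  const WordList expected_after = {"零", "零零"};
  assert(Lookup(static_dict, learned_user, excluded) == expected_after);
}
```

Appending user words after static words is only an assumption made to keep the sketch short; it says nothing about how real candidates are ranked.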

Open questions:

  • Should the blacklist be specified in an external .txt or .dict.yaml file rather than in schema.yaml?

@ksqsf ksqsf requested a review from lotem April 18, 2025 14:00
Previously, the user had to delete words from the dictionary.
@ksqsf ksqsf marked this pull request as draft April 18, 2025 14:17
@ksqsf ksqsf marked this pull request as ready for review April 18, 2025 15:06
@lotem
Member

lotem commented Apr 18, 2025

One question: do excluded words still take part in sentence composition?

@lotem
Member

lotem commented Apr 18, 2025

Specifying the list of excluded words in the config should be enough.

@ksqsf
Member Author

ksqsf commented Apr 18, 2025

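One question: do excluded words still take part in sentence composition?

The semantics are still "equivalent to deleting the word from dict.yaml":

  1. If the word is not in the user dict, it does not take part in sentence composition, unless it can itself be composed from other words.
  2. If the word is in the user dict, it takes part in sentence composition as a user word.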

@jimmy54
Contributor

jimmy54 commented Apr 21, 2025

Suggestion: specify the blacklist in an external .txt or .dict.yaml file rather than in schema.yaml?

@ksqsf
Member Author

ksqsf commented Apr 21, 2025

Assuming the blacklist is fairly small, a .custom.yaml already serves as the "external file":

patch:
  translator/dictionary_exclude: ["词1", "词2"]

I currently don't know how to implement this with txt or dict.yaml :(

@lotem
Member

lotem commented Apr 21, 2025

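I currently don't know how to implement this with txt or dict.yaml :(

We can start with the current approach, which supports small lists, and extend it later so that this option also accepts an external file: a file name (a string value) instead of a list.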

return false;
}
while (filter_ && !filter_(Peek())) {
do {
Member

The logic here hasn't changed, has it?
It was probably written that way because some books recommend avoiding do-while.

Member Author

The loop body has to reset; FindNextEntry stops doing any work once an entry has been read.

Member Author

@ksqsf ksqsf Apr 21, 2025

Correction: Peek keeps returning the same entry, so in the end all candidates get removed.
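
A simplified sketch of the problem described in this thread, assuming an iterator whose Peek caches the current entry. EntryIterator and Collect are hypothetical and are not the real DictEntryIterator. If the cached entry is not reset while skipping, Peek keeps returning the rejected entry, the filter keeps rejecting it, and the loop runs until the iterator is exhausted.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified iterator with a cached current entry.
class EntryIterator {
 public:
  using Filter = std::function<bool(const std::string&)>;

  EntryIterator(std::vector<std::string> words, Filter filter)
      : words_(std::move(words)), filter_(std::move(filter)) {
    // The filter may already reject the first entry, or even all of them.
    while (!exhausted() && filter_ && !filter_(*Peek())) {
      ++pos_;
      entry_.reset();  // drop the cache so Peek() reads the new position
    }
  }

  bool exhausted() const { return pos_ >= words_.size(); }

  // Returns the current entry, caching it until the cache is reset.
  std::shared_ptr<std::string> Peek() {
    if (!entry_ && !exhausted()) {
      entry_ = std::make_shared<std::string>(words_[pos_]);
    }
    return entry_;
  }

  // Advances to the next entry accepted by the filter.
  bool Next() {
    if (exhausted()) return false;
    do {
      ++pos_;
      entry_.reset();  // without this, Peek() would return the stale entry
    } while (!exhausted() && filter_ && !filter_(*Peek()));
    return !exhausted();
  }

 private:
  std::vector<std::string> words_;
  Filter filter_;
  std::size_t pos_ = 0;
  std::shared_ptr<std::string> entry_;
};

// Drains the iterator into a vector of the accepted words.
std::vector<std::string> Collect(EntryIterator it) {
  std::vector<std::string> out;
  while (!it.exhausted()) {
    assert(it.Peek() != nullptr);
    out.push_back(*it.Peek());
    it.Next();
  }
  return out;
}
```

With the words a, b, c and a filter that rejects b, Collect yields a and c. Removing either entry_.reset() call makes the skipping loop test the same cached entry repeatedly, which matches the symptom of every remaining candidate being dropped.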

@@ -137,6 +137,7 @@ void DictEntryIterator::AddFilter(DictEntryFilter filter) {
// the introduced filter could invalidate the current or even all the
// remaining entries
while (!exhausted() && !filter_(Peek())) {
entry_.reset();
Member

What is this for?

Member Author

Same as above.

for (auto& v : *collector) {
v.second.Sort();
if (blacklist && !blacklist->empty()) {
Member

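I haven't analyzed the code yet, so let me ask first.

If we filter after sorting, is the order still correct once filtering is done?

Also, is the case where the iterator is empty after filtering handled properly?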

Member Author

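Filtering shouldn't affect the order.

I tested the empty-iterator case and it seems fine. Logically this is the same as the earlier filter by charset; if there is a problem now, there was one before as well. (The reset above fixes a bug that already existed.)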
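
To illustrate why filtering after sorting keeps the order, here is a small sketch with a hypothetical Entry type and SortThenFilter function; it is not the code under review. std::remove_if preserves the relative order of the elements it keeps, so a sorted sequence stays sorted, and an empty result is just an empty vector.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical entry type; the real dictionary entry has more fields.
struct Entry {
  std::string text;
  double weight;
};

// Sorts by descending weight, then drops the excluded entries.
std::vector<Entry> SortThenFilter(
    std::vector<Entry> entries,
    const std::unordered_set<std::string>& excluded) {
  auto by_weight_desc = [](const Entry& a, const Entry& b) {
    return a.weight > b.weight;
  };
  std::stable_sort(entries.begin(), entries.end(), by_weight_desc);

  // std::remove_if keeps the relative order of the elements it retains.
  entries.erase(std::remove_if(entries.begin(), entries.end(),
                               [&excluded](const Entry& e) {
                                 return excluded.count(e.text) > 0;
                               }),
                entries.end());

  // Post-condition: the surviving entries are still sorted.
  assert(std::is_sorted(entries.begin(), entries.end(), by_weight_desc));
  return entries;
}
```

Excluding every entry returns an empty vector, so callers only need to handle the empty case as they would for any lookup with no matches.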

@jimmy54
Contributor

jimmy54 commented Apr 25, 2025

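I currently don't know how to implement this with txt or dict.yaml :(

We can start with the current approach, which supports small lists, and extend it later so that this option also accepts an external file: a file name (a string value) instead of a list.

Wouldn't this be cumbersome if a frontend application needs to add entries? Or are frontends only expected to add them to the schema file manually?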

@ksqsf
Member Author

ksqsf commented May 3, 2025

ping

Member

@lotem lotem left a comment

LGTM

@ksqsf ksqsf merged commit 959937e into master May 5, 2025
10 checks passed
@ksqsf ksqsf deleted the dict-exclude branch May 5, 2025 13:57
Development

Successfully merging this pull request may close these issues.

Add a config option to remove specific candidate words
3 participants