Closed
Description
Describe the bug
A clear and concise description of what the bug is.
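- An earlier issue reported that CharTable unreasonably normalizes "幺" to "么". That was fixed in the portable release, but the downloaded 1.7.5 zip package still has the problem, and I later found that the MD5 of CharTable.txt.bin does not match.
- The following characters are also normalized incorrectly. The first column is the original character and the second column is the character after normalization. A character in parentheses is a suggested replacement for the current normalization result.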
猛 勐
蜺 霓
脊 嵴
骼 胳
拾 十
劈 噼
溜 熘
呱 哌
怵 憷
糸 纟(丝)
乾 干
艸 艹(草)
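To compare a local CharTable.txt.bin against the copy shipped with another release, the MD5 of each file can be computed and the two hex strings compared. The following is a minimal sketch using only the JDK; the class name `Md5Check` and its helper methods are hypothetical and not part of HanLP, and the file path has to be supplied by the caller.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not part of HanLP): prints the MD5 of a file so that
// two copies of CharTable.txt.bin can be compared by eye or by script.
public class Md5Check {

    /** Returns the lowercase hex MD5 digest of the given bytes. */
    static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest(data)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** Reads the whole file into memory and returns its MD5 as hex. */
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        return md5Hex(Files.readAllBytes(file));
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java Md5Check <path to CharTable.txt.bin>");
            return;
        }
        // The path is whatever the caller passes in; no default location is assumed.
        Path file = Paths.get(args[0]);
        System.out.println(md5Hex(file) + "  " + file);
    }
}
```

Running it once against the file from the zip package and once against the file from the portable release shows whether the two binaries actually differ.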
Code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate the problem.
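import java.util.HashMap;
import java.util.Map;

import com.hankcs.hanlp.dictionary.other.CharTable;

// Run with assertions enabled (java -ea); otherwise the assert below is skipped.
public void testCharTable() {
    // Key: original character; value: the result expected after normalization.
    Map<String, String> normalizationBadCase = new HashMap<>();
    normalizationBadCase.put("猛", "猛");
    normalizationBadCase.put("蜺", "蜺");
    normalizationBadCase.put("脊", "脊");
    normalizationBadCase.put("骼", "骼");
    normalizationBadCase.put("拾", "拾");
    normalizationBadCase.put("劈", "劈");
    normalizationBadCase.put("溜", "溜");
    normalizationBadCase.put("呱", "呱");
    normalizationBadCase.put("怵", "怵");
    normalizationBadCase.put("糸", "丝");
    normalizationBadCase.put("乾", "乾");
    normalizationBadCase.put("艸", "草");
    for (Map.Entry<String, String> entry : normalizationBadCase.entrySet()) {
        // Fails on the first character whose normalization differs from the expected value.
        assert CharTable.convert(entry.getKey()).equals(entry.getValue());
    }
}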
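Because a plain `assert` stops at the first failure and is skipped unless the JVM runs with `-ea`, it can help to collect every mismatch instead. The sketch below is one possible way to do that; `NormalizationReport` and `mismatches` are hypothetical names, and the normalizer is passed in as a function so the example runs without HanLP on the classpath. The stand-in normalizer in `main` only replays two of the mappings reported above; with HanLP available, `CharTable::convert` could be passed instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical helper (not part of HanLP): lists every input whose normalized
// form differs from the expected one, instead of stopping at the first failure.
public class NormalizationReport {

    /** Maps each input whose actual normalization differs from the expected one to that actual value. */
    static Map<String, String> mismatches(Map<String, String> expected,
                                          UnaryOperator<String> normalizer) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : expected.entrySet()) {
            String actual = normalizer.apply(e.getKey());
            if (!e.getValue().equals(actual)) {
                result.put(e.getKey(), actual);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> expected = new LinkedHashMap<>();
        expected.put("猛", "猛");
        expected.put("糸", "丝");

        // Stand-in normalizer that replays two mappings from the report;
        // this is an assumption for illustration, not HanLP's real table.
        Map<String, String> reported = new LinkedHashMap<>();
        reported.put("猛", "勐");
        reported.put("糸", "纟");
        UnaryOperator<String> stub = s -> reported.getOrDefault(s, s);

        // Print each mismatching input together with what the normalizer returned.
        mismatches(expected, stub)
                .forEach((input, actual) -> System.out.println(input + " -> " + actual));
    }
}
```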
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): win10
- Python version:
- HanLP version: 1.8.0
- I've completed this form and searched the web for solutions.