
Changing one line of code can make DAT construction N times faster #1801

@qiangwang

Description


Describe the feature and the current behavior/state.


Changing
int pos = Math.max(siblings.get(0).code + 1, nextCheckPos) - 1;
to
int pos = Math.max(siblings.get(0).code, nextCheckPos);
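A minimal sketch of what the one-line change does to the first slot the search probes. It assumes, as in HanLP's `DoubleArrayTrie.insert`, that the search loop executes `pos++` before it looks at `check[pos]`; `firstCode` stands for `siblings.get(0).code`, and the class and method names are hypothetical.

```java
public class StartPosDemo {
    // Original rule: pos = max(firstCode + 1, nextCheckPos) - 1, then pos++ at the top of the loop
    static int firstProbeOriginal(int firstCode, int nextCheckPos) {
        int pos = Math.max(firstCode + 1, nextCheckPos) - 1;
        return pos + 1; // assumed: the loop increments pos before the first probe
    }

    // Proposed rule: pos = max(firstCode, nextCheckPos), then pos++ at the top of the loop
    static int firstProbeProposed(int firstCode, int nextCheckPos) {
        int pos = Math.max(firstCode, nextCheckPos);
        return pos + 1; // same assumed pos++ before the first probe
    }

    public static void main(String[] args) {
        int[][] cases = {{10, 3}, {10, 11}, {10, 50}};
        for (int[] c : cases) {
            System.out.printf("code=%d nextCheckPos=%d -> original=%d proposed=%d%n",
                    c[0], c[1], firstProbeOriginal(c[0], c[1]), firstProbeProposed(c[0], c[1]));
        }
    }
}
```

Both rules agree when `nextCheckPos <= firstCode`; when `nextCheckPos > firstCode`, the original rule starts probing at `nextCheckPos` itself, while the proposed rule starts one slot later. One plausible reading (not measured here) is that skipping that slot keeps `nextCheckPos` moving forward instead of re-scanning the same dense region, at the cost of leaving a few unused slots, which would fit the faster build and slightly larger DAT reported below.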

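This makes the build much faster, and the speedup is consistent across runs; the downside is that the resulting DAT is slightly larger. Here is my test data: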

[Image: test data]

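Here, "keep nextCheckPos" is the original code and "remove nextCheckPos" is included for comparison; the core loop count was measured by adding a count++ inside the loop body:
[Image: core loop counts]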
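The "core loop count" is simply a counter bumped once per iteration of the slot-search loop. The sketch below shows that instrumentation on a deliberately simplified, hypothetical stand-in for the loop (a first-fit search over a `check` array for a `begin` such that every `begin + code` slot is free); it is not HanLP's actual `insert`, and `ProbeCounter`, `findBegin` and `count` are illustrative names.

```java
import java.util.Arrays;

public class ProbeCounter {
    // Hypothetical counter mirroring the count++ instrumentation described above
    static long count = 0;

    // Toy first-fit search: codes must be sorted ascending, startPos >= codes[0] + 1.
    // Returns begin such that check[begin + code] == 0 for every code, or -1 if none fits.
    static int findBegin(int[] check, int[] codes, int startPos) {
        int pos = startPos - 1;
        outer:
        while (true) {
            pos++;
            count++; // one iteration of the core loop
            if (pos >= check.length) return -1;
            if (check[pos] != 0) continue; // slot for the first sibling is taken
            int begin = pos - codes[0];
            if (begin + codes[codes.length - 1] >= check.length) return -1;
            for (int i = 1; i < codes.length; i++) {
                if (check[begin + codes[i]] != 0) continue outer; // conflict, try next pos
            }
            return begin;
        }
    }

    public static void main(String[] args) {
        int[] check = new int[32];
        Arrays.fill(check, 0, 10, 1); // slots 0..9 already occupied
        int[] codes = {1, 3};          // sibling codes, sorted ascending
        count = 0;
        int begin = findBegin(check, codes, codes[0] + 1);
        System.out.println("begin=" + begin + " iterations=" + count); // begin=9 iterations=9
    }
}
```

Running the same counter under the original and modified start positions is one way to compare how many probes each rule spends per insertion.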

Will this change the current api? How?

No

Who will benefit with this feature?

Anyone who uses DAT

Are you willing to contribute it (Yes/No):

Yes

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS Big Sur
  • Python version:
  • HanLP version: hanlp-portable-1.8.3.jar

Any other info

  • I've carefully completed this form.

Metadata


Assignees

Labels

feature request: Suggest an idea for this project

Projects

No projects

Milestone

No milestone
