Array out-of-bound access bug #5492

@zhangzhang10

In file src/common/quantile.h, at lines 210-211:

CHECK(i != src.size - 1);
if (dx2 < src.data[i].RMinNext() + src.data[i + 1].RMaxPrev()) { ... ... }

The CHECK logs an error message if i == src.size - 1, but execution then continues to the next line, where src.data[i + 1] is accessed. This appears to be an out-of-bounds array access. With a large dataset (e.g. a 64GB mortgage dataset) in distributed training on Spark, we see task failures that can be attributed to this bug.

The two lines of code mentioned above are in the function WQSummary::SetPrune(), which has been around for years, but the problem manifested only recently, when this PR was merged. One thing the PR changed was switching from WXQSketch to WQSketch. As a result, WQSummary::SetPrune() replaced WXQSummary::SetPrune() in the execution path. In WXQSummary::SetPrune(), there was a similar check, but it breaks out of the enclosing for-loop instead of continuing when the check fails; see lines 425-426 in file quantile.h:

if (i == end) break;
if (dx2 < src.data[i].RMinNext() + src.data[i + 1].RMaxPrev()) { ... ... }

I believe we should do the same (breaking from the for-loop) in WQSummary::SetPrune(). Thanks.
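
To illustrate the proposed guard, here is a minimal standalone sketch of the pattern. It is not the XGBoost code: `Entry` and `CountBelowThreshold` are hypothetical names invented for this example, and the two fields only stand in for the values returned by RMinNext() and RMaxPrev(). The point is just that breaking out of the loop when `i` is the last index keeps the `i + 1` access in range.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a summary entry; not the real XGBoost type.
struct Entry {
  double rmin_next;  // plays the role of RMinNext()
  double rmax_prev;  // plays the role of RMaxPrev()
};

// Counts the indices i for which
//   dx2 < data[i].rmin_next + data[i + 1].rmax_prev,
// leaving the loop before the last element so data[i + 1] is always valid.
std::size_t CountBelowThreshold(const std::vector<Entry>& data, double dx2) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < data.size(); ++i) {
    // Guard: the last element has no successor, so break instead of
    // reading data[i + 1] past the end of the array.
    if (i + 1 == data.size()) break;
    if (dx2 < data[i].rmin_next + data[i + 1].rmax_prev) {
      ++count;
    }
  }
  return count;
}
```

With this guard, an input of size 0 or 1 never reaches the comparison at all, and for larger inputs the last index visited by the comparison is size - 2.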
