@github-actions github-actions bot commented Sep 5, 2025

What changes were proposed in this pull request?

Fixed the DorisUtils.extractBucketNum() method so that it correctly parses BUCKETS AUTO in DISTRIBUTED BY RANDOM statements. The method called matcher.find(5) where it should have checked capture group 5. Matcher.find(int) takes a character index, not a group number: it resets the matcher and searches again from that index. The call returned false, the bucket value was never extracted, and parsing the resulting null value threw a NumberFormatException.

Changes:

  • Changed if (matcher.find(5)) to if (matcher.group(5) != null) in DorisUtils.extractBucketNum()
  • Added test case for DISTRIBUTED BY RANDOM BUCKETS AUTO in TestDorisUtils.testDistributedInfoPattern()

Why are the changes needed?

This fixes a critical bug where Doris table operations fail when encountering DISTRIBUTED BY RANDOM BUCKETS AUTO syntax. The root cause was that matcher.find(5) does not check capture group 5: Matcher.find(int) resets the matcher and searches again starting at character index 5. In the failing case that call returns false, so the bucket value extraction logic is never reached. This leaves bucketValue as null, causing Integer.valueOf(bucketValue) to throw NumberFormatException: Cannot parse null string.

The bug affects users who load tables that use the random distribution strategy with automatic bucketing, which is valid Doris SQL syntax.
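The difference between the two calls can be reproduced in isolation. The following is a minimal sketch, not the actual DorisUtils code: the class name, method names, and the simplified pattern are assumptions made for illustration (the real pattern is not shown in this PR, and it uses group 5 where this sketch uses group 2). It only shows that `Matcher.find(int)` restarts the search from a character index, while `Matcher.group(int)` reads a capture group of the current match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BucketNumSketch {
  // Hypothetical, simplified pattern (NOT the one in DorisUtils).
  // Group 1: HASH(...) or RANDOM. Group 2: optional bucket value (digits or AUTO).
  private static final Pattern DISTRIBUTION =
      Pattern.compile(
          "DISTRIBUTED\\s+BY\\s+(HASH\\s*\\([^)]*\\)|RANDOM)(?:\\s+BUCKETS\\s+(\\d+|AUTO))?",
          Pattern.CASE_INSENSITIVE);

  // Mirrors the buggy shape: find(2) resets the matcher and searches again
  // from character index 2, so the earlier match is discarded.
  static String buggyBucketValue(String sql) {
    Matcher matcher = DISTRIBUTION.matcher(sql);
    if (!matcher.find()) {
      return null;
    }
    if (matcher.find(2)) {
      return matcher.group(2);
    }
    return null; // reached when the restarted search finds nothing
  }

  // Mirrors the fixed shape: group(2) reads the capture group of the
  // current match and is null when the optional group did not participate.
  static String fixedBucketValue(String sql) {
    Matcher matcher = DISTRIBUTION.matcher(sql);
    if (!matcher.find()) {
      return null;
    }
    if (matcher.group(2) != null) {
      return matcher.group(2);
    }
    return null;
  }

  public static void main(String[] args) {
    String sql = "DISTRIBUTED BY RANDOM BUCKETS AUTO";
    System.out.println("buggy: " + buggyBucketValue(sql)); // prints "buggy: null"
    System.out.println("fixed: " + fixedBucketValue(sql)); // prints "fixed: AUTO"
  }
}
```

In this sketch the input starts with the DISTRIBUTED BY clause, so a search restarted at index 2 cannot match. With the fixed shape, a null result means only that no BUCKETS clause was present, which the caller can handle before any call to `Integer.valueOf`.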

Fix: #8218

Does this PR introduce any user-facing change?

No user-facing API changes. This is a bug fix that enables proper parsing of existing Doris SQL syntax that was previously failing. Users can now successfully work with tables using DISTRIBUTED BY RANDOM BUCKETS AUTO without encountering parsing errors.

How was this patch tested?

  1. Existing tests: Verified all existing Doris utility tests continue to pass, ensuring no regression
  2. New test case: Added a case for DISTRIBUTED BY RANDOM BUCKETS AUTO to TestDorisUtils.testDistributedInfoPattern() to prevent future regressions
  3. Manual verification: Created and ran a standalone test to reproduce the original bug and verify the fix resolves the NumberFormatException
  4. Code formatting: Applied Spotless formatting to ensure code style compliance

Test command:

./gradlew :catalogs:catalog-jdbc-doris:test --tests="*TestDorisUtils*"

All tests pass successfully with the fix applied.

…8240)

@github-actions github-actions bot requested a review from jerryshao September 5, 2025 01:47
@yuqi1129 yuqi1129 closed this Sep 5, 2025
@yuqi1129 yuqi1129 reopened this Sep 5, 2025
@yuqi1129 yuqi1129 merged commit ad3372c into branch-1.0 Sep 5, 2025
28 checks passed
@yuqi1129 yuqi1129 deleted the cherry-pick-branch-1.0-a3eda2eddc26442698c4d386e96efcb59b4c0ecb branch September 5, 2025 03:03