-
Notifications
You must be signed in to change notification settings - Fork 599
[#8218] fix(doris): Fix BUCKETS AUTO parsing for RANDOM distribution #8240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@mbaurin |
Thanks for raising this ! I did some research and found that BUCKETS AUTO was actually introduced in Apache Doris 1.2.2, not earlier 1.2.x versions. From the Doris 1.2.2 release notes and documentation:
What specific Doris version is Gravitino's catalog currently targeting? If we're supporting 1.2.2+, then this fix is valid. If we're still on 1.2.0/1.2.1, we might need to either:
The existing codebase already had partial support for this (the HASH variant was working), so it seems like there was already some expectation of BUCKETS AUTO support. What's your preference on how to handle this version compatibility ? Happy to adjust the approach accordingly ! |
Accordingly to docs, the version is |
Good point, here are some approaches to support both: Option 1: Graceful degradation
Option 2: Runtime version detection
Option 3: Configuration-based
Option 4: Keep current approach
I'd lean toward Option 1, it's the least disruptive and provides the best user experience. What do you think? |
46bebdd
to
d4f581b
Compare
Yeah, solution one is more graceful, and the change is less intrusive. |
09aa88a
to
65f6da5
Compare
@yuqi1129 Option 1 implemented 👍 |
65f6da5
to
4c44994
Compare
Please resolve the code format issue. |
8631743
to
86a58fa
Compare
86a58fa
to
06029e6
Compare
…8240) <!-- 1. Title: [#<issue>] <type>(<scope>): <subject> Examples: - "[#123] feat(operator): support xxx" - "[#233] fix: check null before access result in xxx" - "[MINOR] refactor: fix typo in variable name" - "[MINOR] docs: fix typo in README" - "[#255] test: fix flaky test NameOfTheTest" Reference: https://www.conventionalcommits.org/en/v1.0.0/ 2. If the PR is unfinished, please mark this PR as draft. --> ### What changes were proposed in this pull request? Fixed the `DorisUtils.extractBucketNum()` method to properly handle `BUCKETS AUTO` parsing for `DISTRIBUTED BY RANDOM` statements. The method was incorrectly calling `matcher.find(5)` which advances the matcher position, causing the bucket value extraction to fail and resulting in a `NumberFormatException` when parsing null values. **Changes:** - Changed `if (matcher.find(5))` to `if (matcher.group(5) != null)` in `DorisUtils.extractBucketNum()` - Added test case for `DISTRIBUTED BY RANDOM BUCKETS AUTO` in `TestDorisUtils.testDistributedInfoPattern()` ### Why are the changes needed? This fixes a critical bug where Doris table operations fail when encountering `DISTRIBUTED BY RANDOM BUCKETS AUTO` syntax. The root cause was that `matcher.find(5)` advances the matcher position after the initial match, and since there are no subsequent matches, it returns `false` and never reaches the bucket value extraction logic. This leaves `bucketValue` as null, causing `Integer.valueOf(bucketValue)` to throw `NumberFormatException: Cannot parse null string`. The bug affects users trying to load tables with auto bucket distribution using random strategy, which is a valid Doris SQL syntax. Fix: #8218 ### Does this PR introduce _any_ user-facing change? No user-facing API changes. This is a bug fix that enables proper parsing of existing Doris SQL syntax that was previously failing. Users can now successfully work with tables using `DISTRIBUTED BY RANDOM BUCKETS AUTO` without encountering parsing errors. ### How was this patch tested? 1. **Existing tests**: Verified all existing Doris utility tests continue to pass, ensuring no regression 2. **New test case**: Added `TestDorisUtils.testDistributedInfoPattern()` test case specifically for `DISTRIBUTED BY RANDOM BUCKETS AUTO` to prevent future regressions 3. **Manual verification**: Created and ran a standalone test to reproduce the original bug and verify the fix resolves the `NumberFormatException` 4. **Code formatting**: Applied Spotless formatting to ensure code style compliance **Test command:** ``` ./gradlew :catalogs:catalog-jdbc-doris:test --tests="*TestDorisUtils*" ``` All tests pass successfully with the fix applied.
What changes were proposed in this pull request?
Fixed the
DorisUtils.extractBucketNum()
method to properly handleBUCKETS AUTO
parsing forDISTRIBUTED BY RANDOM
statements. The method was incorrectly callingmatcher.find(5)
which advances the matcher position, causing the bucket value extraction to fail and resulting in aNumberFormatException
when parsing null values.Changes:
if (matcher.find(5))
toif (matcher.group(5) != null)
inDorisUtils.extractBucketNum()
DISTRIBUTED BY RANDOM BUCKETS AUTO
inTestDorisUtils.testDistributedInfoPattern()
Why are the changes needed?
This fixes a critical bug where Doris table operations fail when encountering
DISTRIBUTED BY RANDOM BUCKETS AUTO
syntax. The root cause was thatmatcher.find(5)
advances the matcher position after the initial match, and since there are no subsequent matches, it returnsfalse
and never reaches the bucket value extraction logic. This leavesbucketValue
as null, causingInteger.valueOf(bucketValue)
to throwNumberFormatException: Cannot parse null string
.The bug affects users trying to load tables with auto bucket distribution using random strategy, which is a valid Doris SQL syntax.
Fix: #8218
Does this PR introduce any user-facing change?
No user-facing API changes. This is a bug fix that enables proper parsing of existing Doris SQL syntax that was previously failing. Users can now successfully work with tables using
DISTRIBUTED BY RANDOM BUCKETS AUTO
without encountering parsing errors.How was this patch tested?
TestDorisUtils.testDistributedInfoPattern()
test case specifically forDISTRIBUTED BY RANDOM BUCKETS AUTO
to prevent future regressionsNumberFormatException
Test command:
All tests pass successfully with the fix applied.