@morningman morningman commented Jun 28, 2025

Fix #6978

Description

This PR adds comprehensive support for the Apache Doris SQL dialect to SQLFluff, extending the existing MySQL grammar with Doris-specific syntax features. The implementation includes support for CREATE TABLE, DROP TABLE, INSERT statements, and various Doris-specific table properties and clauses.

Features Added

1. CREATE TABLE Statement Support

  • Engine Types: Support for Doris-specific engines including olap, mysql, elasticsearch, hive, hudi, iceberg, jdbc, broker
  • Key Types: Support for DUPLICATE KEY, AGGREGATE KEY, and UNIQUE KEY syntax
  • Partitioning:
    • Auto partitioning with AUTO PARTITION BY RANGE(function) ()
    • Manual range partitioning with PARTITION BY RANGE(columns)
    • List partitioning with PARTITION BY LIST(columns)
  • Distribution: Support for DISTRIBUTED BY HASH(columns) and DISTRIBUTED BY RANDOM
  • Rollup Definitions: Support for ROLLUP clauses
  • Table Properties: Support for PROPERTIES clauses with key-value pairs
  • Index Definitions: Support for Doris-specific index types (INVERTED, BITMAP, BLOOM_FILTER)
  • Generated Columns: Support for GENERATED ALWAYS AS syntax
  • CREATE TABLE ... AS SELECT (CTAS): Full support for CTAS with optional properties
  • CREATE TABLE ... LIKE: Support for creating tables based on existing table structure
  • External Tables: Support for CREATE EXTERNAL TABLE and CREATE TEMPORARY EXTERNAL TABLE
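
As a rough illustration of how several of these clauses combine, here is a sketch of Doris DDL of the kind this grammar targets. The database, table, and column names are hypothetical, and the statements are written from the Doris documentation rather than copied from this PR's fixtures.

```sql
-- Hypothetical aggregate-key table: key type, manual range partitioning,
-- hash distribution, and a PROPERTIES clause in one statement.
CREATE TABLE IF NOT EXISTS example_db.sales_daily
(
    sale_date DATE NOT NULL,
    region VARCHAR(32) NOT NULL,
    amount BIGINT SUM DEFAULT "0"
)
ENGINE = olap
AGGREGATE KEY (sale_date, region)
PARTITION BY RANGE (sale_date)
(
    PARTITION p202501 VALUES LESS THAN ("2025-02-01"),
    PARTITION p202502 VALUES LESS THAN ("2025-03-01")
)
DISTRIBUTED BY HASH (region) BUCKETS 8
PROPERTIES (
    "replication_num" = "1"
);

-- CTAS with optional properties (target table name is hypothetical).
CREATE TABLE example_db.sales_copy
PROPERTIES ("replication_num" = "1")
AS SELECT sale_date, region, amount FROM example_db.sales_daily;

-- Copy only the structure of an existing table.
CREATE TABLE example_db.sales_like LIKE example_db.sales_daily;
```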

2. DROP TABLE Statement Support

  • Support for DROP TABLE [IF EXISTS] [db_name.]table_name [FORCE] syntax
  • Database-qualified table names
  • Optional IF EXISTS clause
  • Optional FORCE keyword for unrecoverable table deletion
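
A minimal sketch of the forms listed above, using hypothetical table names:

```sql
-- Plain drop of a table in the current database.
DROP TABLE sales_like;

-- Database-qualified name with the optional IF EXISTS guard.
DROP TABLE IF EXISTS example_db.sales_copy;

-- FORCE requests unrecoverable deletion of the table.
DROP TABLE IF EXISTS example_db.sales_daily FORCE;
```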

3. INSERT Statement Support

  • Basic INSERT INTO table VALUES (...) syntax
  • Column specification with INSERT INTO table (col1, col2) VALUES (...)
  • DEFAULT value support in VALUES clauses
  • Multiple row insertion with comma-separated VALUES
  • INSERT ... SELECT statements
  • Partition specification with PARTITION (p1, p2) clause
  • Label specification with WITH LABEL label1 clause
  • Complex combinations of all INSERT features
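
The following sketch shows how these INSERT features can be combined. Table, partition, and label names are hypothetical, and the clause order follows the Doris INSERT documentation rather than a specific fixture in this PR.

```sql
-- Multi-row VALUES with a column list and a DEFAULT value.
INSERT INTO example_db.sales_daily (sale_date, region, amount)
VALUES ('2025-01-15', 'emea', 100), ('2025-01-16', 'apac', DEFAULT);

-- Partition and label specification combined with INSERT ... SELECT.
INSERT INTO example_db.sales_daily PARTITION (p202501, p202502) WITH LABEL label1
SELECT sale_date, region, amount
FROM example_db.sales_staging;
```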

4. Doris-Specific Keywords and Grammar

  • Added Doris-specific reserved and unreserved keywords
  • Support for Doris aggregation functions (MAX, MIN, REPLACE, SUM, BITMAP_UNION, HLL_UNION, QUANTILE_UNION)
  • Support for STORED and VIRTUAL keywords in generated columns
  • Proper handling of Doris-specific data types and constraints

Technical Implementation

  • Base Dialect: Extends MySQL dialect to leverage existing MySQL grammar compatibility
  • Custom Segments: Implements Doris-specific grammar segments for complex syntax
  • Keyword Management: Properly manages Doris-specific keywords without conflicts
  • Grammar Extensions: Adds Doris-specific grammar rules while maintaining compatibility

Testing

Comprehensive test coverage includes:

  • CREATE TABLE tests: 15+ test files covering various table creation scenarios
  • DROP TABLE tests: 4 test files covering different drop scenarios
  • INSERT tests: 9 test files covering various insert patterns
  • Hive integration tests: Multiple test files for Hive catalog integration
  • Complex syntax tests: Edge cases and combinations of multiple features

Compatibility

  • MySQL Compatibility: Maintains compatibility with existing MySQL syntax
  • Doris Standards: Follows official Apache Doris documentation and syntax
  • Backward Compatibility: No breaking changes to existing functionality

Documentation

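The implementation follows the official Apache Doris documentation:

  • CREATE TABLE: https://doris.apache.org/docs/dev/sql-manual/sql-statements/table-and-view/table/CREATE-TABLE
  • DROP TABLE: https://doris.apache.org/docs/dev/sql-manual/sql-statements/table-and-view/table/DROP-TABLE
  • INSERT: https://doris.apache.org/docs/dev/sql-manual/sql-statements/data-modification/DML/INSERT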

Files Changed

  • src/sqlfluff/dialects/dialect_doris.py - Main dialect implementation
  • test/fixtures/dialects/doris/ - Comprehensive test suite (30+ test files)

Pull Request checklist

  • Please confirm you have completed any of the necessary steps below.

  • Included test cases to demonstrate any code changes, which may be one or more of the following:

    • .yml rule test cases in test/fixtures/rules/std_rule_cases.
    • .sql/.yml parser test cases in test/fixtures/dialects (note YML files can be auto generated with tox -e generate-fixture-yml).
    • Full autofix test cases in test/fixtures/linter/autofix.
    • Other.
  • Added appropriate documentation for the change.

  • Created GitHub issues for any relevant followup/future enhancements if appropriate.

This PR provides complete Apache Doris SQL dialect support, enabling SQLFluff to properly parse, lint, and format Doris SQL code while maintaining compatibility with existing MySQL-based workflows.

@keraion keraion left a comment

I've taken an initial pass through this and it seems solid so far. The linter checks will fail on the missing newlines and there appears to be an empty test sql file. There are some opportunities for simplifying some of the classes down, but overall, very nice work. I'll take a second pass when I get the chance.

@morningman

Hi @keraion, thanks for your review. I have fixed all the issues. PTAL.

keraion commented Jun 30, 2025

Would you mind running the pre-commit hooks on these files? If using tox: `tox -e pre-commit -- run --all`, or without tox: `pre-commit run --all`.

@morningman

> Would you mind running the pre-commit hooks on these files? If using tox: `tox -e pre-commit -- run --all`, or without tox: `pre-commit run --all`.

So I need to run this command in my local environment? What does it do?

keraion commented Jul 2, 2025

Yes, this will handle a few of the linting and formatting issues that remain. We have guides on getting tox set up, on running the hooks automatically when committing, and a last-checks guide that very briefly goes over it, but this command will run all the pre-commit hooks that we have in the `.pre-commit-config.yaml`.

Namely this will run black, ruff, and a few other helpers to clean up. These should be able to autofix most issues, but there may be a few items that need to be manually addressed.

LMK if you have any other questions!

@morningman

Hi @keraion, I ran tox and reformatted the code. It now looks good:

tox -e pre-commit -- run --all
pre-commit: commands[0]> pre-commit run --all
don't commit to branch...................................................Passed
fix end of files.........................................................Passed
trim trailing whitespace.................................................Passed
black....................................................................Passed
mypy.....................................................................Passed
flake8...................................................................Passed
doc8.....................................................................Passed
yamllint.................................................................Passed
ruff.....................................................................Passed
codespell................................................................Passed
  pre-commit: OK (131.68=setup[0.10]+cmd[131.57] seconds)
  congratulations :) (132.00 seconds)

github-actions bot commented Jul 4, 2025

Coverage Results ✅

Name    Stmts   Miss  Cover   Missing
-------------------------------------
TOTAL   19835      0   100%

251 files skipped due to complete coverage.

@keraion keraion left a comment

Awesome work! Thanks so much for the contribution. 🎉

@keraion keraion added this pull request to the merge queue Jul 4, 2025
Merged via the queue into sqlfluff:main with commit 63091b1 Jul 4, 2025
28 checks passed
Development

Successfully merging this pull request may close these issues.

Apache Doris Dialect Support