Skip to content

Handling of PARTITION_TYPE in CREATE EXTERNAL TABLE parameters in Snowflake dialect  #6420

@ninazacharia-toast

Description

@ninazacharia-toast

Search before asking

  • I searched the issues and found no similar issues.

What Happened

The following query triggers a parsing error:

CREATE EXTERNAL TABLE IF NOT EXISTS source_test.test (
    yyyymmdd TEXT AS (PARSE_JSON(metadata$external_table_partition):YYYYMMDD::TEXT),
    product TEXT AS (value:product::TEXT)
)
PARTITION BY (yyyymmdd)
PARTITION_TYPE = user_specified
LOCATION = @public.test_stage
FILE_FORMAT = public.parquet_format_convert_binary
AUTO_REFRESH = false;

If i remove the PARTITION_TYPE condition, there is no more error:

CREATE EXTERNAL TABLE IF NOT EXISTS source_test.test (
    yyyymmdd TEXT AS (PARSE_JSON(metadata$external_table_partition):YYYYMMDD::TEXT),
    product TEXT AS (value:product::TEXT)
)
PARTITION BY (yyyymmdd)
LOCATION = @public.test_stage
FILE_FORMAT = public.parquet_format_convert_binary
AUTO_REFRESH = false;

Expected Behaviour

Linting should not reveal any errors

Observed Behaviour

Here is the output of sqlfluff parse for the unparseable code:

[L:  1, P:  1]      |file:
[L:  1, P:  1]      |    statement:
[L:  1, P:  1]      |        create_external_table_statement:
[L:  1, P:  1]      |            keyword:                                          'CREATE'
[L:  1, P:  7]      |            whitespace:                                       ' '
[L:  1, P:  8]      |            keyword:                                          'EXTERNAL'
[L:  1, P: 16]      |            whitespace:                                       ' '
[L:  1, P: 17]      |            keyword:                                          'TABLE'
[L:  1, P: 22]      |            whitespace:                                       ' '
[L:  1, P: 23]      |            keyword:                                          'IF'
[L:  1, P: 25]      |            whitespace:                                       ' '
[L:  1, P: 26]      |            keyword:                                          'NOT'
[L:  1, P: 29]      |            whitespace:                                       ' '
[L:  1, P: 30]      |            keyword:                                          'EXISTS'
[L:  1, P: 36]      |            whitespace:                                       ' '
[L:  1, P: 37]      |            table_reference:
[L:  1, P: 37]      |                naked_identifier:                             'source_test'
[L:  1, P: 48]      |                dot:                                          '.'
[L:  1, P: 49]      |                naked_identifier:                             'test'
[L:  1, P: 53]      |            whitespace:                                       ' '
[L:  1, P: 54]      |            bracketed:
[L:  1, P: 54]      |                start_bracket:                                '('
[L:  1, P: 55]      |                [META] indent:
[L:  1, P: 55]      |                newline:                                      '\n'
[L:  2, P:  1]      |                whitespace:                                   '    '
[L:  2, P:  5]      |                naked_identifier:                             'yyyymmdd'
[L:  2, P: 13]      |                whitespace:                                   ' '
[L:  2, P: 14]      |                data_type:
[L:  2, P: 14]      |                    data_type_identifier:                     'TEXT'
[L:  2, P: 18]      |                whitespace:                                   ' '
[L:  2, P: 19]      |                keyword:                                      'AS'
[L:  2, P: 21]      |                whitespace:                                   ' '
[L:  2, P: 22]      |                bracketed:
[L:  2, P: 22]      |                    start_bracket:                            '('
[L:  2, P: 23]      |                    [META] indent:
[L:  2, P: 23]      |                    expression:
[L:  2, P: 23]      |                        cast_expression:
[L:  2, P: 23]      |                            function:
[L:  2, P: 23]      |                                function_name:
[L:  2, P: 23]      |                                    function_name_identifier:  'PARSE_JSON'
[L:  2, P: 33]      |                                function_contents:
[L:  2, P: 33]      |                                    bracketed:
[L:  2, P: 33]      |                                        start_bracket:        '('
[L:  2, P: 34]      |                                        [META] indent:
[L:  2, P: 34]      |                                        expression:
[L:  2, P: 34]      |                                            column_reference:
[L:  2, P: 34]      |                                                naked_identifier:  'metadata$external_table_partition'
[L:  2, P: 67]      |                                        [META] dedent:
[L:  2, P: 67]      |                                        end_bracket:          ')'
[L:  2, P: 68]      |                            semi_structured_expression:
[L:  2, P: 68]      |                                colon:                        ':'
[L:  2, P: 69]      |                                semi_structured_element:      'YYYYMMDD'
[L:  2, P: 77]      |                            casting_operator:                 '::'
[L:  2, P: 79]      |                            data_type:
[L:  2, P: 79]      |                                data_type_identifier:         'TEXT'
[L:  2, P: 83]      |                    [META] dedent:
[L:  2, P: 83]      |                    end_bracket:                              ')'
[L:  2, P: 84]      |                comma:                                        ','
[L:  2, P: 85]      |                newline:                                      '\n'
[L:  3, P:  1]      |                whitespace:                                   '    '
[L:  3, P:  5]      |                naked_identifier:                             'product'
[L:  3, P: 12]      |                whitespace:                                   ' '
[L:  3, P: 13]      |                data_type:
[L:  3, P: 13]      |                    data_type_identifier:                     'TEXT'
[L:  3, P: 17]      |                whitespace:                                   ' '
[L:  3, P: 18]      |                keyword:                                      'AS'
[L:  3, P: 20]      |                whitespace:                                   ' '
[L:  3, P: 21]      |                bracketed:
[L:  3, P: 21]      |                    start_bracket:                            '('
[L:  3, P: 22]      |                    [META] indent:
[L:  3, P: 22]      |                    expression:
[L:  3, P: 22]      |                        cast_expression:
[L:  3, P: 22]      |                            column_reference:
[L:  3, P: 22]      |                                naked_identifier:             'value'
[L:  3, P: 27]      |                            semi_structured_expression:
[L:  3, P: 27]      |                                colon:                        ':'
[L:  3, P: 28]      |                                semi_structured_element:      'product'
[L:  3, P: 35]      |                            casting_operator:                 '::'
[L:  3, P: 37]      |                            data_type:
[L:  3, P: 37]      |                                data_type_identifier:         'TEXT'
[L:  3, P: 41]      |                    [META] dedent:
[L:  3, P: 41]      |                    end_bracket:                              ')'
[L:  3, P: 42]      |                newline:                                      '\n'
[L:  4, P:  1]      |                [META] dedent:
[L:  4, P:  1]      |                end_bracket:                                  ')'
[L:  4, P:  2]      |            newline:                                          '\n'
[L:  5, P:  1]      |            keyword:                                          'PARTITION'
[L:  5, P: 10]      |            whitespace:                                       ' '
[L:  5, P: 11]      |            keyword:                                          'BY'
[L:  5, P: 13]      |            whitespace:                                       ' '
[L:  5, P: 14]      |            bracketed:
[L:  5, P: 14]      |                start_bracket:                                '('
[L:  5, P: 15]      |                [META] indent:
[L:  5, P: 15]      |                naked_identifier:                             'yyyymmdd'
[L:  5, P: 23]      |                [META] dedent:
[L:  5, P: 23]      |                end_bracket:                                  ')'
[L:  5, P: 24]      |    newline:                                                  '\n'
[L:  6, P:  1]      |    unparsable:                                               !! Expected: 'Nothing else in FileSegment.'
[L:  6, P:  1]      |        word:                                                 'PARTITION_TYPE'
[L:  6, P: 15]      |        whitespace:                                           ' '
[L:  6, P: 16]      |        equals:                                               '='
[L:  6, P: 17]      |        whitespace:                                           ' '
[L:  6, P: 18]      |        word:                                                 'user_specified'
[L:  6, P: 32]      |        newline:                                              '\n'
[L:  7, P:  1]      |        word:                                                 'LOCATION'
[L:  7, P:  9]      |        whitespace:                                           ' '
[L:  7, P: 10]      |        equals:                                               '='
[L:  7, P: 11]      |        whitespace:                                           ' '
[L:  7, P: 12]      |        stage_path:                                           '@public.test_stage'
[L:  7, P: 30]      |        newline:                                              '\n'
[L:  8, P:  1]      |        word:                                                 'FILE_FORMAT'
[L:  8, P: 12]      |        whitespace:                                           ' '
[L:  8, P: 13]      |        equals:                                               '='
[L:  8, P: 14]      |        whitespace:                                           ' '
[L:  8, P: 15]      |        word:                                                 'public'
[L:  8, P: 21]      |        dot:                                                  '.'
[L:  8, P: 22]      |        word:                                                 'parquet_format_convert_binary'
[L:  8, P: 51]      |        newline:                                              '\n'
[L:  9, P:  1]      |        word:                                                 'AUTO_REFRESH'
[L:  9, P: 13]      |        whitespace:                                           ' '
[L:  9, P: 14]      |        equals:                                               '='
[L:  9, P: 15]      |        whitespace:                                           ' '
[L:  9, P: 16]      |        word:                                                 'false'
[L:  9, P: 21]      |        semicolon:                                            ';'
[L:  9, P: 22]      |    [META] end_of_file:

==== parsing violations ====
L:   6 | P:   1 |  PRS | Line 6, Position 1: Found unparsable section: 'PARTITION_TYPE =
                       | user_specified\nLOCATION...'
WARNING: Parsing errors found and dialect is set to 'snowflake'. Have you configured your dialect correctly?

Here is the output of sqlfluff parse for the parseable code:

[L:  1, P:  1]      |file:
[L:  1, P:  1]      |    statement:
[L:  1, P:  1]      |        create_external_table_statement:
[L:  1, P:  1]      |            keyword:                                          'CREATE'
[L:  1, P:  7]      |            whitespace:                                       ' '
[L:  1, P:  8]      |            keyword:                                          'EXTERNAL'
[L:  1, P: 16]      |            whitespace:                                       ' '
[L:  1, P: 17]      |            keyword:                                          'TABLE'
[L:  1, P: 22]      |            whitespace:                                       ' '
[L:  1, P: 23]      |            keyword:                                          'IF'
[L:  1, P: 25]      |            whitespace:                                       ' '
[L:  1, P: 26]      |            keyword:                                          'NOT'
[L:  1, P: 29]      |            whitespace:                                       ' '
[L:  1, P: 30]      |            keyword:                                          'EXISTS'
[L:  1, P: 36]      |            whitespace:                                       ' '
[L:  1, P: 37]      |            table_reference:
[L:  1, P: 37]      |                naked_identifier:                             'source_test'
[L:  1, P: 48]      |                dot:                                          '.'
[L:  1, P: 49]      |                naked_identifier:                             'test'
[L:  1, P: 53]      |            whitespace:                                       ' '
[L:  1, P: 54]      |            bracketed:
[L:  1, P: 54]      |                start_bracket:                                '('
[L:  1, P: 55]      |                [META] indent:
[L:  1, P: 55]      |                newline:                                      '\n'
[L:  2, P:  1]      |                whitespace:                                   '    '
[L:  2, P:  5]      |                naked_identifier:                             'yyyymmdd'
[L:  2, P: 13]      |                whitespace:                                   ' '
[L:  2, P: 14]      |                data_type:
[L:  2, P: 14]      |                    data_type_identifier:                     'TEXT'
[L:  2, P: 18]      |                whitespace:                                   ' '
[L:  2, P: 19]      |                keyword:                                      'AS'
[L:  2, P: 21]      |                whitespace:                                   ' '
[L:  2, P: 22]      |                bracketed:
[L:  2, P: 22]      |                    start_bracket:                            '('
[L:  2, P: 23]      |                    [META] indent:
[L:  2, P: 23]      |                    expression:
[L:  2, P: 23]      |                        cast_expression:
[L:  2, P: 23]      |                            function:
[L:  2, P: 23]      |                                function_name:
[L:  2, P: 23]      |                                    function_name_identifier:  'PARSE_JSON'
[L:  2, P: 33]      |                                function_contents:
[L:  2, P: 33]      |                                    bracketed:
[L:  2, P: 33]      |                                        start_bracket:        '('
[L:  2, P: 34]      |                                        [META] indent:
[L:  2, P: 34]      |                                        expression:
[L:  2, P: 34]      |                                            column_reference:
[L:  2, P: 34]      |                                                naked_identifier:  'metadata$external_table_partition'
[L:  2, P: 67]      |                                        [META] dedent:
[L:  2, P: 67]      |                                        end_bracket:          ')'
[L:  2, P: 68]      |                            semi_structured_expression:
[L:  2, P: 68]      |                                colon:                        ':'
[L:  2, P: 69]      |                                semi_structured_element:      'YYYYMMDD'
[L:  2, P: 77]      |                            casting_operator:                 '::'
[L:  2, P: 79]      |                            data_type:
[L:  2, P: 79]      |                                data_type_identifier:         'TEXT'
[L:  2, P: 83]      |                    [META] dedent:
[L:  2, P: 83]      |                    end_bracket:                              ')'
[L:  2, P: 84]      |                comma:                                        ','
[L:  2, P: 85]      |                newline:                                      '\n'
[L:  3, P:  1]      |                whitespace:                                   '    '
[L:  3, P:  5]      |                naked_identifier:                             'product'
[L:  3, P: 12]      |                whitespace:                                   ' '
[L:  3, P: 13]      |                data_type:
[L:  3, P: 13]      |                    data_type_identifier:                     'TEXT'
[L:  3, P: 17]      |                whitespace:                                   ' '
[L:  3, P: 18]      |                keyword:                                      'AS'
[L:  3, P: 20]      |                whitespace:                                   ' '
[L:  3, P: 21]      |                bracketed:
[L:  3, P: 21]      |                    start_bracket:                            '('
[L:  3, P: 22]      |                    [META] indent:
[L:  3, P: 22]      |                    expression:
[L:  3, P: 22]      |                        cast_expression:
[L:  3, P: 22]      |                            column_reference:
[L:  3, P: 22]      |                                naked_identifier:             'value'
[L:  3, P: 27]      |                            semi_structured_expression:
[L:  3, P: 27]      |                                colon:                        ':'
[L:  3, P: 28]      |                                semi_structured_element:      'product'
[L:  3, P: 35]      |                            casting_operator:                 '::'
[L:  3, P: 37]      |                            data_type:
[L:  3, P: 37]      |                                data_type_identifier:         'TEXT'
[L:  3, P: 41]      |                    [META] dedent:
[L:  3, P: 41]      |                    end_bracket:                              ')'
[L:  3, P: 42]      |                newline:                                      '\n'
[L:  4, P:  1]      |                [META] dedent:
[L:  4, P:  1]      |                end_bracket:                                  ')'
[L:  4, P:  2]      |            newline:                                          '\n'
[L:  5, P:  1]      |            keyword:                                          'PARTITION'
[L:  5, P: 10]      |            whitespace:                                       ' '
[L:  5, P: 11]      |            keyword:                                          'BY'
[L:  5, P: 13]      |            whitespace:                                       ' '
[L:  5, P: 14]      |            bracketed:
[L:  5, P: 14]      |                start_bracket:                                '('
[L:  5, P: 15]      |                [META] indent:
[L:  5, P: 15]      |                naked_identifier:                             'yyyymmdd'
[L:  5, P: 23]      |                [META] dedent:
[L:  5, P: 23]      |                end_bracket:                                  ')'
[L:  5, P: 24]      |            newline:                                          '\n'
[L:  6, P:  1]      |            keyword:                                          'LOCATION'
[L:  6, P:  9]      |            whitespace:                                       ' '
[L:  6, P: 10]      |            comparison_operator:
[L:  6, P: 10]      |                raw_comparison_operator:                      '='
[L:  6, P: 11]      |            whitespace:                                       ' '
[L:  6, P: 12]      |            stage_path:                                       '@public.test_stage'
[L:  6, P: 30]      |            newline:                                          '\n'
[L:  7, P:  1]      |            keyword:                                          'FILE_FORMAT'
[L:  7, P: 12]      |            whitespace:                                       ' '
[L:  7, P: 13]      |            comparison_operator:
[L:  7, P: 13]      |                raw_comparison_operator:                      '='
[L:  7, P: 14]      |            whitespace:                                       ' '
[L:  7, P: 15]      |            file_format_segment:
[L:  7, P: 15]      |                object_reference:
[L:  7, P: 15]      |                    naked_identifier:                         'public'
[L:  7, P: 21]      |                    dot:                                      '.'
[L:  7, P: 22]      |                    naked_identifier:                         'parquet_format_convert_binary'
[L:  7, P: 51]      |            newline:                                          '\n'
[L:  8, P:  1]      |            keyword:                                          'AUTO_REFRESH'
[L:  8, P: 13]      |            whitespace:                                       ' '
[L:  8, P: 14]      |            comparison_operator:
[L:  8, P: 14]      |                raw_comparison_operator:                      '='
[L:  8, P: 15]      |            whitespace:                                       ' '
[L:  8, P: 16]      |            boolean_literal:                                  'false'
[L:  8, P: 21]      |    statement_terminator:                                     ';'
[L:  8, P: 22]      |    [META] end_of_file:

How to reproduce

Run sqlfluff parse (or lint/fix) on the above code

Dialect

Snowflake

Version

sqlfluff, version 3.2.4

Configuration

[sqlfluff]
dialect = snowflake
templater = python
# Allow us to lint even very large files
large_file_skip_byte_limit = 0

# Our included rules
rules = AL01, AL02, AL03, AL04, AL05, AL07, AL08, AL09, AM01, AM02, AM05, AM06, CP01, CP02, CP03,
    CP04, CP05, CV03, CV05, JJ01, LT01, LT02, LT04, LT05, LT06, LT07, LT09, LT10, RF01, RF02, ST02, ST03
# Rules that we'd like to warn rather than error
warnings = RF01, RF02, LT02, LT05
max_line_length = 120

[sqlfluff:indentation]
indented_joins = True
allow_implicit_indents = True
indented_on_contents = False

[sqlfluff:rules:aliasing.forbid]
# Forbids aliases unless it is a self join
force_enable = True

[sqlfluff:rules:ambiguous.column_references]
# makes it so you can't group by/order by 1,2,3 etc.
group_by_and_order_by_style = explicit

[sqlfluff:rules:layout.select_targets]
# This enforces that the * be on a new line for a query such as "SELECT * FROM foo"
wildcard_policy = multiple

[sqlfluff:layout:type:comma]
# This does trailing vs. leading commas
line_position = trailing

Are you willing to work on and submit a PR to address the issue?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingsnowflakeIssues related to the Snowflake dialect

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions