Skip to content

Unparsable error when using timestamp precision with Athena dialect #6330

@twiseman-rn

Description

@twiseman-rn

Search before asking

  • I searched the issues and found no similar issues.

What Happened

We have a dbt project using sqlfluff. We are using the dbt-athena community adapter and the sqlfluff dbt templater.

We have a macro to get the current timestamp with a precision of 6, as iceberg table format requires 6 ms precision, but athena's default time precision is 3.

{% macro get_current_timestamp() -%}
    CAST(CURRENT_TIMESTAMP AS TIMESTAMP(6) WITH TIME ZONE)
{%- endmacro %}

When this macro is used, or when the snippet is used directly in a model without the macro, it results in both LT02 rule violation and PRS unparsable error, despite being perfectly valid SQL to athena.

Expected Behaviour

casting to timestamp with a precision should be parsable

Observed Behaviour

Macro was being used in a model with 400+ columns. But can be recreated with the simple model below, which is a slight modification to the default model given with a dbt init.

{{ config(materialized='table') }}

WITH source_data AS (

    SELECT 1 AS id

)

SELECT
    {{ get_current_timestamp() }} AS _log_time
FROM source_data

sqlfluff lint models/example/my_first_dbt_model.sql results in an LT02 rule violation and a PRS error.

image

This also occurs when using the same code from the macro directly in the query. I.e., if the model has

{{ config(materialized='table') }}

WITH source_data AS (

    SELECT 1 AS id

)

SELECT
    CAST(CURRENT_TIMESTAMP AS TIMESTAMP(6) WITH TIME ZONE) AS _log_time
FROM source_data

the same violations are given when running sqlfluff lint

Interestingly, it only seems to error on the first occurrence. I.e., If the model is altered to use both the code directly and the macro, which ever is used first causes the rule violations but the second does not.

E.g.

{{ config(materialized='table') }}

WITH source_data AS (

    SELECT 1 AS id

)

SELECT
    id,
    CAST(CURRENT_TIMESTAMP AS TIMESTAMP(6) WITH TIME ZONE) AS _log_time,
    'stuff' AS stuff,
     {{ get_current_timestamp() }} AS other_log_time
FROM source_data

results in

image

references the 2nd field in the select but not the fourth.

Which remains true even if the LT02 and PRS rules are inline ignored on the first violating line.

e.g., changing model to

{{ config(materialized='table') }}

WITH source_data AS (

    SELECT 1 AS id

)

SELECT
    id,
    CAST(CURRENT_TIMESTAMP AS TIMESTAMP(6) WITH TIME ZONE) AS _log_time, --noqa: LT02, PRS
    'stuff' AS stuff,
     {{ get_current_timestamp() }} AS other_log_time
FROM source_data

results in a pass, despite {{ get_current_timestamp() }} AS other_log_time causing PRS and LT02 errors if used on its own.

I also tried recreating it with other CAST() functions, including casting to something that passes parentheses like CAST(0.0, AS DECIMAL(14, 2)). But the issue seems specific to casting as TIMESTAMP(precision). It also doesn't seem specific to using the built in athena CURRENT_TIMESTAMP function, as passing fields from our data or string literals cause the same issues. I.e, these example cause the same PRS error and LT02 rule violation

CAST('2024-10-09 18:49' AS TIMESTAMP(6) WITH TIME ZONE)
CAST(CURRENT_TIMESTAMP AS TIMESTAMP(6) WITH TIME ZONE)

while these do not

CAST(0.0, AS DECIMAL(14, 2))
CAST('2024-10-09 18:49' AS TIMESTAMP WITH TIME ZONE)
CAST('2024-10-09 18:49' AS TIMESTAMP)

furthermore, different errant behavior is observed when using TIMESTAMP(precision), rather than TIMESTAMP(precision) WITH TIME ZONE

CAST(CURRENT_TIMESTAMP AS TIMESTAMP(6)) results in in LT01 violation. Adding the expected white space
CAST(CURRENT_TIMESTAMP AS TIMESTAMP (6)) passes without issue. But CAST(CURRENT_TIMESTAMP AS TIMESTAMP (6) WITH TIME ZONE) results in an LT09, LT02, and PRS violations.

How to reproduce

create a dbt project with dbt init using the dbt athena community connector.

edit the my_first_dbt_model.sql file to have the following:

{{ config(materialized='table') }}

WITH source_data AS (

    SELECT 1 AS id

)

SELECT
    CAST(CURRENT_TIMESTAMP AS TIMESTAMP(6) WITH TIME ZONE) AS _log_time
FROM source_data

and execute sqlfluff lint models/example/my_first_dbt_model.sql

Dialect

athena

Version

image
image
sqlfluff-templater-dbt==3.2.3

Configuration

[sqlfluff]
templater = dbt
dialect = athena
max_line_length = 120
large_file_skip_byte_limit = 0
exclude_rules = structure.column_order, references.qualification

# change according to your structure
[sqlfluff:templater:dbt]
project_dir = ./sqlfluff_test
profiles_dir = .
profile = sqlfluff-test
target = uat

[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = upper

[sqlfluff:rules:capitalisation.literals]
capitalisation_policy = upper

[sqlfluff:rules:capitalisation.types]
capitalisation_policy = upper

[sqlfluff:layout:type:alias_expression]
spacing_before = align
align_within = select_clause
align_scope = bracketed

Are you willing to work on and submit a PR to address the issue?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    athenaIssues relating to the Athena dialectbugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions