Skip to content

structure.subquery rule breaks CTEs #5265

@llamasoft

Description

@llamasoft

Search before asking

  • I searched the issues and found no similar issues.

What Happened

Attempts to fix structure.subquery rule violations can result in invalid CTEs.

The specific trigger criteria appear to be:

  1. Have two or more CTEs whose queries violate structure.subquery by joining to subqueries.
  2. Those subqueries must have the same alias.
  3. One of the CTEs must contain multiple subqueries.
  4. The CTE with multiple violations must have the similarly named subquery last in the join.

Expected Behaviour

Partial fixes to structure.subquery should always place the extracted subquery before the expression that uses it.

Observed Behaviour

The extracted subquery appears after the query that references it which makes the query invalid.

Worse yet, if layout.cte_newline is enforced then multiple extracted subqueries will be incorrect and some subqueries will be deleted.

How to reproduce

Save the following SQL to a file named cte.sql:

WITH
cte1 AS (
    SELECT COUNT(*) AS qty
    FROM some_table AS st
    LEFT JOIN (
        SELECT 'first' AS id
    ) AS oops
    ON st.id = oops.id
),
cte2 AS (
    SELECT COUNT(*) AS other_qty
    FROM other_table AS sot
    LEFT JOIN (
        SELECT 'middle' AS id
    ) AS another
    ON sot.id = another.id
    LEFT JOIN (
        SELECT 'last' AS id
    ) AS oops
    ON sot.id = oops.id
)
SELECT CURRENT_DATE();

Without layout.cte_newline Enabled

Run sqlfluff fix --ignore-local-config --dialect=ansi --exclude-rules=layout.cte_newline -f - < cte.sql:

WITH cte1 AS (
    SELECT COUNT(*) AS qty
    FROM some_table AS st
    LEFT JOIN oops
        ON st.id = oops.id
),
oops AS (
    SELECT 'first' AS id
),
another AS (
    SELECT 'middle' AS id
),
cte2 AS (
    SELECT COUNT(*) AS other_qty
    FROM other_table AS sot
    LEFT JOIN another
        ON sot.id = another.id
    LEFT JOIN (
        SELECT 'last' AS id
    ) AS oops
        ON sot.id = oops.id
)
SELECT CURRENT_DATE();
  • ❌ The oops subquery from cte1 has been extracted but the definition doesn't appear until afterwards.
  • ✅ The another subquery from cte2 has been extracted and appears in the correct location.
  • ✅ The oops subquery from cte2 can't be extracted because a CTE with that name already exists.

With layout.cte_newline Enabled

Run sqlfluff fix --ignore-local-config --dialect=ansi -f - < cte.sql:

WITH cte1 AS (
    SELECT COUNT(*) AS qty
    FROM some_table AS st
    LEFT JOIN oops
        ON st.id = oops.id
),

oops AS (
    SELECT 'last' AS id
),

cte2 AS (
    SELECT COUNT(*) AS other_qty
    FROM other_table AS sot
    LEFT JOIN another
        ON sot.id = another.id
    LEFT JOIN oops
        ON sot.id = oops.id
),

another AS (
    SELECT 'middle' AS id
)
SELECT CURRENT_DATE();
  • ❌ The oops subquery from cte1 has been extracted but the definition doesn't appear until afterwards.
  • ❌❌❌ The oops query has the contents from cte2. The contents of the oops subquery from cte1 have been deleted entirely.
  • ❌ The another subquery from cte2 has been extracted but the definition doesn't appear until afterwards.

Dialect

The example uses ANSI but it appears to affect all dialects.

Version

sqlfluff, version 2.3.2
Python 3.10.10

Configuration

Default (example uses --ignore-local-config)

Are you willing to work on and submit a PR to address the issue?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions