Databricks Notebook Parsing: Parser fails on leading Markdown cell  #6418

@fstg1992

Description

Search before asking

  • I searched the issues and found no similar issues.

What Happened

SQLFluff fails to parse a Databricks SQL notebook that starts with a Markdown magic cell (interpreted as comments) when the notebook is exported from Databricks or committed to a Git repo. The default format of a leading Markdown cell adds an empty line before the first actual cell.

Edit:
Just tested: this behavior persists for any type of leading cell. Any non-SQL cell as the first cell produces this issue. R, Python, and Shell cells are all saved with a trailing empty line before the command terminator.

Expected Behaviour

Notebooks get parsed and linted.

Observed Behaviour

SQLFluff fails at the leading cell and refuses to parse the rest of the file.

When the empty line before the first -- COMMAND ---------- is removed, the file parses without problems.
Adding any keyword, statement, etc. also results in a successfully parsed file, but that content is then rendered as part of the Markdown cell and not interpreted or executed later. So basically this is an invalid notebook.
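
Until the parser handles this case, one possible stopgap is to lint a preprocessed copy of the notebook rather than the exported file. The sketch below is only an illustration of the observation above, not part of SQLFluff: the function name `strip_blank_before_first_separator` is hypothetical, and it assumes the separator line always starts with `-- COMMAND -`. Per the note above, the resulting text is not a valid notebook, so it should only be used as linter input.

```python
import re

# Assumption: the cell separator is a line such as "-- COMMAND ----------".
COMMAND_SEPARATOR = re.compile(r"^-- COMMAND -+\s*$")


def strip_blank_before_first_separator(source: str) -> str:
    """Remove blank lines directly before the first command separator.

    Hypothetical helper for producing a lintable copy; all other lines,
    including blank lines before later separators, are left untouched.
    """
    lines = source.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if COMMAND_SEPARATOR.match(line):
            start = index
            # Walk back over any blank lines that precede the separator.
            while start > 0 and lines[start - 1].strip() == "":
                start -= 1
            return "".join(lines[:start] + lines[index:])
    # No separator found: return the text unchanged.
    return source
```

The function only touches the first separator because, in the report, removing that single empty line was enough for the file to parse.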

How to reproduce

Parsable, but invalid SQL notebook:

-- Databricks notebook source
-- MAGIC %md
-- MAGIC # MarkDown Cell
-- COMMAND ----------

-- MAGIC %md
-- MAGIC # Test

-- COMMAND ----------

-- DBTITLE 1,Create Widget
CREATE  WIDGET TEXT     base_ts_txt       DEFAULT "2024-05-29 18:00:00";

Not parsable:

-- Databricks notebook source
-- MAGIC %md
-- MAGIC # MarkDown Cell

-- COMMAND ----------

-- MAGIC %md
-- MAGIC # Test

-- COMMAND ----------

-- DBTITLE 1,Create Widget
CREATE  WIDGET TEXT     base_ts_txt       DEFAULT "2024-05-29 18:00:00";
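
The only difference between the two examples is the empty line between the leading magic cell and the first separator. To find affected files in a repository, a small check along these lines might help. It is a heuristic sketch derived from the two examples above, not SQLFluff's own logic, and the function name `first_cell_triggers_issue` is hypothetical.

```python
def first_cell_triggers_issue(source: str) -> bool:
    """Return True if the leading cell is a magic cell ending in a blank line.

    Heuristic based on the two examples in this report: the first cell
    contains at least one "-- MAGIC" line and the line directly before
    the first "-- COMMAND -..." separator is empty.
    """
    lines = source.splitlines()
    for index, line in enumerate(lines):
        if line.startswith("-- COMMAND -"):
            first_cell = lines[:index]
            has_magic = any(entry.startswith("-- MAGIC") for entry in first_cell)
            ends_blank = bool(first_cell) and first_cell[-1].strip() == ""
            return has_magic and ends_blank
    # No separator means there is only one cell, so the pattern cannot occur.
    return False
```

Applied to the examples above, the check returns False for the parsable notebook and True for the one that fails to parse.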

Dialect

Databricks

Version

3.2.5

Configuration

[tool.sqlfluff.core]
dialect = "databricks"
templater = "jinja"
sql_file_exts = ".sql"
verbose = 2
exclude_rules = ["RF04", "ST05", "ST06","ST09", "LT05", "AL01", "ST01", "RF02", "RF03","AL09"]
ignore_files = []

[tool.sqlfluff.indentation]
indent_unit = "space"
tab_space_size = 4
allow_implicit_indents = "True"

[tool.sqlfluff.layout.type.groupby_clause]
line_position = "alone:strict"

[tool.sqlfluff.layout.type.statement_terminator]
spacing_before = "touch"
line_position = "trailing"

[tool.sqlfluff.layout.type.comma]
spacing_before = "touch"
line_position = "leading"

[tool.sqlfluff.layout.type.end_of_file]
spacing_before = "touch"

[tool.sqlfluff.rules.capitalisation.keywords]
capitalisation_policy = "upper"

[tool.sqlfluff.rules.convention.quoted_literals]
preferred_quoted_literal_style = "double_quotes"

[tool.sqlfluff.templater.python.context]
catalog = "xxx"
current_catalog = "xxx"
target_schema = "xxx"
source_schema = "xxx"
pos_ts = "2024-01-01 00:00:00"
pfc_ts = "2024-01-01 00:00:00"
delivery_begin_incl = "2024-01-01 00:00:00"
delivery_end_excl = "2024-01-01 00:00:00"
future_begin_incl = "2024-01-01 00:00:00"
base_ts_txt = "2024-01-01 00:00:00"
pid = ""
proc_id = ""

Are you willing to work on and submit a PR to address the issue?

  • Yes I am willing to submit a PR!

Code of Conduct
