troshnev
Contributor

@troshnev troshnev commented Sep 11, 2024

From https://docs.vertica.com/24.3.x/en/sql-reference/language-elements/identifiers/ :
Unquoted SQL identifiers must begin with one of the following:

  • Non-Unicode letters: A–Z or a–z
    (In practice, Vertica also accepts non-ASCII UTF-8 Unicode characters here, although this is not well documented.)
  • Underscore (_)

Subsequent characters in an identifier can be any combination of the following:

  • Non-Unicode letters: A–Z or a–z
  • Underscore (_)
  • Digits (0–9)
  • Unicode letters (letters with diacriticals or not in the Latin alphabet), unsupported for model names
  • Dollar sign ($), unsupported for model names

Vertica accepts non-ASCII UTF-8 Unicode characters for table names, column names, and other identifiers,
extending the cases where upper/lower case distinctions are ignored (case-folded) to all alphabets,
including Latin, Cyrillic, and Greek.
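
As a rough illustration of the rule described above (not the code in dialect_vertica.py), the sketch below checks unquoted identifiers with a Python regex. The names `UNQUOTED_IDENTIFIER`, `is_unquoted_identifier`, and `same_identifier` are hypothetical. It assumes that Python's Unicode-aware `\w` is a close enough stand-in for "Unicode letters, digits, and underscore", and that `str.lower()` approximates Vertica's case folding; neither assumption has been verified against Vertica itself.

```python
import re

# Assumption: the first character is a letter of any script or an underscore,
# and later characters may also be digits or a dollar sign.
# [^\W\d] means "a word character that is not a decimal digit".
UNQUOTED_IDENTIFIER = re.compile(r"[^\W\d][\w$]*")


def is_unquoted_identifier(name: str) -> bool:
    """Return True if `name` fits the (approximated) unquoted-identifier rule."""
    return UNQUOTED_IDENTIFIER.fullmatch(name) is not None


def same_identifier(left: str, right: str) -> bool:
    """Compare two unquoted identifiers while ignoring upper/lower case.

    str.lower() is only an approximation of the database's folding rules.
    """
    return (
        is_unquoted_identifier(left)
        and is_unquoted_identifier(right)
        and left.lower() == right.lower()
    )


for candidate in ["Verkäufer", "таблица", "_tmp$1", "1abc", "$abc"]:
    print(candidate, is_unquoted_identifier(candidate))
```

A leading digit or dollar sign is rejected, while non-ASCII letters are accepted in any position, which matches the behaviour this PR describes for the Vertica dialect.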

Changes to be committed:
modified: src/sqlfluff/dialects/dialect_vertica.py
new file: test/fixtures/dialects/vertica/utf8.sql

Brief summary of the change made

Are there any other side effects of this change that we should be aware of?

Pull Request checklist

  • Please confirm you have completed any of the necessary steps below.

  • Included test cases to demonstrate any code changes, which may be one or more of the following:

    • .yml rule test cases in test/fixtures/rules/std_rule_cases.
    • .sql/.yml parser test cases in test/fixtures/dialects (note YML files can be auto generated with tox -e generate-fixture-yml).
    • Full autofix test cases in test/fixtures/linter/autofix.
    • Other.
  • Added appropriate documentation for the change.

  • Created GitHub issues for any relevant followup/future enhancements if appropriate.

@troshnev
Contributor Author

All currently known issues are fixed; ready for the next review.

@skryzh
Contributor

skryzh commented Sep 18, 2024

@WittierDinosaur hi, is it possible to change the incorrect login on my commit to the correct one?
sergey.kry -> skryzh
It was a rookie mistake with my local git config file...

@alanmcruickshank
Member

> @WittierDinosaur hi, is it possible to change the incorrect login on my commit to the correct one?
> sergey.kry -> skryzh
> It was a rookie mistake with my local git config file...

I'll let @WittierDinosaur re-review this PR given he did the first pass, however on changing the commit username: @skryzh I think you can do that on your side. The branch for this PR is in the troshnev fork and not in the main repository, so you've still got full control if you want to change the commit username. It might be easiest to do this by doing a soft-reset on this branch (i.e. uncommitting your changes but keeping them) and then re-committing them using your correct username. You'd then need to force-push to the remote branch used for this PR. As the admin for your fork, you're best placed to make that change rather than us.
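
The soft-reset workflow suggested above could be sketched roughly as follows. This is only an illustration: the helper `recommit_commands` is hypothetical, and the name, email, base ref, branch, and commit message are placeholders rather than values taken from this PR. Note that a soft reset followed by a single commit collapses everything after the base into one commit under one identity.

```python
import shlex


def recommit_commands(name, email, base, branch, message):
    """Return the git commands (as argv lists) for a soft-reset and re-commit."""
    return [
        # Move the branch pointer back to `base`, keeping all changes staged.
        ["git", "reset", "--soft", base],
        # Re-create a single commit under the corrected identity.
        ["git", "-c", f"user.name={name}", "-c", f"user.email={email}",
         "commit", "-m", message],
        # Overwrite the remote branch used for the PR with the rewritten history.
        ["git", "push", "--force-with-lease", "origin", branch],
    ]


# Placeholder values, for illustration only.
for argv in recommit_commands(
    name="skryzh",
    email="skryzh@example.invalid",
    base="upstream/main",
    branch="main",
    message="Vertica: support UTF-8 identifiers",
):
    print(shlex.join(argv))
```

The script only prints the commands so they can be reviewed before anything is run; `--force-with-lease` is used instead of a plain force-push so the push is refused if the remote branch has moved in the meantime.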

Contributor

github-actions bot commented Sep 23, 2024

Coverage Results ✅

Name    Stmts   Miss  Cover   Missing
-------------------------------------
TOTAL   18362      0   100%

237 files skipped due to complete coverage.

@coveralls

coveralls commented Sep 23, 2024

Coverage Status

coverage: 99.985%. remained the same
when pulling 839986b on troshnev:main
into d69a511 on sqlfluff:main.

cover cases where non-ascii word is parameter like ALTER TABLE some_table TO utf8_identifier_eg_Verkäufer;
[feature] adding new test cases for utf8
@troshnev troshnev marked this pull request as draft September 24, 2024 16:05
@troshnev troshnev marked this pull request as ready for review September 26, 2024 07:54
Member

@alanmcruickshank alanmcruickshank left a comment

Looks very sensible. Thanks for contributing this and putting together the extensive test case 👍

@alanmcruickshank alanmcruickshank added this pull request to the merge queue Sep 27, 2024
Merged via the queue into sqlfluff:main with commit b75f511 Sep 27, 2024
27 checks passed

5 participants