Skip to content

Conversation

morgo
Copy link
Contributor

@morgo morgo commented Mar 14, 2023

Thank you for your interest. Please refer to the Contributing link in the sidebar for guidance. Be sure to reference an existing issue number in your pull request description.

Fixes #120

I took the liberty of making it a warning. It seems useful to me.

@evanelias
Copy link
Contributor

Thanks for the PR! I'll aim to review later this week.

The most recent test failure is just from the Dockerized db being flaky / dropping conns or timing out. Seems to be happening at random on GitHub Actions the past few weeks, although I can't ever repro it locally 🤷 I just kicked off a re-run, usually it works fine the second time. If not, I'll try again at a later hour.

@coveralls
Copy link

coveralls commented Mar 14, 2023

Coverage Status

Coverage: 93.282% (+0.02%) from 93.263% when pulling da593a2 on morgo:lint-future-reserved-words into 94eaa99 on skeema:main.

@evanelias
Copy link
Contributor

Thanks again for the PR!

One tricky wrinkle here is that whenever Skeema dumps a CREATE TABLE statement to the filesystem (skeema init, skeema pull, skeema format, etc), it uses the canonical format from SHOW CREATE TABLE, which means all table names and column names will be backtick-quoted. And some tables/columns may be using names that are already reserved words (for the current db version), and that's legal SQL due to the backtick-quoting.

So as a result of that, I believe this linter implementation would flag those too right? i.e. it's not strictly finding future reserved words. If so, I could see a couple paths forward:

(a) Keep the implementation as-is, but change the linter annotation wording, to not be about "later" releases / "forward-compatibility" etc

or

(b) Keep the focus on future reserved words by tracking the version which introduces each reserved word, and only flag ones which aren't already a reserved word in the user's current DB version

There's almost certainly other approaches I haven't considered, so also totally open to discussing alternatives.

Of these two, (a) is simpler but potentially noisier. In my experience, whenever a company had the misfortune of picking a reserved word for a table or column name long ago, the engineers tend to already be acutely aware of it -- from the constant need to use backticks around the name in ad-hoc SQL queries. So having Skeema complain about it further might just be frustrating. They could disable the linter rule to avoid that, but then they lose the benefit of future-proofing new tables and columns.

Similarly, regarding MySQL-specific vs MariaDB-specific reserved words: Some companies try to remain cross-compatible, but others likely don't ever plan to migrate between the two, so flagging the reserved words from the other DBMS may be viewed as noisy.

(b) feels more useful, but is trickier to implement. Because both MySQL version X and MariaDB version Y can hypothetically introduce the same reserved word, the data structure for tracking versions would likely need to be map[string][]tengo.Flavor -- that is, a mapping from (lowercased reserved word) to (slice of flavors that introduce this reserved word). For example, "except": []{tengo.FlavorMySQL80, tengo.FlavorMariaDB103}. One benefit though is that the list of reserved words to track is much smaller: anything appearing on both MySQL 5.5 and MariaDB 10.1's reserved word lists can be omitted entirely, since Skeema doesn't support versions older than that, so there's never a case where those ones are "future" reserved words.

If going with (b), the last arg to the checker function will be useful -- it's a linter.Options struct value, which contains a Flavor field indicating what dbms + version the user is actually running. The tengo.Flavor.Min() method can be used for comparisons.

I'll likely have some additional review comments later, depending a bit on which approach is taken.

@morgo
Copy link
Contributor Author

morgo commented Mar 20, 2023

I prefer (a) and renaming the error message. It is good practice even in current versions not to use reserved words. I think I have seen bugs from internal SQL (or tool SQL) statements tripping on reserved words, so even if the user's ORM supports it - if they had upfront knowledge they may not have chosen this path.

I see your point about being 'nagging' because renames are typically difficult though :(

I am not sure how practical it is to implement, but #160 seems like a better implementation of silencing nags.

@evanelias
Copy link
Contributor

OK, sounds good. I might tweak it a bit post-merge to apply to names of other object types, but still tbd. I'll leave some review comments momentarily.

Re: #160, yeah agreed that would be ideal for silencing nags in the future, but in the near-term it just isn't practical to prioritize relative to some other more pressing feature requests.

@evanelias evanelias self-requested a review March 20, 2023 21:21
@morgo morgo requested a review from evanelias March 21, 2023 15:08
@evanelias evanelias merged commit c914202 into skeema:main Mar 24, 2023
evanelias added a commit that referenced this pull request Mar 24, 2023
This commit enhances lint-reserved-word (from #207) in several ways:

* Non-table objects, such as stored procedures and functions, now have their
  names checked for reserved words as well.

* Names will only be flagged by the linter if they match a reserved word in
  some version of your same DBMS vendor (MySQL vs MariaDB). That is, if you're
  running MySQL then only MySQL reserved words are considered; ditto for
  MariaDB.

  If you're migrating from one vendor to another, you can still perform
  cross-vendor linting by using workspace=docker and overriding the flavor
  on the command-line. For example, ahead of a MariaDB-to-MySQL8 migration,
  this command-line will lint using a MySQL 8 container -- switching the
  reserved word list flagging to use MySQL's list instead of MariaDB's:

  skeema lint --workspace=docker --docker-cleanup=destroy --flavor=mysql:8.0

* The linter annotation message now differs for current reserved words
  (already reserved in your DB version, or in a previous version) vs future
  reserved words (not reserved yet in your version, but will be if you
  upgrade).

* Internally, the reserved word list is now tracked in the tengo subpackage,
  instead of the linter subpackage. This will likely be expanded in the
  future to also track regular non-reserved keywords, for purposes of
  improving the parser code (reducing ad hoc regex usage throughout the
  codebase) and also solving issues like #175 and #199.
@evanelias
Copy link
Contributor

Just committed 4c2ddd0 on top, which reworked this a fair bit -- see that commit message for reasoning. To be clear, there was nothing wrong with your original implementation here, it was great and the PR is hugely appreciated 👍 It just made sense to make those changes on top, as they're in alignment with some other upcoming work.

@morgo morgo deleted the lint-future-reserved-words branch May 12, 2023 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add linter for reserved words in upcoming MySQL versions
3 participants