Skip to content

Conversation

johlo
Copy link
Contributor

@johlo johlo commented Jun 29, 2019

This PR adds 3 foreign-key related lint detectors:

  1. non-unique-fk-ref
    InnoDB allows FK references to non-unique keys in the parent table but it is advised against in the mysql reference manual.
    Personally I came across this as I had issues with code generation tools (from schema to code) that didn't handle our FK references to non-unique keys.
    A lint warning should be useful.
  2. fk-missing-parent-table
    InnoDB lets you create a FK which references a non-existent table (with FOREIGN_KEY_CHECKS=0), added a detector for that.
    It doesn't seem to allow a reference to an existing table but non-existent column though, so not checking that.
  3. duplicate-fk
    Detects if there are several FK definitions that are identical (except for their names).

Some notes:

  • All the detectors assume that InnoDB is used but doesn't check the engine explicitly. AFAIK InnoDB is the only engine that checks the
    FKs, but it could still be interesting to check that they are valid in the linter? e.g. if FKs are used for documentation purposes.
  • The function getCoveringKey(table *tengo.Table, fkColumns []*tengo.Column) *tengo.Index is useful for other code as well and should be moved to the tengo package as a method on tengo.Table.
  • I have only tested this with mysql 5.7.26.

johlo added 2 commits June 29, 2019 21:59
 1. `non-unique-fk-ref`
    InnoDB allows FK references to non-unique keys in the parent table but it is advised against in the mysql reference manual.
 2. `fk-missing-parent-table`
    InnoDB lets you create a FK which references a non-existent table (with FOREIGN_KEY_CHECKS=0), added a detector for that.
    It doesn't seem to allow a reference to an existing table but non-existent column though, so not checking that.
 3. `duplicate-fk`
    Detects if there are several FK definitions that are identical (except for their names).
@coveralls
Copy link

Coverage Status

Coverage decreased (-1.8%) to 89.02% when pulling 15f702c on johlo:non-unique-fk-detector into 7c93395 on skeema:master.

@evanelias
Copy link
Contributor

Thank you for the PR! The detector logic looks good at a high level. I'm going to need to keep this PR open for a few weeks though, while I land some other changes on my end:

  • Easier integration testing for linter problem detectors
  • Potentially some core linter interface changes for Skeema v1.3.x -- still TBD though
  • Switching to a newer version of Golang for skeema's Travis CI configuration (in case the current gofmt CI failure is due to a version-related change, among other motivations for upgrading)
  • Adding the covering-index logic to Tengo, which would simplify code here; also possibly can reuse/share/simplify some of the recent code in Tengo related to redundant indexes

Thanks again, and apologies in advance for the delay!

@johlo
Copy link
Contributor Author

johlo commented Jun 29, 2019

Sounds great!

As I mentioned before I've looked into other FK checks as well, e.g.

  • Don't allow cascade of NULL values if a column is declared NOT NULL
  • Not using InnoDB on both parent/child tables
  • Too long cascading chains (max 15 cascades in InnoDB IIRC)
    etc, I'm sure there are many subtle cases to look for.

Potentially some core linter interface changes for Skeema v1.3.x -- still TBD though

Feel free to ping me if you want some feedback!

in case the current gofmt CI failure is due to a version-related change

Most likely it is. I'm using go 1.12.6, the CI system uses 1.9.7. I think there have been a few tweaks to the formatting rules in between.
Would be good to move from gopkg to go modules as well :)

@evanelias
Copy link
Contributor

Would be good to move from gopkg to go modules as well :)

I definitely plan on giving modules a try soon. dep has been working fine though so it just hasn't been a high priority to switch yet.

As I mentioned before I've looked into other FK checks as well

I'm hesitant to have too many separate FK-related checks, since many MySQL deployments don't use FKs at all (including all of the largest companies), and each new detector entails configuration and documentation overhead. But perhaps some of these could be combined into a single detector, which sanity-checks FK references in general.

Along those lines, how would you feel about combining fk-missing-parent-table and non-unique-fk-ref into a single fk-ref detector? And then in the future if it makes sense to support some of the additional checks, these could be added to that same detector.

Part of my thinking is that users probably would want to enable all of these FK reference checks or none, so the granularity of separate detectors may be unnecessary, from a configuration complexity standpoint.

That said, keeping the duplicate-FK detector separate may make sense, since that one isn't specifically about reference sanity-checking, and wouldn't share code / has a different looping setup anyway.

Potentially some core linter interface changes for Skeema v1.3.x

Feel free to ping me if you want some feedback!

Thanks! At a high level, two things I'm still undecided on:

  • Potentially deprecating the errors and warnings options, in favor of giving each detector its own option. e.g. lint-no-pk, lint-bad-charset, lint-bad-engine, etc each with possible enum values of "ignore", "warning", or "error".

    Reasoning: as the number of detectors increases, the values of errors and warnings will get painfully long and cumbersome. The default values will keep needing to change as new detectors are added, and if a user is already overriding them, they may unintentionally be disabling new detectors.

    I'd still keep the old options around as a deprecated shortcut, e.g. setting warnings=foo,bar would convert to lint-foo=warning and lint-bar=warning. But the default value logic would change to avoid new detectors being unintentionally disabled.

  • Potentially changing the function signature for detectors. In Skeema 1.3.x, I plan to have skeema diff and skeema push perform linting (optional but enabled by default). However, only changed objects in the diff will be linted. A simple implementation would be to lint everything and then filter down the annotations to objects with changes in the diff; however, that's inefficient, especially for users with large environments.

    A better solution may be to pass lower-level objects (tables, routines, etc) to the detectors directly. But the question is how best to do that...

    a) Allow detectors to register what type of objects they want, and then just pass them interface{} values which they type-assert
    b) Have separate function signatures for e.g. TableDetector vs RoutineDetector
    c) Pass in a new DetectorInput struct, which has separate fields for each type of object, but only one will be non-nil in any given detector function call

    Either way, the surrounding schema will still be passed in as well, to facilitate things like FK reference checks.

@johlo
Copy link
Contributor Author

johlo commented Jul 5, 2019

I'm hesitant to have too many separate FK-related checks, since many MySQL deployments don't use FKs at all (including all of the largest companies), and each new detector entails configuration and documentation overhead. But perhaps some of these could be combined into a single detector, which sanity-checks FK references in general.

I guess I'm coming from different type of companies use-case, the databases I work with (and have worked with previously at other companies) make heavy use FK's and rely on it to keep the data consistent (as opposed to having application-level checks).
So from my point of view the more checks the better!

But I see your point about added cost in terms of maintenance etc., building larger detectors might alleviate this, yes.

Along those lines, how would you feel about combining fk-missing-parent-table and non-unique-fk-ref into a single fk-ref detector? And then in the future if it makes sense to support some of the additional checks, these could be added to that same detector.

Actually, I think the fk-missing-parent-table and non-unique-fk-ref should be separate passes.
Missing a parent table should be a hard error during linting but the non-unique check should really just be a warning (it is a valid mysql construct, just not recommended).
Not sure, but maybe the checks for "real" error cases could be grouped into a single detector and then have invdividual checkers for "warnings". I think most detectors I've considered would be error detectors (although the duplicate-fk wouldn't be a hard error I guess).

From a user-perspective maybe there could be 2 options to enable/disable these types of FK lint errors. Not sure I explained that very well, would have to think it through a bit more :)

Potentially deprecating the errors and warnings options, in favor of giving each detector its own option. e.g. lint-no-pk, lint-bad-charset, lint-bad-engine, etc each with possible enum values of "ignore", "warning", or "error".

Yes, I think this is the right way to go, it's how I expected it to work when I first looked at skeema :)
Also, this might be possible to extend to allow setting ignore/warning/error on a per-statement basis which will probably be useful at some point.

Potentially changing the function signature for detectors. In Skeema 1.3.x, I plan to have skeema diff and skeema push perform linting (optional but enabled by default). However, only changed objects in the diff will be linted.

Either way, the surrounding schema will still be passed in as well, to facilitate things like FK reference checks.

That would be a nice feature, but as you note about the FK I guess it could be tricky to do because you need to figure out all dependencies. e.g. for the non-unique-fk-ref detector if the referenced index changes to not being uniqu anymore you'd have to know that the non-unique-fk-ref needs to run on all FK's referencing that index?

@evanelias
Copy link
Contributor

All of the larger MySQL shops avoid (and often outright ban) foreign keys for three major reasons:

  • FKs don't play nice with online schema change tools. Percona's pt-osc allows them, albeit with extra complexity and risk. The other popular OSC tools -- gh-ost, fb-osc, LHM -- don't support foreign keys at all. See Thoughts on Foreign Keys? github/gh-ost#331 for more on this topic from the author of gh-ost and oak-online-alter-table.

  • Conceptually, FKs simply don't work across a sharded environment. And although they'd still work within a single shard, application-level checks become necessary anyway for cross-shard purposes. So sharded companies tend to just converge on application-level checks exclusively.

  • FKs introduce nontrivial write latency due to the extra locking. In a high-write-volume OLTP environment, the performance impact can be quite substantial if most tables have numerous FKs.

An additional consideration, although not as significant as the above, is that foreign key error messages and error codes are often overly vague in MySQL.

Skeema was designed with sharding and OSC tools in mind as first-class concerns. Unfortunately that inherently means FKs have not been a primary concern. Full support for all aspects of FKs -- meaning mapping out the dependency chains to run ordered DDL without foreign_key_checks=0; ability to lint cross-database FKs; ability to lint both sides of affected FK dependencies in a diff; etc -- are simply not going to be possible with the tool's architecture.

With that in mind, I'd like to avoid having too many config options regarding foreign keys. That could give users the mistaken impression that all aspects of FKs are fully supported, and/or that Skeema is very focused on FKs. As a professional database consultant there have been a number of cases where I've helped clients remove their FKs to solve performance problems, and having a ton of FK-specific support in my tooling just won't make sense from a business perspective.

That's not to say I'm opposed to adding anything FK related; just that I'd like to avoid having 3+ new lint detectors / config options that are all FK related at this time, especially while the total number of lint detectors overall is so small.

@johlo
Copy link
Contributor Author

johlo commented Jul 6, 2019

Thanks for the detailed explanation! I wasn't aware of the lack of support with the OSC tools, that's good to know.

An additional consideration, although not as significant as the above, is that foreign key error messages and error codes are often overly vague in MySQL.
This is one of the reasons I'd like to have skeema linters that can catch many of the problems :)

As the FK linting isn't useful for your intended users I think it makes most sense to close this PR without merging, particularly since it could have negative value for your business (it will add maintenance cost).

I will probably try to keep my fork in sync with upstream and add more FK linting over time as I need it.
And in the future we could revisit the FK linters and see if it would be interesting to merge back to upstream, does that make sense?

@evanelias
Copy link
Contributor

Makes sense, but I'd be happy to keep the PR open a while. Up to you, but here's my thinking:

Other users may have similar FK-related linting needs, and this PR could serve as a good place to centralize discussion. If there's sufficient interest among users, I would definitely reprioritize FK support accordingly.

In the near term -- over the next few weeks, once the v1.3 linter refactor is done -- I'm still totally open to merging one or two FK-related detectors from here. It's just a question of which ones or whether things are combined. (That v1.3 refactor will necessitate some minor code changes here, but it will help make the integration testing easier, so should be a net win.)

Longer term, once more detectors are in place in general, having additional FK-related detectors will be more feasible. I just want to avoid a situation in the near-term where like half of the total lint detectors are FK-related, while the rest of Skeema is less FK-focused.

I also plan to look into the feasibility of using dynamically-loaded Golang plugins to support arbitrary linter extensions. Ideally that could allow you to support a larger library of FK checks smoothly, without even needing to use a fork. From what I understand, there are currently some compile-time headaches with Go plugins, but I'm eager to see if Go 1.13 improves the situation.

@evanelias
Copy link
Contributor

Now that v1.3 has been released I'm going to close this PR, since the interfaces related to linter checks have all changed substantially. The good news is that the v1.3 interfaces are designed to make it much easier to create and maintain new linter checks, regardless of whether submitting them upstream or keeping them as a private fork / patchset.

In the coming weeks I'll write a thorough doc covering how to code new linter checks. The brief version of what's changed:

  • Each linter check is now more self-contained, following a convention of one file per check. You can see the 10 linter checks in v1.3 in linter/check_*.go.
    • This means if you want to add new custom checks, without submitting them upstream, you don't need to fully fork Skeema. You could keep your custom checks in a separate repo, and then just write a small custom build script to build your modified version of skeema (i.e. just copy the new checks into the linter dir before running go build on skeema).
    • I've held off on supporting Golang plugins for now, since they complicate cross-compiling, but hope to pick this back up in the future.
  • The check functions are now called once per object (table/proc/func), rather than being called on the entire schema. This enables skeema diff and skeema push to only lint modified objects in the diff, while skeema lint still lints everything in a schema.
  • There are a few different function signatures that can be used for linter checks:
    • Use TableChecker for checks that can potentially return multiple results for a single table, e.g. the duplicate index checker uses this since a table may potentially have multiple duplicate indexes
    • Use TableBinaryChecker for checks that can only be triggered at most once per table, e.g. the primary key checker uses this since a table either has a PK or it doesn't
    • Use RoutineChecker for checks on procs/funcs
    • Each of these function types has a conversion function in linter/linter.go, allowing them to use a common lower-level interface. Basically they're just syntactic sugar to make checks easier to write.
  • Integration testing is simplified via some helper logic in the test suite: all existing linter rules are run (regardless of their DefaultSeverity) against the files in linter/testdata/validcfg/*.sql, and the test suite verifies that the resulting annotations correspond to the lines with /* annotations */ comments on them.

@evanelias evanelias closed this Sep 11, 2019
@evanelias
Copy link
Contributor

So I know this PR is from 5 years ago, but I think it's finally the right time to include a "parent side of FK needs a unique index" linter check 😄 Why now? Because lately MySQL is moving towards making that a requirement for FKs in InnoDB -- see the foreign key section of https://www.skeema.io/blog/2024/05/14/mysql84-surprises/

However, a lot has changed in Skeema's codebase in 5 years. And GitHub won't even let me reopen this PR as-is anyway, since we switched from master to main for the default branch a long while back, so the merge target of master no longer exists.

@johlo I'm not sure if you're still using Skeema or MySQL (or even GitHub), and definitely have no expectations that you'd want to rework any of this old code on your end. However, I want to make sure you get credit for the idea and initial implementation. So my current thinking (unless you object) is that I'll use some Git hackery to include your commits here in a branch, and then add some commits on top to rework/modernize and avoid a million merge conflicts.

A few notes, mostly just for myself:

  • By default in MySQL 8.4.0, InnoDB now wants the parent table to have a unique index on the exact columns of the FK. It can't have any additional columns.
  • Also for this specific purpose it doesn't automatically extend the index with the PK columns. Arguably, that's a MySQL bug/oversight.
  • I plan to rename this option to lint-fk-parent.
  • It will also check if the parent table table exists, and flag cases where it doesn't, in the same linter check. (Although I understand the original objection to combining these, re: wanting different linter severity levels for the two situations, I believe the changes in 8.4.0 lessen the need for that.)
  • Due to Skeema's internal design, this still won't be able to look at FKs which span across schema boundaries. (That might be possible in a future v2.x of Skeema some day, but no specific plans yet.)
  • The duplicate FK checker from this PR won't be included yet. It's a good idea but I suspect it's a fairly rare problem.

evanelias added a commit that referenced this pull request May 20, 2024
*** This is a work-in-progress commit, which will be rebased/amended ***
evanelias added a commit that referenced this pull request May 20, 2024
*** This is a work-in-progress commit, which will be rebased/amended ***
evanelias added a commit that referenced this pull request May 21, 2024
*** This is a work-in-progress commit, which will be rebased/amended ***
evanelias added a commit that referenced this pull request May 23, 2024
This commit adds a new linter check/option, lint-fk-parent, which detects two
types of issues with the parent-side table of a foreign key constraint:

* Parent table does not exist
* Parent table lacks a unique constraint covering the exact referenced columns

The latter problem is especially relevant in MySQL 8.4. For more information
see https://www.skeema.io/blog/2024/05/14/mysql84-surprises/#foreign-key-restrictions

This linter check does not ever flag foreign keys which extend across schema
(database) boundaries, since Skeema's workspace model only operates on a
single schema at a time.

The logic for this linter check was originally inspired by pull request #79
from 2019. This new commit is being merged to main along with preceding commit
db33690 from that PR, in order to assign credit to the author; however this
commit must fully rewrite/remove that older commit due to underlying code
structure changes in the past five years. Also, it was not possible to merge
that old PR directly today due to change of Skeema's default branch name.

Separate from the new linter check, this commit also makes some minor tweaks
to the linter package:

* Old test logic relating to the no-longer-relevant MariaDB server variable
  innodb_read_only_compressed has been removed; this variable isn't enabled
  by default in any MariaDB release since Feb 2022

* Linter logic will now panic upon a duplicate attempt to register the same
  check name, as this indicates programmer error / copypasta

* lint-has-fk will now properly set the annotation line number even if
  multiple spaces occur between the keywords FOREIGN and KEY.
@johlo
Copy link
Contributor Author

johlo commented Jun 14, 2024

Hi Evan,
Great to see that this feature was eventually added! I was running a private fork for a while to support our use-cases. Currently I'm only using postgres, but if I get back to mysql I'll definitely use Skeema again, great tool.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants