-
Notifications
You must be signed in to change notification settings - Fork 441
Use the taxonomy schema to validate taxonomy yaml files #776
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I think the functional test failure is flakiness in the tests. See #748. |
Updated to use |
@bjhargrave this needs to be updated too |
I'm adding this to the next milestone |
Done in b3669de#diff-c930cb39432fe5403f7d83393641f63b81edd36cdd0a7788eb6f3548dee1a13fR186 |
@bjhargrave doing a quick PR #791 to get us moving with |
Great. I rebased. |
@xukai92 The packaging issue which caused the functional test failure is now fixed. |
Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com> Signed-off-by: BJ Hargrave <hargrave@us.ibm.com>
We will later replace with a git submodule of the schema repo once anonymous access to the repo is available. But for now, we vendor in the files. Signed-off-by: BJ Hargrave <hargrave@us.ibm.com>
We use jsonschema to validate that input yaml files conform to the schema for their relative path. Any validation failures are logged and the process ends with an error. See #760 Signed-off-by: BJ Hargrave <hargrave@us.ibm.com>
Capturing a discussion with @shivchander: I was writing up a test case for the cli to test knowledge workflow, but the way that I laid out my qna.yaml as follows:
Essentially, the The way I laid things out, the Shiv tells me that that's the wrong way of thinking about it, and the verb Eureka moment: Even after learning (only a little while ago, ie fresh info being processed by brain still) how the taxonomy interacts with the model, I was thinking about the structure of my knowledge doc, the wrong way. It's likely that other users will also confuse the taxonomy/model interactions and lay out the qna.yaml files the wrong way, leading to PR submissions that'll likely not improve model quality. Proposed fix: Change |
@@ -235,8 +235,9 @@ def diff(ctx, taxonomy_path, taxonomy_base, yaml_rules, quiet): | |||
if quiet: | |||
try: | |||
read_taxonomy(logger, taxonomy_path, taxonomy_base, yaml_rules) | |||
except (Exception, yaml.YAMLError) as exc: | |||
except (SystemExit, yaml.YAMLError) as exc: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't catch exceptions like https://github.com/instruct-lab/cli/blob/main/cli/generator/generate_data.py#L652 though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. However https://github.com/instruct-lab/cli/blob/212e1c5234ff71006dd708dfc6e464c591ea7c74/cli/lab.py#L252 does not either. There are currently two code paths here: one for --quiet and one for not quiet and they have different excepts. I opened #800 to have a single mainline code path with ifs for non-quiet output.
TAXONOMY_FOLDERS: List[str] = ["compositional_skills", "knowledge"] | ||
"""Taxonomy folders which are also the schema names""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be part of the schema instead?
So a file in schema/v1/
, perhaps schema/v1/constants.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is possible but I would do in a follow-on PR. It wont be useful in taxonomy repo however since that repo refers to the folder names in .github workflows (which need to work before the schema repo could be git submoduled).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense
Changes
Which issue is resolved by this Pull Request:
Part of #760
Description of your changes:
This change updates read_taxonomy_file to also validate that the yaml file conforms to the schema for the part of the taxonomy in which the yaml file resides. This validation is in addition to the existing yaml linting checks.
Several tests required updating so that their test yaml files were schema valid.
For now we include the schema files in this repo. Once anonymous access to the schema repo is available, we will change to access the schema files by using git submodule on the schema repo.