Closed
Description
We validate that the YAML files match our linting rules (line length, required attribution, etc.) during ilab generate. Some of the skills and knowledge in the instructlab/taxonomy repository were merged before all of the linting was in place, and are not valid by today's standards. We need either some way to tell ilab generate to ignore YAML validation errors, or some versioning between the schema, CLI, and taxonomy files, so that each change to our linting rules is a new schema version and we can properly handle older taxonomy files written under looser linting rules alongside newer taxonomy files written under different ones.
Example of the error I get today when pointing the CLI at some existing knowledge in the merged taxonomy repo:
$ ilab generate --taxonomy-path taxonomy/knowledge/textbook/history/ibm_history/qna.yaml
...
INFO 2024-04-24 18:12:31,333 _client.py:1026 HTTP Request: GET http://127.0.0.1:8000/v1/models "HTTP/1.1 200 OK"
Generating synthetic data using 'merlinite-7b-lab-Q4_K_M' model, taxonomy:'taxonomy/knowledge/textbook/history/ibm_history/qna.yaml' against http://127.0.0.1:8000/v1 server
ERROR 2024-04-24 18:12:31,421 utils.py:462 Problems found in file taxonomy/knowledge/textbook/history/ibm_history/qna.yaml
36:121: [warning] line too long (314 > 120 characters) (line-length)
39:121: [warning] line too long (363 > 120 characters) (line-length)
42:121: [warning] line too long (410 > 120 characters) (line-length)
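To make the versioning idea concrete, here is a minimal Python sketch of how lint rules could be selected per schema version. Everything in it is an assumption for illustration: the names LintRules, RULES_BY_VERSION, and check_line_length are hypothetical, the version numbers are made up, and none of this reflects how the ilab CLI actually performs validation. The message format only loosely mimics the log above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LintRules:
    """Hypothetical bundle of lint settings tied to one schema version."""
    max_line_length: Optional[int]  # None disables the line-length check


# Hypothetical mapping: each change to the linting rules gets a new version.
RULES_BY_VERSION = {
    1: LintRules(max_line_length=None),  # older files, merged before the check
    2: LintRules(max_line_length=120),   # current rule seen in the log above
}


def check_line_length(lines: List[str], schema_version: int) -> List[str]:
    """Return line-length problems under the rules for the given version."""
    rules = RULES_BY_VERSION[schema_version]
    if rules.max_line_length is None:
        return []
    limit = rules.max_line_length
    problems = []
    for number, line in enumerate(lines, start=1):
        if len(line) > limit:
            problems.append(
                f"{number}:{limit + 1}: line too long "
                f"({len(line)} > {limit} characters)"
            )
    return problems
```

With a scheme along these lines, a file declaring the older version would pass unchanged, while a file declaring the newer version would be held to the 120-character limit. How a taxonomy file would declare its version is left open here.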
Labels: bug (Something isn't working)