ilab generate fails with schema errors when used with existing skills in the instructlab/taxonomy repo #989

@bbrowning

During ilab generate, we validate that the YAML files match our linting rules (line length, required attribution, etc.). Some of the skills and knowledge in the instructlab/taxonomy repository were merged before all of the linting was in place and are not valid by today's standards. We need either a way to tell ilab generate to ignore YAML validation errors, or versioning across the schema, CLI, and taxonomy files, so that each change to our linting rules becomes a new schema version and we can properly handle older taxonomy files written under looser linting alongside newer taxonomy files written under different linting rules.

Example of the error I get today pointing the CLI at some existing knowledge in the merged taxonomy repo:

$ ilab generate --taxonomy-path taxonomy/knowledge/textbook/history/ibm_history/qna.yaml
...
INFO 2024-04-24 18:12:31,333 _client.py:1026 HTTP Request: GET http://127.0.0.1:8000/v1/models "HTTP/1.1 200 OK"
Generating synthetic data using 'merlinite-7b-lab-Q4_K_M' model, taxonomy:'taxonomy/knowledge/textbook/history/ibm_history/qna.yaml' against http://127.0.0.1:8000/v1 server
ERROR 2024-04-24 18:12:31,421 utils.py:462 Problems found in file taxonomy/knowledge/textbook/history/ibm_history/qna.yaml
36:121: [warning] line too long (314 > 120 characters) (line-length)
39:121: [warning] line too long (363 > 120 characters) (line-length)
42:121: [warning] line too long (410 > 120 characters) (line-length)
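
To make the two options from the description concrete, here is a minimal Python sketch, not the CLI's actual implementation. It parses problem lines in the format shown above and decides which ones should block generation, given a schema version and an opt-out for warnings. The names `LINE_LENGTH_BY_VERSION`, `parse_problems`, `blocking_problems`, the `ignore_warnings` parameter, and the version numbers are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical mapping from taxonomy schema version to the line-length limit.
# The version numbers and the "no limit" choice for version 1 are assumptions.
CURRENT_LINE_LENGTH = 120  # limit seen in the error output above
LINE_LENGTH_BY_VERSION = {
    1: None,                 # files merged before line-length linting existed
    2: CURRENT_LINE_LENGTH,  # files held to today's rule
}


@dataclass
class Problem:
    line: int
    column: int
    level: str
    message: str
    rule: str


# Matches problem lines such as:
# 36:121: [warning] line too long (314 > 120 characters) (line-length)
PROBLEM_RE = re.compile(r"^(\d+):(\d+): \[(\w+)\] (.*) \(([\w-]+)\)$")


def parse_problems(output: str) -> list[Problem]:
    """Extract structured problems from lint output, skipping other lines."""
    problems = []
    for raw in output.splitlines():
        m = PROBLEM_RE.match(raw.strip())
        if m:
            problems.append(Problem(int(m[1]), int(m[2]), m[3], m[4], m[5]))
    return problems


def blocking_problems(problems, schema_version, ignore_warnings=False):
    """Return only the problems that should stop generation."""
    # Unknown versions fall back to the strictest (current) limit.
    limit = LINE_LENGTH_BY_VERSION.get(schema_version, CURRENT_LINE_LENGTH)
    blocking = []
    for p in problems:
        if ignore_warnings and p.level == "warning":
            continue  # option 1: an explicit opt-out for warnings
        if p.rule == "line-length" and limit is None:
            continue  # option 2: this schema version has no line-length rule
        blocking.append(p)
    return blocking


SAMPLE = """\
36:121: [warning] line too long (314 > 120 characters) (line-length)
39:121: [warning] line too long (363 > 120 characters) (line-length)
42:121: [warning] line too long (410 > 120 characters) (line-length)
"""

if __name__ == "__main__":
    found = parse_problems(SAMPLE)
    print(len(blocking_problems(found, schema_version=1)))
    print(len(blocking_problems(found, schema_version=2)))
```

With this shape, an older qna.yaml tagged with the looser schema version would pass, while a new file would still be held to the current limit. How a file declares its schema version is left open here.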

Labels

bug: Something isn't working
