feat: enhanced `lab check` to make local linting easier #621

nathan-weinberg · 2024-03-14T18:46:17Z

Which issue is resolved by this Pull Request:
Resolves #515

Description of your changes:
edited from original: for context originally created lab lint command that linted CLI Python code via tox

Adds yaml linting to `lab check'
Adds tests for `lab check'
create-untracked() func can now write byte strings

Also fixes incorrect version pin for Labeler action

nathan-weinberg · 2024-03-14T20:10:05Z

cc @deanwampler

deanwampler · 2024-03-14T20:21:48Z

Hi, is this linting the CLI's Python code? #515 was intended to implement linting of the qna.yaml YAML files, specifically any user-specified files. (I suppose it could do all of them if you don't specify a file or directory.) Make sense?

nathan-weinberg · 2024-03-14T20:27:48Z

Hi, is this linting the CLI's Python code? #515 was intended to implement linting of the qna.yaml YAML files, specifically any user-specified files. (I suppose it could do all of them if you don't specify a file or directory.) Make sense?

Yup this is for the Python code but I see your point - let me see about extending this to cover linting those YAML files as well 👍

nathan-weinberg · 2024-03-14T20:38:01Z

@ckadner I wonder if we might be able to turn the raw Shell code in that lint action in taxonomy to a shell script within that repo? The CI there could replace that snippet with calling the script and the CLI could use that for YAML linting, which would ensure any changes made to the YAML scheme in that repo would be seemlessly enforced by the CLI

danmcp · 2024-03-15T01:52:58Z

Some potential concerns about linting the python code through lab lint:

This would blur the line between user and developer roles and would clutter up the non developer experience.
This would require tox be installed to use. Currently it's in the dev requirements.
There is an effort to move away from shelling out from lab

nathan-weinberg · 2024-03-15T02:15:10Z

Some potential concerns about linting the python code through lab lint:

* This would blur the line between user and developer roles and would clutter up the non developer experience.

* This would require tox be installed to use.  Currently it's in the dev requirements.

* There is an effort to move away from shelling out from lab

I tried to address some of these concerns in my followup commit:

Made the default behavior just to do basic YAML linting (for now) - I've kept in the Python linting because I think it's useful for CLI development but it's now behind a --dev flag so average users will not hit it accidentally
As for tox/shelling out, deciding to use tox is a decision that predates me - if there is a way to do the linting by bypassing that I'm open to suggestions, but otherwise I'm not sure how we get around shelling it out (edit: One possible way around this could be calling pytest directly from the CLI code and bypassing tox entirely)

Other changes I made here:

Got green/red coloring working for both Python/YAML linting
Added yaml_rules.yaml based off current rules in taxonomy GitHub Action (can move this to that repo if it's preferred)
Added yamllint to requirements.txt

deanwampler · 2024-03-15T14:29:13Z

I recommend the following changes:

Remove the python linting. I agree with the comments above that it confuses internal developer and end-user concerns. Add a make file for useful developer automation instead.
Running yamllint helps, but it doesn't cover all the other yq commands that are also executed. The goal of Add a lab lint command #515 is to reproduce all linting steps. Hence, I recommend you refactor taxonomy/.github/workflows/lint.yml to move all the shell commands to a standalone shell script (e.g., like the existing testrun.sh) and call it from the PR workflow and from lab lint. Of course, the problem with this approach is it only works in *nix compatible shells. That's fine with me 😁.

nathan-weinberg · 2024-03-15T15:37:49Z

I recommend the following changes:

1. Remove the python linting. I agree with the comments above that it confuses internal developer and end-user concerns. Add a `make` file for useful developer automation instead.

2. Running `yamllint` helps, but it doesn't cover all the other `yq` commands that are also executed. The goal of [Add a `lab lint` command #515](https://github.com/instruct-lab/cli/issues/515) is to reproduce _all_ linting steps. Hence, I recommend you refactor [taxonomy/.github/workflows/lint.yml](https://github.com/instruct-lab/taxonomy/blob/main/.github/workflows/lint.yml) to move all the shell commands to a standalone shell script (e.g., like the existing `testrun.sh`) and call it from the PR workflow and from `lab lint`. Of course, the problem with this approach is it only works in *nix compatible shells. That's fine with me 😁.

I'm fine with this approach, but I'm going to see if we can get some weigh-in from the project maintainers to ensure this is the direction we're all fine going with since it'll require removal of some functionality from this PR (we should probably then open a separate issue for creating a Makefile which I'm happy to address in a seperate PR) as well as a PR to taxonomy to make the changes to the lint action in that repo

markstur · 2024-03-15T15:48:43Z

First impression is that it's very confusing because "lab lint" would be expected to be only for "users" not at all like tox lint/fmt for devs.

Looking deeper, I see you could hide some of that with --dev flag, but let's go back to "it's confusing to users" and just not do that. Devs can use tox.

Nice work (thank you), but feels a little misplaced for the user experience when mixing in dev stuff.

markstur · 2024-03-15T15:55:28Z

Also curious why we need a Makefile? I don't mind them, but I've seen them removed from projects and we have tox list for devs and lab for users. Need to get some consensus around the right tools for the project (folks keep adding good ideas :) but we are getting a lot of directions (pipx, ruff, make, ...)

There's some benefit in keeping it minimum defacto standards (and that could include make but ? )

note: I'm all for consensus building here, if I didn't convey that ^

deanwampler · 2024-03-15T16:04:00Z

That was only a suggestion. It would let a CLI developer type make lint to do the Python linting.

hickeyma · 2024-03-15T16:34:26Z

@nathan-weinberg Should this be part of the lab check command that checks the knowledge/skills changes? I am confused what this has to do with the CLI python code and adding GH workflows?

nathan-weinberg · 2024-03-15T17:17:20Z

@nathan-weinberg Should this be part of the lab check command that checks the knowledge/skills changes? I am confused what this has to do with the CLI python code and adding GH workflows?

CLI linting Python code came from me, but seems the general consensus here is to move away from that.

Not looking to add any workflows here, but I think what @deanwampler was saying in his original issue (feel free to step in Dean if I'm mistaken) was that he wanted a way to do the equivalent linting being done by the Taxonomy repo Lint action locally from the CLI.

nathan-weinberg · 2024-03-15T18:53:49Z

Updated the PR to remove the Python linting and move the basic YAML checking into the existing lab check command rather than introducing the new lab lint command

markstur

I'd like to see this more backwards compatible. Not sure this would break anyone/anywhere but I've been burned by that lately. It's not a protected API.

Along with that there's an unnecessary change in the KeyError exception that makes that error useful (I think) and isn't necessary to remove that.

cli/generator/generate_data.py

cli/lab.py

tests/taxonomy.py

nathan-weinberg · 2024-03-20T23:55:14Z

Squashed commits to ease rebasing

aesteve-rh · 2024-03-21T14:09:53Z

I am no maintainer, but code LGTM, except for the linting issues that point to duplicated imports. Please fix those first.

Also, I'd suggest splitting the update of the labeler action to avoid mixing stuff. If the label is incorrect everyone may benefit of having it updated asap, and otherwise it gets mixed up and delayed by the other changes in this PR.

nathan-weinberg · 2024-03-21T14:16:00Z

I am no maintainer, but code LGTM, except for the linting issues that point to duplicated imports. Please fix those first.

Also, I'd suggest splitting the update of the labeler action to avoid mixing stuff. If the label is incorrect everyone may benefit of having it updated asap, and otherwise it gets mixed up and delayed by the other changes in this PR.

Those are fixed now - this should be going in today so updating the labeler here should be fine. Thanks for reviewing!

nathan-weinberg · 2024-03-21T14:16:20Z

cc @xukai92

anik120

Left some comments, thanks for adding the tests @nathan-weinberg.

I think it'll help to keep only related changes in this PR (ie changes related to lab check). It'll help review the PR if the extra comments are removed, thank you!

cli/generator/generate_data.py

cli/lab.py

tests/test_lab_check.py

.github/workflows/label.yml

anik120 · 2024-03-21T17:36:53Z

@nathan-weinberg could you un-resolve the comments you didn't address please. It's helpful to have the either have the conversation and let the commenter resolve, or ignore it too but let other reviewers say "this comment doesn't matter much so resolving" instead of the author resolving it themselves.

Copying the highest priority comment here that's been resolved without addressing:

anik120
I'm a little confused by this. I see the file added to the top level project directory. How does it work when I don't have this project cloned locally?
I was expecting the design for the yaml-rules files to be something like the design for prompt.txt file. When you lab generate, you'll see the message No prompt file found, using default prompts. The default prompt template is a hardcoded contanst. ie if you add a prompt file to your workspace (that you initialized with lab init), it overrides the default prompt template.
So I'm not exactly sure how the yaml-rules.yaml you added will be used if I do a fresh installation of the cli in my machine?

nathan-weinberg
I think you're misunderstanding what's happening here - where you've commented has nothing to do with that, this is just fixing the version set in #591

Yea my bad, I placed that comment in the wrong place (I wanna say it's because there were so many other unrelated changes like the code comments that I had to go through, that the review process got derailed, but for the sake of moving things along, let's go with "it's my bad")

But I hope my comment is clear. Let me phrase it in another way:

Quality testing this PR Failed.

$ git fetch downstream pull/621/head:nathan-test 
From https://github.com/instruct-lab/cli
 * [new ref]         refs/pull/621/head -> nathan-test
 $ git checkout nathan-test 
Switched to branch 'nathan-test'

$ pip install .
Processing /Users/anbhatta/DeepLearning/labrador/cli
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: click<9.0.0,>=8.1.7 in /Users/anbhatta/DeepLearning/labrador/venv/lib/python3.11/site-packages (from cli==0.0.1) (8.1.7)
Requirement already satisfied: click-didyoumean<0.4.0,>=0.3.0 in /Users/anbhatta/DeepLearning/labrador/venv/lib/python3.11/site-packages (from cli==0.0.1) (0.3.0)
Requirement already satisfied: llama-cpp-python[server]<0.3.0,>=0.2.55 in /Users/anbhatta/DeepLearning/labrador/venv/lib/python3.11/site-packages (from cli==0.0.1) (0.2.55)
Requirement already satisfied: rouge-score<0.2.0,>=0.1.2 in /Users/anbhatta/DeepLearning/labrador/venv/lib/python3.11/site-packages (from cli==0.0.1) (0.1.2)
.
.
.

$ cd ../test-workspace
$ lab list 
compositional_skills/writing/freeform/jokes/puns-copy/qna.yaml

$ lab check 
INFO 2024-03-21 13:36:19,745 generate_data.py:612 Found new taxonomy files:
INFO 2024-03-21 13:36:19,745 generate_data.py:614 * compositional_skills/writing/freeform/jokes/puns-copy/qna.yaml
ERROR 2024-03-21 13:36:19,745 generate_data.py:585 Exception [Errno 2] No such file or directory: 'yaml_rules.yaml' raised in taxonomy/compositional_skills/writing/freeform/jokes/puns-copy/qna.yaml
1 taxonomy files with errors! Exiting.

anik120 · 2024-03-21T17:39:39Z

Holding this PR for now @xukai92, might help to have another look, maybe I'm missing something in the setup that Nathans is suggesting here.

nathan-weinberg · 2024-03-21T18:07:56Z

@anik120 All comments were addressed, not sure what you're referring to but seems you had some issues with your review - let's sync over this in-person

yaml_rules.yaml

xukai92 · 2024-03-22T03:54:06Z

not sure why functional tests are failing. can you try rebasing with main?

xukai92 · 2024-03-22T04:37:49Z

I did a rebase with main for you to see if this fixes tests.

anik120

Looks like the function test is failing coz seed_instructions_data is None here

I'm guessing it's coming from read_taxonomy here

cli/generator/generate_data.py

nathan-weinberg · 2024-03-22T16:24:56Z

Looks like the function test is failing coz seed_instructions_data is None here

I'm guessing it's coming from read_taxonomy here

I have found and fixed this issue (needed an additional with open(file_path, "r", encoding="utf-8") as file as seen on L553 otherwise the YAML load on L554 returns None despite the file being present and valid) - the current failure in GitHub Actions I do not know, seems completely unrelated to my change, but the formerly failing test_generate can be seen passing the functional tests locally:

+2024-03-22 12:21:29 ./scripts/functional-tests.sh:222: test_generate
+2024-03-22 12:21:29 ./scripts/functional-tests.sh:189: test_generate(): cat -
+2024-03-22 12:21:29 ./scripts/functional-tests.sh:199: test_generate(): sed -i -e 's/num_instructions:.*/num_instructions: 1/g' config.yaml
+2024-03-22 12:21:29 ./scripts/functional-tests.sh:202: test_generate(): timeout 5m lab generate --taxonomy-path simple_math.yaml
INFO 2024-03-22 12:21:32,938 lab.py:423 Generating model 'merlinite-7b-Q4_K_M' using 10 cpus, taxonomy: 'simple_math.yaml' and seed 'seed_tasks.json' against http://localhost:28711/v1 server
WARNING 2024-03-22 12:21:32,952 generate_data.py:544 Cannot find yaml_rules.yaml. Using default rules.
Cannot find prompt.txt. Using default prompt.
Synthesizing new instructions. If you aren't satisfied with the generated instructions, interrupt training (Ctrl-C) and try adjusting your YAML files. Adding more examples may help.
  0%|                                                                                                                                                                         | 0/1 [00:00<?, ?it/s]INFO 2024-03-22 12:21:32,958 generate_data.py:378 Selected taxonomy path 
Q> calculate the total cost when each book is $10.
I> <noinput>
3 items
A> The total cost is $30

Q> poll
find out the average age in a group consisting of three people aged 21,24,28 respectively.
I> 
A> The average age is 24.33 years old.

2it [01:06, 33.19s/it]                                                                                                                                                                              INFO 2024-03-22 12:22:39,335 generate_data.py:463 Generation took 66.40s
2it [01:06, 33.19s/it]
+2024-03-22 12:22:39 ./scripts/functional-tests.sh:205: test_generate(): ls -l generated/generated_merlinite-7b-Q4_K_M_2024-03-22T11_46_07.json generated/generated_merlinite-7b-Q4_K_M_2024-03-22T12_07_37.json generated/generated_merlinite-7b-Q4_K_M_2024-03-22T12_16_53.json generated/generated_merlinite-7b-Q4_K_M_2024-03-22T12_21_32.json generated/test_merlinite-7b-Q4_K_M_2024-03-22T11_46_07.jsonl generated/test_merlinite-7b-Q4_K_M_2024-03-22T12_07_37.jsonl generated/test_merlinite-7b-Q4_K_M_2024-03-22T12_16_53.jsonl generated/test_merlinite-7b-Q4_K_M_2024-03-22T12_21_32.jsonl generated/train_merlinite-7b-Q4_K_M_2024-03-22T11_46_07.jsonl generated/train_merlinite-7b-Q4_K_M_2024-03-22T12_07_37.jsonl generated/train_merlinite-7b-Q4_K_M_2024-03-22T12_16_53.jsonl generated/train_merlinite-7b-Q4_K_M_2024-03-22T12_21_32.jsonl
-rw-r--r--. 1 nathan nathan 229 Mar 22 11:47 generated/generated_merlinite-7b-Q4_K_M_2024-03-22T11_46_07.json
-rw-r--r--. 1 nathan nathan 806 Mar 22 12:09 generated/generated_merlinite-7b-Q4_K_M_2024-03-22T12_07_37.json
-rw-r--r--. 1 nathan nathan 596 Mar 22 12:18 generated/generated_merlinite-7b-Q4_K_M_2024-03-22T12_16_53.json
-rw-r--r--. 1 nathan nathan 581 Mar 22 12:22 generated/generated_merlinite-7b-Q4_K_M_2024-03-22T12_21_32.json
-rw-r--r--. 1 nathan nathan 552 Mar 22 11:47 generated/test_merlinite-7b-Q4_K_M_2024-03-22T11_46_07.jsonl
-rw-r--r--. 1 nathan nathan 552 Mar 22 12:09 generated/test_merlinite-7b-Q4_K_M_2024-03-22T12_07_37.jsonl
-rw-r--r--. 1 nathan nathan 552 Mar 22 12:18 generated/test_merlinite-7b-Q4_K_M_2024-03-22T12_16_53.jsonl
-rw-r--r--. 1 nathan nathan 552 Mar 22 12:22 generated/test_merlinite-7b-Q4_K_M_2024-03-22T12_21_32.jsonl
-rw-r--r--. 1 nathan nathan 304 Mar 22 11:47 generated/train_merlinite-7b-Q4_K_M_2024-03-22T11_46_07.jsonl
-rw-r--r--. 1 nathan nathan 881 Mar 22 12:09 generated/train_merlinite-7b-Q4_K_M_2024-03-22T12_07_37.jsonl
-rw-r--r--. 1 nathan nathan 750 Mar 22 12:18 generated/train_merlinite-7b-Q4_K_M_2024-03-22T12_16_53.jsonl
-rw-r--r--. 1 nathan nathan 735 Mar 22 12:22 generated/train_merlinite-7b-Q4_K_M_2024-03-22T12_21_32.jsonl
++2024-03-22 12:22:39 ./scripts/functional-tests.sh:206: test_generate(): jq '. | length'
+++2024-03-22 12:22:39 ./scripts/functional-tests.sh:206: test_generate(): tail -n 1
+++2024-03-22 12:22:39 ./scripts/functional-tests.sh:206: test_generate(): ls -tr generated/generated_merlinite-7b-Q4_K_M_2024-03-22T11_46_07.json generated/generated_merlinite-7b-Q4_K_M_2024-03-22T12_07_37.json generated/generated_merlinite-7b-Q4_K_M_2024-03-22T12_16_53.json generated/generated_merlinite-7b-Q4_K_M_2024-03-22T12_21_32.json
++2024-03-22 12:22:39 ./scripts/functional-tests.sh:206: test_generate(): cat generated/generated_merlinite-7b-Q4_K_M_2024-03-22T12_21_32.json
+2024-03-22 12:22:39 ./scripts/functional-tests.sh:206: test_generate(): '[' 2 -lt 1 ']'

Adds yaml linting to `lab check' Adds tests for `lab check' 'create-untracked' func can now write byte strings Signed-off-by: Nathan Weinberg <nweinber@redhat.com>

cli/config.py

nathan-weinberg requested review from mairin, kelbrown20, jeremyeder and abhi1092 as code owners March 14, 2024 18:46

nathan-weinberg requested review from soltysh, xukai92, markstur, hickeyma, afrittoli, spzala, Tomcli and mrutkows as code owners March 15, 2024 02:08

nathan-weinberg mentioned this pull request Mar 15, 2024

ci: moved shell block in lint action to dedicated script instructlab/taxonomy#530

Merged

nathan-weinberg changed the title ~~feat: implemented lab lint to make local linting easier~~ feat: enhanced lab check to make local linting easier Mar 15, 2024

markstur suggested changes Mar 15, 2024

View reviewed changes

nathan-weinberg requested a review from markstur March 15, 2024 23:01

xukai92 added this to the March 21 Musts milestone Mar 19, 2024

nathan-weinberg requested a review from anik120 March 20, 2024 01:06

aesteve-rh reviewed Mar 20, 2024

View reviewed changes

tests/taxonomy.py Show resolved Hide resolved

nathan-weinberg requested a review from aesteve-rh March 20, 2024 12:59

anik120 suggested changes Mar 21, 2024

View reviewed changes

anik120 added the hold In-progress PR. Tag should be removed before merge. label Mar 21, 2024

anik120 removed the hold In-progress PR. Tag should be removed before merge. label Mar 21, 2024

xukai92 requested changes Mar 21, 2024

View reviewed changes

yaml_rules.yaml Outdated Show resolved Hide resolved

nathan-weinberg requested review from xukai92 and anik120 March 22, 2024 03:15

anik120 suggested changes Mar 22, 2024

View reviewed changes

cli/generator/generate_data.py Outdated Show resolved Hide resolved

feat: enhanced 'lab check' to make local linting easier

b0a7fb4

Adds yaml linting to `lab check' Adds tests for `lab check' 'create-untracked' func can now write byte strings Signed-off-by: Nathan Weinberg <nweinber@redhat.com>

nathan-weinberg requested a review from anik120 March 22, 2024 19:20

xukai92 approved these changes Mar 22, 2024

View reviewed changes

anik120 suggested changes Mar 22, 2024

View reviewed changes

cli/config.py Show resolved Hide resolved

anik120 approved these changes Mar 22, 2024

View reviewed changes

xukai92 approved these changes Mar 22, 2024

View reviewed changes

xukai92 merged commit 95e1666 into instructlab:main Mar 22, 2024

nathan-weinberg deleted the lint branch March 22, 2024 20:39

mrutkows mentioned this pull request Apr 16, 2024

Recent inclusion of yamllint causes GPL3 conflict with project Apache 2 license #870

Closed

feat: enhanced lab check to make local linting easier #621

feat: enhanced lab check to make local linting easier #621

Uh oh!

Conversation

nathan-weinberg commented Mar 14, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nathan-weinberg commented Mar 14, 2024

Uh oh!

deanwampler commented Mar 14, 2024

Uh oh!

nathan-weinberg commented Mar 14, 2024

Uh oh!

nathan-weinberg commented Mar 14, 2024

Uh oh!

danmcp commented Mar 15, 2024

Uh oh!

nathan-weinberg commented Mar 15, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

deanwampler commented Mar 15, 2024

Uh oh!

nathan-weinberg commented Mar 15, 2024

Uh oh!

markstur commented Mar 15, 2024

Uh oh!

markstur commented Mar 15, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

deanwampler commented Mar 15, 2024

Uh oh!

hickeyma commented Mar 15, 2024

Uh oh!

nathan-weinberg commented Mar 15, 2024

Uh oh!

nathan-weinberg commented Mar 15, 2024

Uh oh!

markstur left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

nathan-weinberg commented Mar 20, 2024

Uh oh!

aesteve-rh commented Mar 21, 2024

Uh oh!

nathan-weinberg commented Mar 21, 2024

Uh oh!

nathan-weinberg commented Mar 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

anik120 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

anik120 commented Mar 21, 2024

Uh oh!

anik120 commented Mar 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nathan-weinberg commented Mar 21, 2024

Uh oh!

Uh oh!

xukai92 commented Mar 22, 2024

Uh oh!

xukai92 commented Mar 22, 2024

Uh oh!

anik120 left a comment

feat: enhanced `lab check` to make local linting easier #621

feat: enhanced `lab check` to make local linting easier #621

nathan-weinberg commented Mar 14, 2024 •

edited

Loading

nathan-weinberg commented Mar 15, 2024 •

edited

Loading

markstur commented Mar 15, 2024 •

edited

Loading

nathan-weinberg commented Mar 21, 2024 •

edited

Loading

anik120 left a comment •

edited

Loading

anik120 commented Mar 21, 2024 •

edited

Loading