derekhiggins
Contributor

Log dropped instructions during generation so
we can eventually improve parsing and prompting.

Fixes #456
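
For readers following along, here is a minimal sketch of the idea in the description: keep the completions that parse, and write the ones that do not to a log file instead of dropping them silently. The pattern, the function names, and the JSON-lines format are all hypothetical illustrations and are not taken from this PR's diff.

```python
import json
import re

# Hypothetical pattern (an assumption, not the project's real one):
# expects "Instruction: ... Output: ..." somewhere in each raw completion.
PATTERN = re.compile(
    r"Instruction:\s*(?P<instruction>.+?)\s*Output:\s*(?P<output>.+)",
    re.DOTALL,
)


def split_parsed_and_discarded(raw_completions):
    """Return (parsed, discarded): parsed dicts and the raw strings that failed."""
    parsed, discarded = [], []
    for raw in raw_completions:
        match = PATTERN.search(raw)
        if match:
            parsed.append(match.groupdict())
        else:
            # Keep the text exactly as the model produced it.
            discarded.append(raw)
    return parsed, discarded


def log_discarded(discarded, path):
    """Write one JSON object per line so the raw text can be reloaded later."""
    with open(path, "w", encoding="utf-8") as f:
        for raw in discarded:
            f.write(json.dumps({"raw": raw}) + "\n")
```

Storing the raw string rather than a partially parsed form means the log can later be replayed against a different parser.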

@derekhiggins
Contributor Author

This only logs the discarded instructions but I now think we should log all instructions so we have a full set to develop against.

@markstur
Member

markstur commented Mar 9, 2024

I like it so far except for the "data" dir. Why not just put it in the output dir, so it's already configurable and exists, and this file would be with the other output? Just name it like the generated/test/train files, with a different prefix. It should be easy to connect it to those in an ls.

Also not sure about your comment about logging everything. The good ones are already parsed and kept in generated. I'd focus on the skipped ones like you did.
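
The naming suggestion above could be sketched roughly as follows: build every per-run file name from the same suffix, so that only the prefix differs and the files are easy to match up in a directory listing. The helper name, the suffix layout, and the file extensions are assumptions made for illustration; the real naming scheme is whatever the project's generator uses.

```python
from pathlib import Path


def run_file(output_dir, prefix, run_suffix, extension):
    """Build '<output_dir>/<prefix>_<run_suffix>.<extension>'."""
    return Path(output_dir) / f"{prefix}_{run_suffix}.{extension}"


# Hypothetical run suffix shared by all files from one generation run.
run_suffix = "mymodel_2024-03-09T12_00_00"

# Extensions here are illustrative assumptions.
paths = [
    run_file("generated", "generated", run_suffix, "json"),
    run_file("generated", "test", run_suffix, "jsonl"),
    run_file("generated", "train", run_suffix, "jsonl"),
    run_file("generated", "discarded", run_suffix, "log"),
]
```

Because the suffix is shared, a glob on the suffix alone would pick up all four files from the same run.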

@derekhiggins
Copy link
Contributor Author

I like it so far except for the "data" dir. Why not just put it in the output dir, so it's already configurable and exists, and this file would be with the other output?

Yup, that would make infinitely more sense. I thought the generated files were still in the taxonomy directory.

Also not sure about your comment about logging everything. The good ones are already parsed and kept in generated. I'd focus on the skipped ones like you did.

What I was thinking here is that if we were to try to find a regex that doesn't discard as many instructions, we'd want the raw data to test the regex against. The ones in generated were parsed and don't include the data as it came from the model. Anyway, I'm happy to go with this, and we can add more later if it looks like we need it.
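
As a rough illustration of that use case, raw model output could be replayed against candidate patterns to see how many samples each one accepts. Everything below is hypothetical: the sample strings are made up, and neither pattern is the project's actual regex.

```python
import re


def match_rate(pattern, raw_samples):
    """Fraction of raw samples in which the pattern finds a match."""
    if not raw_samples:
        return 0.0
    compiled = re.compile(pattern, re.DOTALL)
    matched = sum(1 for raw in raw_samples if compiled.search(raw))
    return matched / len(raw_samples)


# Made-up raw completions standing in for logged model output.
raw_samples = [
    "Instruction: Add 2 and 2\nOutput: 4",
    "instruction - Name a color\noutput - blue",
    "no structure at all",
]

# Two hypothetical candidates: a strict one and a more lenient one.
strict = r"Instruction:\s*.+?\s*Output:\s*.+"
lenient = r"(?i)instruction\s*[:\-]\s*.+?\s*output\s*[:\-]\s*.+"

strict_rate = match_rate(strict, raw_samples)
lenient_rate = match_rate(lenient, raw_samples)
```

Comparing the two rates on the same raw set shows whether a looser pattern recovers more instructions, which is only possible if the unparsed text was kept.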

@derekhiggins derekhiggins force-pushed the log-discarded branch 3 times, most recently from 4b58ab2 to 206cc22 Compare March 19, 2024 17:37
Log dropped instructions during generation so
we can eventually improve parsing and prompting.

Fixes #456

Signed-off-by: Derek Higgins <derekh@redhat.com>
Member

@hickeyma hickeyma left a comment

LGTM, thanks @derekhiggins

@hickeyma hickeyma merged commit 4ab771c into instructlab:main Mar 26, 2024
@hickeyma hickeyma deleted the log-discarded branch March 26, 2024 11:51
@anik120 anik120 added this to the Milestone 03/28 milestone Mar 27, 2024
Development

Successfully merging this pull request may close these issues.

Generator silently dropping unparsable results