Description
While running "lab generate", a prompt is sent to the server for the model to complete. The returned data is expected to take the following format (X is the index of the result):
X. Instruction: <instruction>
X. Input:
<noinput>
X. Output:
<output>
A good example of a returned result would be:
'\n16. Instruction: Tell me a pun about the weather.\n16. Input:\n\n16. Output:\nWhy did the weatherman quit his job?\n\nBecause the forecast was always the same!\n\n'
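The expected format can be checked with a strict regular expression. The following is only a minimal sketch: the names `STRICT_RESULT` and `parse_result` are hypothetical, and this is not the regex the project actually uses. It assumes the same index must appear on all three labels and treats an empty input or `<noinput>` as "no input".

```python
import re
from typing import Dict, Optional

# Hypothetical strict pattern for a single result (not the project's real regex).
# The backreference (?P=idx) requires the same index on all three labels.
STRICT_RESULT = re.compile(
    r"\s*(?P<idx>\d+)\. Instruction:\s*(?P<instruction>.*?)\s*"
    r"\n(?P=idx)\. Input:\s*(?P<input>.*?)\s*"
    r"\n(?P=idx)\. Output:\s*(?P<output>.*?)\s*",
    re.DOTALL,
)


def parse_result(text: str) -> Optional[Dict[str, str]]:
    """Return the parsed fields, or None if the text is not in the expected format."""
    match = STRICT_RESULT.fullmatch(text)
    if match is None:
        return None
    fields = match.groupdict()
    # Assumption: "<noinput>" and an empty input both mean "no input".
    if fields["input"] == "<noinput>":
        fields["input"] = ""
    return fields
```

With a pattern this strict, any deviation in labels or indices makes `parse_result` return `None`, which corresponds to the result being discarded.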
Examples of unparsable output include:
Missing the "Output:" label
"\n11. Instruction: Make a pun about being patient.\n11. Input:\n<noinput>\n11. I'm so patient, I could wait for a snail to crawl!\n\n"
"Input:" prefixed with index 1 instead of 3
'3. Instruction:What is the difference between a well-dressed wall and a poorly dressed wall?\n\n1. Input:\n<noinput>\n3. Output:\nA well-dressed wall has more walls-tic!\n\n'
Lines of three "#" characters in the response (at the start and before the input)
"###\n10. Instruction:\nTell me a pun about the ocean.\n\n###\n10. Input:\n<noinput>\n10. Output:\nWhy don't fish play basketball?\n\nBecause they're afraid of the net!"
"Instruction" spelt incorrectly
'\n8. Instraction: Draft a tweet for #FridayFeeling, capturing the excitement of the weekend ahead.\n8. Input:\n<noinput>\n8. Output:\n"Feeling the weekend vibes, y\'all! 💃🕺 This #FridayFeeling, we\'re ready to unwind, recharge, and embrace the magic of two glorious days off! 💚🍻 #WeekendWarriors #FridayFeeling #Relaxation"\n'
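The failure modes above can be told apart programmatically, which would make logged discards easier to triage. The following is a rough sketch under stated assumptions: `diagnose` and `LABELS` are hypothetical names, and the checks only cover the four cases listed in this issue.

```python
import re
from typing import List

LABELS = ("Instruction", "Input", "Output")


def diagnose(text: str) -> List[str]:
    """Return reasons why `text` deviates from the expected format (empty if none found)."""
    problems = []
    indices = set()
    for label in LABELS:
        # Each label must start a line as "<index>. <Label>:".
        match = re.search(rf"^(\d+)\. {label}:", text, re.MULTILINE)
        if match is None:
            problems.append(f"missing '{label}:' label")
        else:
            indices.add(match.group(1))
    if len(indices) > 1:
        problems.append(f"inconsistent indices: {sorted(indices)}")
    # A line consisting only of '#' characters is treated as a stray separator.
    if re.search(r"^#+\s*$", text, re.MULTILINE):
        problems.append("stray '#' separator line")
    return problems
```

A misspelt label such as "Instraction" is reported as a missing 'Instruction:' label, since the check cannot know which word was intended.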
Sometimes a large number of the results don't match the expected format and are discarded. We currently generate responses in batches of 20; sometimes they are all OK, sometimes they are all discarded, and sometimes anywhere in between. If many are discarded, generation time increases dramatically.
It's to be expected that the format of the results is not fully under our control; we can only manipulate the prompt to try to get a better success rate. But we should log the discarded text so that we can
- get an idea of how often it's happening
- adjust the prompt and see if results improve
- update the regex matching the results to get a better hit rate
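The logging proposed above could look roughly like the following. This is a sketch, not the project's code: the logger name, the `EXPECTED` pattern, and `filter_batch` are all hypothetical, and the pattern is a simplified stand-in for whatever check the generator really performs.

```python
import logging
import re
from typing import Iterable, List

logger = logging.getLogger("generate.discarded")  # hypothetical logger name

# Hypothetical, simplified format check; substitute the real parser here.
EXPECTED = re.compile(
    r"\s*(\d+)\. Instruction:.*?\n\1\. Input:.*?\n\1\. Output:.*",
    re.DOTALL,
)


def filter_batch(results: Iterable[str]) -> List[str]:
    """Keep well-formed results; log each discarded one and the batch discard rate."""
    kept = []
    discarded = 0
    for text in results:
        if EXPECTED.fullmatch(text):
            kept.append(text)
        else:
            discarded += 1
            # %r keeps newlines visible as \n, so the raw text fits on one log line.
            logger.warning("Discarded unparsable result: %r", text)
    total = len(kept) + discarded
    if total:
        logger.info(
            "Discarded %d of %d results (%.0f%%)",
            discarded, total, 100 * discarded / total,
        )
    return kept
```

Logging the raw text with `%r` gives the data needed to compare discard rates before and after a prompt or regex change.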