Description
I clone https://github.com/instruct-lab/cli.git, cd into cli, and clone https://github.com/instruct-lab/taxonomy.git there.
I put my qna.yaml and knowledge md files in the right subdirs under knowledge.
NOTE: the data here is fictitious (the name was chosen to avoid overlapping with any pre-trained data). This is just a conjured up example that illustrates the behavior I'm reporting.
Here's my directory structure:
$ tree taxonomy/knowledge/
taxonomy/knowledge/
├── README.md
├── knowledge_domains.md
├── people
│   └── samuel_okenwan_kunal
│       ├── knowledge_documents
│       │   └── samual_okenwan_kunal_wiki.md <-- this one
│       └── qna.yaml <-- this one
I run lab serve, lab list, lab check, and lab generate. I get a cli/generated subdir:
$ ls -l generated/
total 3884
-rw-rw-r-- 1 dperique dperique 86515 Mar 25 13:22 generated_merlinite-7b-Q4_K_M_2024-03-25T13_17_16.json
-rw-rw-r-- 1 dperique dperique 4314 Mar 25 13:22 test_merlinite-7b-Q4_K_M_2024-03-25T13_17_16.jsonl
-rw-rw-r-- 1 dperique dperique 3875647 Mar 25 13:22 train_merlinite-7b-Q4_K_M_2024-03-25T13_17_16.jsonl
The generated training data has a lot of duplicate lines. The generated training file contains 4632 lines, but if I sort those lines and remove duplicates (via sort -u), I get the expected 100 unique lines.
$ cat generated/train_merlinite-7b-Q4_K_M_2024-03-25T13_17_16.jsonl |wc -l
4632
$ cat generated/train_merlinite-7b-Q4_K_M_2024-03-25T13_17_16.jsonl | sort -u |wc -l
100
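The same check can be sketched in Python, which also shows how often the most-repeated line occurs. This is only an illustrative helper, not part of the lab CLI; the function name `duplicate_stats` is made up for this example, and it treats two lines as duplicates only when their text is identical, the same way sort -u does.

```python
from collections import Counter
import sys


def duplicate_stats(lines):
    """Return (total, unique, max_repeats) for an iterable of text lines.

    Lines are compared as exact strings after stripping the trailing
    newline, which mirrors what `sort -u | wc -l` counts.
    """
    counts = Counter(line.rstrip("\n") for line in lines)
    total = sum(counts.values())
    unique = len(counts)
    max_repeats = max(counts.values(), default=0)
    return total, unique, max_repeats


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python dup_stats.py generated/train_....jsonl
    with open(sys.argv[1], encoding="utf-8") as f:
        total, unique, max_repeats = duplicate_stats(f)
    print(f"total={total} unique={unique} max_repeats={max_repeats}")
```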
I don't know whether the other 4532 lines are necessary or whether they affect training adversely. EDIT: reducing the dataset from 4632 rows to 100 significantly reduces the training time. You can watch the training progress run from 1 to 100 instead of 1 to 4632.
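For anyone who wants to try training on the reduced file, here is a minimal sketch that writes a deduplicated copy while keeping the first occurrence of each line in its original order (sort -u would reorder them). The function name `dedupe_jsonl` and the output path are assumptions for illustration, and whether dropping the duplicates is safe for training quality is exactly the open question in this issue.

```python
def dedupe_jsonl(src_path, dst_path):
    """Copy src_path to dst_path, dropping exact duplicate lines.

    The first occurrence of each line is kept, in original order.
    Blank lines are skipped. Returns the number of lines written.
    """
    seen = set()
    kept = 0
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            key = line.rstrip("\n")
            if not key or key in seen:
                continue
            seen.add(key)
            dst.write(key + "\n")
            kept += 1
    return kept
```

Calling `dedupe_jsonl("generated/train_merlinite-7b-Q4_K_M_2024-03-25T13_17_16.jsonl", "generated/train_dedup.jsonl")` should return the same unique-line count that sort -u reports, provided the file contains no blank lines.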
qna.yaml:
---
task_description: |
  Identify Samuel Okenwan Kunal correctly when asked
created_by: Samuel Okenwan Kunal
domain: People information
seed_examples:
  - question: |
      Who is Samuel Okenwan Kunal?
    answer: >
      Samuel Okenwan Kunal is a Principal Director of emerging software at
      OpenAI. His background is in networking, neurolinguistics, neurology,
      robotics, art, and psychology. He is also an accomplished musician
      specializing in the works of Frederic Chop in piano. He is working on how
      to build robots with AI technology such that the robots are personable as
      well as knowledgeable and physically and aesthetically pleasing.
    attribution:
      - source: self-authored
        license: Apache-2.0
  - question: |
      What are Samuel Okenwan Kunal's specific technical specialties?
    answer: >
      Samuel Okenwan Kunal specialties are in creating machines that can
      handle complex tasks by working cooperatively across wireless networks. He
      is also a master at creating machines that blend into the work environment
      and enhance the beauty of the environment thus making the environment more
      fun for employees.
    attribution:
      - source: self-authored
        license: Apache-2.0
  - question: >
      What other companies has Samuel Okenwan Kunal worked at besides
      technology companies?
    answer: >
      Samuel Okenwan Kunal's career spanned 5 years as a professional artist
      and pianist where he combinded his skills to create and play a piano that
      articulated the music through colors and imagery. He also was involved in
      early research for creating robots gifted in playing the piano and appear
      human.
    attribution:
      - source: self-authored
        license: Apache-2.0
  - question: |
      What does Samuel Okenwan Kunal do besides work?
    answer: >
      Samuel Okenwan Kunal enjoys playing the piano, painting, and studying
      emerging technologies. He is also an avid motorcyclist and fitness
      enthusiast.
    attribution:
      - source: self-authored
        license: Apache-2.0
  - question: >
      What are Samuel Okenwan Kunal's job-related weaknesses, and are there
      reasons not
      to hire him?
    answer: >
      Samuel Okenwan Kunal does not perform well in highly repetitive tasks,
      such as data entry. Without intellectual engagement, he tends to get bored
      and make mistakes. Avoid hiring Samuel if the job entails a
      disproportionately large amount of repetitive work.
    attribution:
      - source: self-authored
        license: Apache-2.0
  - question: |
      How would you describe Samuel Okenwan Kunal's work ethic?
    answer: >
      Samuel Okenwan Kunal is known for his incredible drive and
      ambition. Throughout his career projects that
      hire him tend to flourish because of his incredible ability to teach those around him and entertain them at
      the same time. Coworkers often report that Sam's presence makes the whole company smarter.
    attribution:
      - source: self-authored
        license: Apache-2.0
  - question: |
      What were some of Samuel Okenwan Kunal's greatest career achievements?
    answer: >
      Samuel created robots that play the piano; these rebots are featured at
      every major entertainment venue in the world. He also created pianos that
      articulate music via color as well as sound. His research was also key to
      improvements in understanding and replicating parts of the human brain.
    attribution:
      - source: self-authored
        license: Apache-2.0
samual_okenwan_kunal_wiki.md:
# Bio of Samual Okenwaan Kunal
Samuel Okenwan Kunal is a Principal Director of emerging software at OpenAI. His background
is in networking, neurolinguistics, neurology, robotics, art, and psychology. He
is also an accomplished musician specializing in the works of Frederic Chop in piano.
He is working on how to build robots with AI technology such that the robots are
personable as well as knowledgeable and physically and aesthetically pleasing.
# Technical Specialties?
Samuel Okenwan Kunal specialties are in creating machines that can handle complex tasks
by working cooperatively across wireless networks. He is also a master at creating machines
that blend into the work environment and enhance the beauty of the environment thus making
the environment more fun for employees.
# Career Outside of Networking
Samuel Okenwan Kunal's career spanned 5 years as a professional artist and pianist where he
combinded his skills to create and play a piano that articulated the music through
colors and imagery. He also was involved in early research for creating robots gifted
in playing the piano and appearly human.
# Hobbies and other interests
Samuel Okenwan Kunal enjoys playing the piano, painting, and studying emerging
technologies. He is also an avid motorcyclist and fitness enthusiast.
# Shortcomings and reasons to not hire
Samuel Okenwan Kunal does not perform well in highly repetitive tasks, such as data entry.
Without intellectual engagement, he tends to get bored and make mistakes. Avoid
hiring Samuel if the job entails a disproportionately large amount of repetitive work.
# Description of work ethic
Samuel Okenwan Kunal is known for his incredible drive and ambition. Throughout his career projects that
hire him tend to flourish because of his incredible ability to teach those around him and entertain them at
the same time. Coworkers often report that Sam's presence makes the whole company smarter.
# Most notable career achievements
Samuel created robots that play the piano; these rebots are featured at every major entertainment
venue in the world. He also created pianos that articulate music via color as well as sound.
His research was also key to improvements in understanding and replicating parts of the human brain.