This repository was archived by the owner on Sep 9, 2025. It is now read-only.

mcorbin-ibm
Contributor

Reorganized the taxonomy domains and subdomains to align with the Dewey Decimal Classifications

@github-actions github-actions bot added triage-needed (Auto labeled) skill is ready to be triaged skill (Auto labeled) knowledge (Auto labeled) labels Jun 28, 2024

Beep, boop 🤖, Hi, I'm @instructlab-bot and I'm going to help you with your pull request. Thanks for your contribution! 🎉

I support the following commands:

  • @instructlab-bot precheck -- Check existing model behavior using the questions in this proposed change.
  • @instructlab-bot generate -- Generate a sample of synthetic data using the synthetic data generation backend infrastructure.
  • @instructlab-bot generate-local -- Generate a sample of synthetic data using a local model.
  • @instructlab-bot help -- Print this help message again.

Note

Results or Errors of these commands will be posted as a pull request check in the Checks section below

Note

Currently, only maintainers who belong to the [[taxonomy-triagers taxonomy-approvers taxonomy-maintainers labrador-org-maintainers instruct-lab-bot-maintainers]] teams are allowed to run these commands.

Contributor

@bjhargrave bjhargrave left a comment

Nice!

If we could, I would prefer to keep the compositional_skills and knowledge folders free of readme files. That is, any change in these trees is a contribution to the taxonomy rather than some doc improvement. The current readme in knowledge is annoying in this way :-)

@mcorbin-ibm
Contributor Author

@bjhargrave

If we could, I would prefer to keep the compositional_skills and knowledge folders free of readme files. That is, any change in these trees is a contribution to the taxonomy rather than some doc improvement. The current readme in knowledge is annoying in this way :-)

Do you mean just the main parent folders (knowledge, compositional_skills, and foundational_skills)? OR, do you not want readme files in the domain/subdomain folders?

We do have a docs folder, where we could put most of the info for the repo, to keep the taxonomy tree folders clear?

I do still need to "document" the taxonomy tree -- which I was going to do in readme.txt files for the domains/subdomains??

@bjhargrave
Contributor

Do you mean just the main parent folders (knowledge, compositional_skills, and foundational_skills)? OR, do you not want readme files in the domain/subdomain folders?

My request would be to not have readme files anywhere under the taxonomy folders (knowledge, compositional_skills, and foundational_skills).

I do still need to "document" the taxonomy tree -- which I was going to do in readme.txt files for the domains/subdomains??

Agreed, but I would rather see that all together in a single readme file, since that would give the reader a broad view of the taxonomy organization rather than having them walk around the tree encountering readme files occasionally.
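
The request above (no readme files anywhere under the three taxonomy folders) could be checked mechanically. The following is only a minimal sketch: the folder names come from this thread, while the helper name `find_readme_files` and the rule "any file whose name starts with readme" are assumptions made for illustration, not an existing check in the repository.

```python
from pathlib import Path

# Top-level taxonomy folders named in this thread.
TAXONOMY_DIRS = ("knowledge", "compositional_skills", "foundational_skills")


def find_readme_files(repo_root):
    """Return readme files found anywhere under the taxonomy folders.

    Hypothetical helper: matches any file whose name starts with "readme",
    case-insensitively (readme.md, README.txt, ...).
    """
    root = Path(repo_root)
    found = []
    for top in TAXONOMY_DIRS:
        base = root / top
        if not base.is_dir():
            continue  # folder absent in this checkout
        for path in base.rglob("*"):
            if path.is_file() and path.name.lower().startswith("readme"):
                found.append(path.relative_to(root).as_posix())
    return sorted(found)
```

Calling `find_readme_files(".")` from the repository root would return an empty list once the taxonomy trees are free of readme files. Readme files elsewhere, such as in a docs folder, are not reported.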

@jjasghar jjasghar added triage-requested-changes skill has been reviewed; changes requested from contributor and removed triage-needed (Auto labeled) skill is ready to be triaged labels Jul 1, 2024
@github-actions github-actions bot added documentation Improvements or additions to documentation triage-needed (Auto labeled) skill is ready to be triaged labels Jul 1, 2024
@mcorbin-ibm
Contributor Author

@jjasghar I have removed the readme files and updated the main repo's readme file for these changes. Some additional changes might be needed to verify that the qna.yaml files are either "grounded" or "ungrounded" and to put them in those subfolders. And, when we do start merging other knowledge contributions, we need to remember to add the document_type as the final node in the tree. I couldn't quickly find/identify any qna.yaml files to do that with. Please review my latest change here, and I'll let you do the honors of removing DRAFT! :)
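
The follow-up check mentioned above, confirming that each qna.yaml sits under a "grounded" or "ungrounded" subfolder, might be sketched as follows. This assumes the subfolders are literally named `grounded` and `ungrounded`, and the helper name `qna_files_missing_grounding` is hypothetical. It only inspects paths and does not read the YAML content.

```python
from pathlib import Path

# Assumed subfolder names, taken from the wording in the comment above.
GROUNDING_DIRS = {"grounded", "ungrounded"}


def qna_files_missing_grounding(taxonomy_root):
    """List qna.yaml files with no grounded/ungrounded folder in their path."""
    root = Path(taxonomy_root)
    missing = []
    for qna in root.rglob("qna.yaml"):
        # Folder names between the root and the file itself.
        folders = qna.relative_to(root).parts[:-1]
        if not GROUNDING_DIRS.intersection(folders):
            missing.append(qna.relative_to(root).as_posix())
    return sorted(missing)
```

The returned list is the set of files that would still need a judgement call and a move into one of the two subfolders.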

@jjasghar jjasghar force-pushed the mcorbin/taxonomy branch from 082cfb2 to 62a19a4 Compare July 1, 2024 22:10
@jjasghar jjasghar changed the title DRAFT: taxonomy reorg per dewey decimal classifications Taxonomy reorg per dewey decimal classifications Jul 1, 2024
@jjasghar
Member

jjasghar commented Jul 1, 2024

@bjhargrave can you confirm that https://github.com/instructlab/taxonomy/actions/runs/9751722026/job/26913897281?pr=1215 is the same thing as the /files directory?

@jjasghar jjasghar force-pushed the mcorbin/taxonomy branch from a8baeec to 67b13c4 Compare July 2, 2024 15:27
@makelinux
Contributor

I recognize that unambiguous classification is an extremely complex task. Here are some of my thoughts:

  • Often, a topic belongs to multiple categories. For example, an electric battery can be classified under chemistry from a production standpoint, physics based on its function, and electronics based on its usage.
  • In classification, it is essential to determine which categories are top-level and which are subcategories.
  • Looking at the top categories of the Dewey Decimal Classification (DDC), in my opinion, 'science' is merely a form of knowledge and should not be a top-level category.
  • Consider trying to classify cooking and culinary in DDC. I couldn’t find a suitable category. Can you? Arts and recreation?
  • Generally, it seems to me that DDC is more suited to a scientific and academic perspective on printed books and less applicable to all forms of knowledge.

@mcorbin-ibm
Contributor Author

@makelinux

I recognize that unambiguous classification is an extremely complex task.

Indeed! :)

  • Often, a topic belongs to multiple categories. For example, an electric battery can be classified under chemistry from a production standpoint, physics based on its function, and electronics based on its usage.

Yes, you can always classify topics into different categories, and I think much will depend upon the specific knowledge being submitted. If the knowledge is talking about its function, then classifying it under physics might be best, but if the knowledge is talking about where batteries are used, then maybe it belongs in technology/electronics. I'm not sure that there is a way around this; we will just have to make a judgement call as to where a piece of knowledge belongs.

  • In classification, it is essential to determine which categories are top-level and which are subcategories.
  • Looking at the top categories of the Dewey Decimal Classification (DDC), in my opinion, 'science' is merely a form of knowledge and should not be a top-level category.
  • Generally, it seems to me that DDC is more suited to a scientific and academic perspective on printed books and less applicable to all forms of knowledge.

The DDC has been around for nearly 150 years, and is in its 23rd edition. It has 10 main classes, subdivided into 100 divisions, which are subdivided again into 1,000 sections. Please see this summaries doc: https://www.oclc.org/content/dam/oclc/dewey/resources/summaries/deweysummaries.pdf. It is meant to be a standard classification of all forms of knowledge. We will certainly run into some cases where we will have to find a "best fit" for a piece of knowledge, but starting from the top 10 categories and their 10 subcategories each seemed to present the best starting point.
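
As an illustration of the three levels just described (10 top categories, then 100, then 1,000), a three-digit DDC number can be split arithmetically into its enclosing levels. This is a minimal sketch with a hypothetical helper name, `ddc_levels`. It handles only the three-digit part of a class number and says nothing about how the InstructLab folders are actually named.

```python
def ddc_levels(class_number):
    """Split a three-digit DDC number into (top category, 100s level, 1000s level)."""
    n = int(class_number)
    if not 0 <= n <= 999:
        raise ValueError("expected a class number between 000 and 999")
    top = n // 100 * 100      # one of the 10 top categories, e.g. 600
    second = n // 10 * 10     # one of the 100 second-level entries, e.g. 640
    # Zero-pad so that low numbers such as 5 print as "005".
    return f"{top:03d}", f"{second:03d}", f"{n:03d}"
```

For example, `ddc_levels("641")` returns `("600", "640", "641")`, which is the chain to follow in the summaries document when looking for a best fit.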

As a secondary source to help us with classification of knowledge, we can look to Wikipedia to see how/where they placed things or to get additional ideas: https://en.wikipedia.org/wiki/Wikipedia:Contents.

The InstructLab taxonomy will not be a direct 1:1 mapping of the DDC; rather, the DDC is the starting point for finding a best fit for the topics of knowledge.

  • Consider trying to classify cooking and culinary in DDC. I couldn’t find a suitable category. Can you? Arts and recreation?

For cooking and "culinary arts" I would put them in technology/food_and_drink.

@jjasghar jjasghar marked this pull request as draft July 11, 2024 15:51

This pull request has been automatically marked as stale because it has not had activity within 15 days. It will be automatically closed if no further activity occurs within 31 days.

@github-actions github-actions bot added the stale stale-bot has marked you as stale label Jul 27, 2024
@jjasghar jjasghar removed triage-needed (Auto labeled) skill is ready to be triaged triage-requested-changes skill has been reviewed; changes requested from contributor stale stale-bot has marked you as stale labels Aug 19, 2024
@jjasghar jjasghar marked this pull request as ready for review August 19, 2024 20:00
@github-actions github-actions bot added the triage-needed (Auto labeled) skill is ready to be triaged label Aug 20, 2024
@jjasghar jjasghar force-pushed the mcorbin/taxonomy branch 2 times, most recently from 77b376f to 83e2575 Compare August 20, 2024 18:34
@bjhargrave
Contributor

I have force pushed changes which include making .gitignore files empty.

Please don't push any merge commits to the PR.

@jjasghar
Member

I think it's ready to merge!

https://www.youtube.com/watch?v=NHiUQb5xg7A

- Reorganized the taxonomy domains and subdomains to align with the Dewey Decimal Classifications
- Update readme.md
- fixed lint error
- changed readme.md to readme.txt in two knowledge domains
- removed readme files; edited repo readme file
- Edited the main readme file to represent the bulk of the taxonomy restructuring.  More work needs to happen on the main readme file and updating the docs, but this should do for now.

Signed-off-by: Michelle Corbin <corbinm@us.ibm.com>
Signed-off-by: BJ Hargrave <hargrave@us.ibm.com>
Co-Authored-By: JJ Asghar <awesome@ibm.com>
Co-Authored-By: Julia Denham <jdenham@redhat.com>
Co-Authored-By: Luke Inglis <luke.inglis@ibm.com>
Co-Authored-By: Kelly Brown <kelbrown@redhat.com>
Co-Authored-By: Olivia <ombuzek@us.ibm.com>
@juliadenham juliadenham merged commit 0250bf2 into instructlab:main Aug 21, 2024
6 checks passed
jjasghar added a commit that referenced this pull request Aug 22, 2024
Updated the diagram because of #1215.

/cc @mcorbin-ibm @juliadenham

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Co-authored-by: Costa Shulyupin <costa.shul@redhat.com>