Add kmeans classifier function #1478
docs/train_kmeans.md
Outdated
This function takes in a collection of training images and fits a patch-based kmeans cluster model for later use in classifying cluster assignment in a target image.

**plantcv.learn.train_kmeans**(img_dir, K, out_path="./kmeansout.fit", prefix="", patch_size=10, sigma=5, sampling=None, seed=1, num_imgs=0, n_init=10)
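For readers unfamiliar with patch-based clustering, the idea described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not PlantCV's implementation: the helper names (`tile_patches`, `fit_kmeans`, `assign`), the nested-list image, the non-overlapping tiling, and the plain Lloyd's algorithm are all assumptions made for the example, and the remaining parameters in the signature (`sigma`, `sampling`, `num_imgs`, `n_init`, and so on) are not modeled.

```python
import random


def tile_patches(img, patch_size):
    """Cut a 2D grayscale image (list of rows) into non-overlapping
    patch_size x patch_size tiles, each flattened to a feature vector."""
    rows, cols = len(img), len(img[0])
    patches = []
    for r in range(0, rows - patch_size + 1, patch_size):
        for c in range(0, cols - patch_size + 1, patch_size):
            patches.append([img[r + i][c + j]
                            for i in range(patch_size)
                            for j in range(patch_size)])
    return patches


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(patch, centroids):
    """Index of the centroid nearest to the patch (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(patch, centroids[i]))


def fit_kmeans(points, k, rng, n_iter=20):
    """Plain Lloyd's algorithm (hypothetical stand-in for the real fit)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[assign(p, centroids)].append(p)
        for i, members in enumerate(groups):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Toy training image: dark left half, bright right half.
img = [[0] * 4 + [255] * 4 for _ in range(8)]
patches = tile_patches(img, patch_size=2)
centroids = fit_kmeans(patches, k=2, rng=random.Random(1))

# Classify two new patches with the fitted model.
dark = assign([0] * 4, centroids)
bright = assign([255] * 4, centroids)
```

The fitted centroids play the role of the saved model: classifying a target image then amounts to cutting it into patches the same way and calling `assign` on each one.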
Consider changing the argument to a lowercase `k`? Python naming conventions state that function and argument names should generally be lowercase and underscore-separated.

Similarly, but outside the scope of just this function: consistency within a project is beneficial, and there are already a handful of functions that take a directory of things as input. In `pcv.io.read_dataset` the argument is named `source_path`, but in `pcv.transform.checkerboard_calib` it is named `img_path`. Four other PlantCV functions, `pcv.transform.merge_images`, `pcv.visualize.pixel_scatter_plot`, `pcv.visualize.time_lapse_video`, and `pcv.segment_image_series`, use slightly different logic: their arguments are `paths_to_imgs`, `paths_to_imgs`, `img_list`, and `imgs_paths` respectively, and in all of these cases the user is expected to provide a list of file paths rather than putting all input images into a single directory.

I personally think it's more user friendly to have the function take a directory rather than requiring `pcv.io.read_dataset` before running functions that take more than one image as input. I also think `img_dir` is succinct while descriptive, but I'm definitely open to discussion. @nfahlgren @maliagehan
Definitely can change to lowercase `k`; that was an oversight on my part.

Regarding the input format, I agree that consistency (especially in naming) is good, and that providing a path to a directory is more user friendly. However, I will just mention that the reason for changing to a list of image paths for `pcv.transform.merge_images` was that the function was written for the minirhizotron images, and since there might be hundreds of tubes with 2-7 images per tube, it seemed cumbersome to require users to first put each set of images into a separate directory. Maybe there's a way to include a "prefix" argument, similar to the one in `train_kmeans`, so that you can provide a directory containing all images and then only include the ones whose names contain a specified string? I don't know if that's necessarily more user-friendly.
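The "prefix" idea floated above could look something like the sketch below. The function name `select_images`, the set of image suffixes, and the example file names are hypothetical and chosen only for illustration; this is not how `train_kmeans` or any existing PlantCV function is implemented. Following the wording of the comment, the filter matches the string anywhere in the file name rather than only at the start.

```python
import tempfile
from pathlib import Path

# Assumed set of image extensions, for illustration only.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def select_images(img_dir, prefix=""):
    """Return sorted image paths in img_dir whose file name contains prefix.
    An empty prefix keeps every image."""
    return sorted(
        p for p in Path(img_dir).iterdir()
        if p.is_file()
        and p.suffix.lower() in IMAGE_SUFFIXES
        and prefix in p.name
    )


# Demo on a throwaway directory with hypothetical minirhizotron-style names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["tube01_a.png", "tube01_b.png", "tube02_a.png", "notes.txt"]:
        (Path(tmp) / name).touch()
    matched = [p.name for p in select_images(tmp, prefix="tube01")]
    everything = [p.name for p in select_images(tmp)]
```

With this approach all tubes can live in one directory, at the cost of each such function carrying its own filtering logic.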
Since several functions accept some type of directory or list of paths, we might want to spend some time working on consistency in a later PR.

The rationale for `pcv.io.read_dataset` is that we have one consistent way to get filenames from a directory dataset, rather than duplicating the code in many different functions, with or without different features (e.g. filtering, sorting, etc.). `pcv.io.read_dataset` currently allows filtering on a pattern and sorting (or not). Because it returns a list of file paths, one can do additional custom filtering if needed. Functions that accept a directory are limited to what they are programmed to do and are not as flexible if the user needs something else, forcing them to rearrange the data instead.
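The flexibility argument can be illustrated in plain Python: once the file paths are an ordinary list, custom grouping or filtering is a few lines on the caller's side and no files need to be moved. In this sketch the hard-coded list stands in for whatever a dataset reader returns (it does not call `pcv.io.read_dataset`), and `group_by_key` and the `<tube>_<index>.png` naming scheme are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def group_by_key(paths, key):
    """Group file paths by key(path), preserving input order within each group."""
    groups = defaultdict(list)
    for p in paths:
        groups[key(Path(p))].append(str(p))
    return dict(groups)


# Stand-in for a list of paths returned by a dataset reader (assumed names).
paths = ["imgs/tube01_1.png", "imgs/tube01_2.png", "imgs/tube02_1.png"]

# Caller-side custom logic: one group of images per tube.
by_tube = group_by_key(paths, key=lambda p: p.stem.split("_")[0])
```

Each value in `by_tube` could then be handed to a function that accepts a list of image paths.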
add new functions predict_kmeans, mask_kmeans and learn.train_kmeans
Describe your changes
This PR adds a set of functions to build patch-based kmeans clustering models on a training set of images, use the model to classify a test image, and then build a combined mask from chosen clusters from the test image.
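As a rough illustration of the last step described above (building a combined mask from chosen clusters), the sketch below turns a grid of per-pixel cluster labels into a binary mask. The name `combine_clusters`, the nested-list label grid, and the 0/255 mask convention are assumptions for the example; the actual `mask_kmeans` signature and behavior are not shown in this conversation.

```python
def combine_clusters(labels, chosen):
    """Return a binary mask: 255 where the pixel's cluster is in chosen, else 0."""
    chosen = set(chosen)
    return [[255 if lab in chosen else 0 for lab in row] for row in labels]


# Hypothetical cluster assignments for a 3x3 test image with three clusters.
labels = [
    [0, 0, 1],
    [2, 1, 1],
    [2, 2, 0],
]

# Keep clusters 1 and 2 as foreground.
mask = combine_clusters(labels, chosen=[1, 2])
```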
Type of update
Associated issues
This PR closes issue #1461
Additional context
Eventually, the third function in this PR (that which builds the combined mask) will be replaced by an interactive function allowing the user to click on the categories to be included in the combined mask.
For the reviewer
See this page for instructions on how to review the pull request.
plantcv/mkdocs.yml
updating.md