Skip to content

Conversation

k034b363
Copy link
Contributor

@k034b363 k034b363 commented Mar 15, 2024

Describe your changes
This PR adds a set of functions to build patch-based kmeans clustering models on a training set of images, use the model to classify a test image, and then build a combined mask from chosen clusters from the test image.

Type of update

  • New feature or feature enhancement
  • Update to documentation

Associated issues
This function closes issue #1461

Additional context
Eventually, the third function in this PR (that which builds the combined mask) will be replaced by an interactive function allowing the user to click on the categories to be included in the combined mask.

For the reviewer
See this page for instructions on how to review the pull request.

  • PR functionality reviewed in a Jupyter Notebook
  • All tests pass
  • Test coverage remains 100%
  • Documentation tested
  • New documentation pages added to plantcv/mkdocs.yml
  • Changes to function input/output signatures added to updating.md
  • Code reviewed
  • PR approved

@k034b363 k034b363 added new feature New feature ideas and solutions work in progress Mark work in progress labels Mar 15, 2024
@k034b363 k034b363 added this to the PlantCV v4.x milestone Mar 15, 2024
@k034b363 k034b363 reopened this Mar 18, 2024
@k034b363 k034b363 added ready to review and removed work in progress Mark work in progress labels Mar 18, 2024
@HaleySchuhl HaleySchuhl self-requested a review March 22, 2024 14:39

This function takes in a collection of training images and fits a patch-based kmeans cluster model for later use in classifying cluster assignment in a target image.

**plantcv.learn.train_kmeans**(img_dir, K, out_path="./kmeansout.fit", prefix="", patch_size=10, sigma=5, sampling=None, seed=1, num_imgs=0, n_init=10)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider changing argument to a lowercase k? Python function and argument naming convention states that both should generally be lowercase and underscore separated.

Similarly, but outside the scope of just this function... consistency within a project is beneficial and there are already a small handful of functions that take a directory of things as an input. In the case of pcv.io.read_dataset the argument is named source_path, but in the case of pcv.transform.checkerboard_calib the argument is named img_path. In two other plantcv functions, pcv.transform.merge_images, pcv.visualize.pixel_scatter_plot, pcv.visualize.time_lapse_video, and pcv.segment_image_series uses slightly different logic where the arguments is paths_to_imgs , paths_to_imgs, img_list , and imgs_paths respectively and in all these cases, the user is expected to provide a list of filepaths rather than putting all input images into a single directory.

I personally think it's more user friendly to have the function take a directory rather than using pcv.io.read_dataset prior to running functions that take more than one image as input. I also think img_dir is succinct while descriptive but definitely open to discussion. @nfahlgren @maliagehan

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely can change to lowercase k, that was an oversight on my part.

Regarding the input format, I agree that consistency (especially in naming) is good, and that providing a path to a directory is more user friendly. However, I will just mention that the reason for changing to a list of image paths for pcv.transform.merge_images was because that function was written for the minirhizotron images and since there might be hundreds of tubes with 2-7 images per tube, it seemed cumbersome to require users to first put each set of images into a separate directory. Maybe there's a way to include a "prefix" argument similarly to the one in train_kmeans so that you can provide a directory to all images and then only include ones that contain a specified string? I don't know if that's necessarily more user-friendly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since several functions accept some type of directory or list of paths, we might want to spend some time working on consistency in a later PR.

The rationale for pcv.io.read_dataset is that we have one consistent way to get filenames from a directory dataset, rather than duplicating the code in many different functions, with or without different features (e.g. filtering, sorting, etc.). pcv.io.read_dataset currently allows filtering on a pattern and sorting (or not). Because it returns a list of file paths, one can do additional custom filtering if they need to. Functions that accept a directory are limited to what they are programmed to do and are not as flexible if the user needs something else, forcing them to rearrange the data instead.

@nfahlgren nfahlgren merged commit c382d06 into main Apr 17, 2024
@nfahlgren nfahlgren deleted the 1461-kmeans-segmentation-functions-patch-analysis branch April 17, 2024 12:32
@nfahlgren nfahlgren modified the milestones: PlantCV v4.x, PlantCV v4.3 Apr 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new feature New feature ideas and solutions ready to review
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Kmeans Segmentation Functions / Patch Analysis
3 participants