Add kmeans classifier function #1478

k034b363 · 2024-03-15T21:16:03Z

Describe your changes
This PR adds a set of functions to build a patch-based kmeans clustering model from a training set of images, use that model to classify a test image, and then build a combined mask from selected clusters in the classified test image.
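As a rough illustration of the train-then-classify workflow described above, here is a minimal NumPy-only sketch. It is not the PR's implementation. The function names (`extract_patches`, `fit_kmeans`, `train_patch_kmeans`, `predict_patch_kmeans`) are hypothetical, a plain Lloyd's k-means with farthest-first initialisation stands in for whatever clustering backend the PR actually uses, and the `sigma`, `sampling`, `num_imgs`, and `n_init` options from the documented signature are not modelled.

```python
import numpy as np


def extract_patches(img, patch_size):
    """Return one flattened patch per pixel, shape (H*W, patch_size**2 * channels)."""
    if img.ndim == 2:
        img = img[:, :, np.newaxis]  # treat grayscale as a single channel
    h, w = img.shape[:2]
    # Split the padding unevenly for even sizes so the output grid is still H x W
    before = patch_size // 2
    after = patch_size - 1 - before
    padded = np.pad(img, ((before, after), (before, after), (0, 0)), mode="edge")
    # Shape (H, W, channels, patch_size, patch_size)
    windows = np.lib.stride_tricks.sliding_window_view(
        padded, (patch_size, patch_size), axis=(0, 1))
    return windows.reshape(h * w, -1).astype(float)


def assign_clusters(features, centers):
    """Index of the nearest center (squared Euclidean distance) for each row."""
    dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def fit_kmeans(features, k, n_iter=20, seed=1):
    """Plain Lloyd's k-means with farthest-first initialisation (a simplification)."""
    rng = np.random.default_rng(seed)
    centers = [features[rng.integers(len(features))]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far
        d = ((features[:, None, :] - np.array(centers)[None]) ** 2).sum(axis=2).min(axis=1)
        centers.append(features[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = assign_clusters(features, centers)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def train_patch_kmeans(train_imgs, k, patch_size=3, seed=1):
    """Pool patches from every training image and fit one model."""
    feats = np.vstack([extract_patches(im, patch_size) for im in train_imgs])
    return fit_kmeans(feats, k, seed=seed)


def predict_patch_kmeans(img, centers, patch_size=3):
    """Label every pixel of img with the cluster of its surrounding patch."""
    labels = assign_clusters(extract_patches(img, patch_size), centers)
    return labels.reshape(img.shape[:2])


# Toy example: a dark left half and a bright right half
img = np.zeros((6, 12), dtype=np.uint8)
img[:, 6:] = 200
centers = train_patch_kmeans([img], k=2, patch_size=3)
labels = predict_patch_kmeans(img, centers, patch_size=3)
```

The same `patch_size` must be used for training and prediction, since it fixes the length of each feature vector. Pixels whose patches straddle the dark/bright boundary can fall into either cluster; that is expected behaviour for a patch-based model.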

Type of update

New feature or feature enhancement
Update to documentation

Associated issues
This PR closes issue #1461.

Additional context
Eventually, the third function in this PR (the one that builds the combined mask) will be replaced by an interactive function that lets the user click on the categories to include in the combined mask.
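For reference, the non-interactive combining step can be sketched in a few lines. This assumes the classifier produces an integer label image and that masks follow the common 0/255 `uint8` convention; `combine_cluster_mask` and `cluster_masks` are hypothetical helpers, not the signature of the PR's `mask_kmeans`.

```python
import numpy as np


def combine_cluster_mask(labels, keep):
    """Binary mask (255 = kept cluster, 0 = background) from a label image."""
    return np.where(np.isin(labels, list(keep)), 255, 0).astype(np.uint8)


def cluster_masks(labels):
    """One binary mask per cluster id present in the label image."""
    return {int(c): np.where(labels == c, 255, 0).astype(np.uint8)
            for c in np.unique(labels)}


labels = np.array([[0, 1, 2],
                   [2, 1, 0]])
mask = combine_cluster_mask(labels, keep=[0, 2])
```

Per-cluster masks like those from `cluster_masks` are one possible input for an interactive picker: the user selects categories, and the chosen ids become `keep`.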

For the reviewer
See this page for instructions on how to review the pull request.

PR functionality reviewed in a Jupyter Notebook
All tests pass
Test coverage remains 100%
Documentation tested
New documentation pages added to plantcv/mkdocs.yml
Changes to function input/output signatures added to updating.md
Code reviewed
PR approved


HaleySchuhl · 2024-03-22T20:46:06Z

docs/train_kmeans.md

+
+This function takes in a collection of training images and fits a patch-based kmeans cluster model for later use in classifying cluster assignment in a target image. 
+
+**plantcv.learn.train_kmeans**(img_dir, K, out_path="./kmeansout.fit", prefix="", patch_size=10, sigma=5, sampling=None, seed=1, num_imgs=0, n_init=10)


Consider changing the argument K to a lowercase k? Python naming conventions (PEP 8) state that function and argument names should generally be lowercase and underscore-separated.

Similarly, but outside the scope of just this function: consistency within a project is beneficial, and there are already a small handful of functions that take a directory of things as input. In pcv.io.read_dataset the argument is named source_path, but in pcv.transform.checkerboard_calib it is named img_path. Four other PlantCV functions, pcv.transform.merge_images, pcv.visualize.pixel_scatter_plot, pcv.visualize.time_lapse_video, and pcv.segment_image_series, use slightly different logic: their arguments are paths_to_imgs, paths_to_imgs, img_list, and imgs_paths respectively, and in all of these cases the user is expected to provide a list of file paths rather than putting all input images into a single directory.

I personally think it's more user-friendly to have the function take a directory than to require running pcv.io.read_dataset before functions that take more than one image as input. I also think img_dir is succinct yet descriptive, but I'm definitely open to discussion. @nfahlgren @maliagehan

Definitely can change to lowercase k, that was an oversight on my part.

Regarding the input format, I agree that consistency (especially in naming) is good, and that providing a path to a directory is more user-friendly. However, I will mention that the reason for switching pcv.transform.merge_images to a list of image paths was that the function was written for minirhizotron images: since there might be hundreds of tubes with 2-7 images per tube, it seemed cumbersome to require users to first put each set of images into a separate directory. Maybe we could include a "prefix" argument similar to the one in train_kmeans, so that you can provide a directory containing all images and include only those whose names contain a specified string? I don't know if that's necessarily more user-friendly.

Since several functions accept some type of directory or list of paths, we might want to spend some time working on consistency in a later PR.

The rationale for pcv.io.read_dataset is that we have one consistent way to get filenames from a directory dataset, rather than duplicating the code in many different functions, with or without different features (e.g. filtering, sorting, etc.). pcv.io.read_dataset currently allows filtering on a pattern and sorting (or not). Because it returns a list of file paths, one can do additional custom filtering if they need to. Functions that accept a directory are limited to what they are programmed to do and are not as flexible if the user needs something else, forcing them to rearrange the data instead.
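To make the trade-off concrete, here is a standard-library sketch of the list-based approach. One helper returns a filtered, optionally sorted list of paths (loosely mirroring the pattern/sort behaviour described above for `pcv.io.read_dataset`, but not its actual code), and the caller then applies custom filtering, here grouping minirhizotron images by tube so each group could be passed to a list-accepting function. `list_dataset`, `group_by_tube`, and the `T<number>_...` filename scheme are hypothetical.

```python
import os
import re
from collections import defaultdict


def list_dataset(source_path, pattern="", sort=True):
    """Return paths of files in source_path whose names contain pattern."""
    paths = [os.path.join(source_path, name) for name in os.listdir(source_path)
             if pattern in name and os.path.isfile(os.path.join(source_path, name))]
    return sorted(paths) if sort else paths


def group_by_tube(paths, tube_regex=r"(T\d+)"):
    """Group image paths by a tube ID parsed from each filename (assumed scheme)."""
    groups = defaultdict(list)
    for path in paths:
        match = re.search(tube_regex, os.path.basename(path))
        if match:
            groups[match.group(1)].append(path)
    return dict(groups)
```

With this split, a single flat directory covers both cases: a directory-style call is `list_dataset(d, pattern=".png")`, and per-tube batches come from `group_by_tube(...)` without rearranging files on disk.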

add new functions predict_kmeans, mask_kmeans and learn.train_kmeans


k034b363 added 27 commits February 29, 2024 16:50

added train_kmeans.py

78ef41e

Modified init files

40ec13a

Added outputs to masking function

52d5ee4

Added docs

47e3a95

Added tests for kmeans train

3b6b2ce

Added patch_size default to mask_kmeans

5009569

Added tests for kmeans_classifier

255e031

Fixed indent on docs

15168d1

Fix import in kmeans train test

32d455f

Fix directory in kmeans classifier test

6f04326

Fix path in kmeans training test

f3b405a

Try to fix unicode error in kmeans train test

c1438f0

Fixed another test path issue

813ff66

fix test typos

9b5dd95

add function open to joblib.dump call

07b461e

change write to bytes for pickle dump

470e64b

fix test file path

9383c6e

added file open to joblib load

ebe3bf4

changed test to assert file exists

857655c

added '_' for unused readimage vars in test

e965e34

changed ambigious asserts

c5fcc69

changed ambiguous asserts to .all()

3d03e30

fix tests

4408c75

fix classifier test

d85d0ef

Clean up classifier test after many "fixes"

5484911

Fixed "if" to preserve coverage

6fa6568

Fix even/odd patch size reshaping

4c84632
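Several of the commits above deal with persisting the fitted model through an open file handle in binary mode ("add function open to joblib.dump call", "change write to bytes for pickle dump", "added file open to joblib load") and with ambiguous array asserts ("changed ambiguous asserts to .all()"). The sketch below shows both patterns. It swaps in the standard-library `pickle` module for joblib so it has no extra dependency (joblib's `dump`/`load` also accept an open file object); `save_model` and `load_model` are hypothetical names, and the dict layout of the "model" is an assumption.

```python
import os
import pickle
import tempfile

import numpy as np


def save_model(model, path):
    """Serialise a fitted model; pickle writes bytes, so open in 'wb' mode."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Read back a model written by save_model ('rb', not 'r')."""
    with open(path, "rb") as f:
        return pickle.load(f)


centers = np.array([[0.0, 0.0], [200.0, 200.0]])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "kmeansout.fit")
    save_model({"k": 2, "centers": centers}, path)
    restored = load_model(path)

# `assert restored["centers"] == centers` would raise ValueError:
# the truth value of a multi-element boolean array is ambiguous.
assert (restored["centers"] == centers).all()
```

Opening in text mode (`"w"`/`"r"`) fails because pickled data is bytes, which is consistent with the "unicode error" the test commits were chasing.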

k034b363 added the new feature and work in progress labels Mar 15, 2024

k034b363 added this to the PlantCV v4.x milestone Mar 15, 2024

k034b363 reopened this Mar 18, 2024

k034b363 added 10 commits March 18, 2024 09:10

Merge branch 'main' into 1461-kmeans-segmentation-functions-patch-analysis

644f3b9

Fixed documentation images

759ab77

Added leaf example to test data

b676259

correct file extension

5a2086f

fix test patch_size

0d6af87

Fixed deepsource issues

f3b18f4

Fix more deepsource issues

01a14b4

Fix more deepsource issues

b4212af

Reverting to fix build fail

450f6cb

Try to fix cyclical import

f5866ba

k034b363 added the ready to review label and removed the work in progress label Mar 18, 2024

HaleySchuhl self-requested a review March 22, 2024 14:39

HaleySchuhl assigned k034b363 Mar 22, 2024

Merge branch 'main' into 1461-kmeans-segmentation-functions-patch-analysis

c0c1636

HaleySchuhl reviewed Mar 22, 2024


k034b363 and others added 6 commits March 25, 2024 09:10

Change to lowercase k

c9d9fa7

Change docs to lowercase k

f3f0bf6

code example syntax and link to other doc page

f26de33

Update updating.md

8ae313e

add new functions predict_kmeans, mask_kmeans and learn.train_kmeans

link to other doc page

7b80492

unrelated but fix broken link in color card docs

09b4a34

HaleySchuhl approved these changes Mar 29, 2024


nfahlgren added 2 commits April 16, 2024 11:51

Merge branch 'main' into 1461-kmeans-segmentation-functions-patch-analysis

fda503e

Remove trailing whitespace

186e190

nfahlgren merged commit c382d06 into main Apr 17, 2024

nfahlgren deleted the 1461-kmeans-segmentation-functions-patch-analysis branch April 17, 2024 12:32

nfahlgren modified the milestones: PlantCV v4.x, PlantCV v4.3 Apr 20, 2024

k034b363 commented Mar 15, 2024 (edited by HaleySchuhl)
