alex-l-kong
Contributor

@alex-l-kong alex-l-kong commented Jan 19, 2023

What is the purpose of this PR?

Closes #850. Closes #881. The existing Pixie pipeline will need some modifications to support generic cell clustering.

How did you implement your changes?

The main change is a new preprocessing step that runs outside of train_cell_som. It generates the files specific to cell clustering (the SOM input data based on pixel clusters, and the weighted cell channel expression) independently of the eventual notebook for generic cell clustering.
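
As a rough illustration of what such a preprocessing step could produce, the sketch below counts the pixels of each pixel cluster inside every cell and divides by cell size. It is not the code from this PR: the helper name `cluster_counts_size_norm` and the column names `fov`, `label`, and `cell_size` are assumptions made for the example.

```python
import pandas as pd


def cluster_counts_size_norm(pixel_df, cell_sizes, cluster_col):
    """Hypothetical helper: per-cell pixel cluster counts divided by cell size."""
    # One row per (fov, label) cell, one column per pixel cluster.
    counts = pd.crosstab(
        index=[pixel_df["fov"], pixel_df["label"]],
        columns=pixel_df[cluster_col],
    )
    count_cols = [f"{cluster_col}_{c}" for c in counts.columns]
    counts.columns = count_cols
    counts = counts.reset_index()

    # Attach each cell's size, then normalize the counts by it.
    merged = counts.merge(cell_sizes, on=["fov", "label"], how="inner")
    normalized = merged[count_cols].div(merged["cell_size"], axis=0)
    result = pd.concat([merged[["fov", "label", "cell_size"]], normalized], axis=1)
    return result, count_cols
```

Running this ahead of SOM training means the training function only has to read a finished table, which is what lets a generic input table take the same path later.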

Additionally, it's better to pass the full list of expression columns to use for cell clustering as opposed to inferring them from a pixel cluster prefix. This will also make integration with generic cell clustering easier.
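
To make the difference concrete, here is a minimal sketch contrasting the two approaches on a generic cell table. Both helper names and the example column names are hypothetical and not taken from the repository.

```python
import pandas as pd


def infer_columns_by_prefix(cell_table, prefix):
    """Hypothetical old-style behavior: guess training columns from a prefix."""
    return [c for c in cell_table.columns if c.startswith(prefix)]


def select_training_columns(cell_table, expr_cols):
    """Hypothetical new-style behavior: use exactly the columns the caller lists."""
    missing = [c for c in expr_cols if c not in cell_table.columns]
    if missing:
        raise ValueError(f"Columns not found in cell table: {missing}")
    return cell_table[list(expr_cols)]


# A generic input whose columns do not follow any pixel cluster naming scheme.
generic_table = pd.DataFrame(
    {"fov": ["fov0", "fov0"], "label": [1, 2], "CD4": [0.2, 0.7], "CD8": [0.9, 0.1]}
)

# Prefix inference finds nothing to train on for this table.
prefix_cols = infer_columns_by_prefix(generic_table, "pixel_meta_cluster")

# An explicit list works regardless of how the columns are named.
training_data = select_training_columns(generic_table, ["CD4", "CD8"])
```

An explicit list also fails loudly on a misspelled column name instead of silently training on fewer columns.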

@alex-l-kong alex-l-kong self-assigned this Jan 19, 2023
@alex-l-kong alex-l-kong changed the title from "Preprocess pixie cell" to "Integrate existing Pixie cell clustering process with generic cell clustering process" Jan 19, 2023
@alex-l-kong
Contributor Author

Tested that the updated pipeline works. The following changes have been made:

  1. Preprocess cluster_counts_size_norm and weighted_cell_channel prior to train_cell_som
  2. Update the parameters accordingly
  3. Pass the full list of expression columns to use for SOM training (as opposed to inferring from pixel_cluster_col_prefix)
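
For the weighted cell channel table in step 1, one plausible reading is sketched below: each cell's pixel cluster counts are multiplied by the average channel expression of each pixel cluster, and the result is divided by cell size. This formula and the helper name `weighted_cell_channel` are assumptions for illustration, not the PR's actual implementation.

```python
import pandas as pd


def weighted_cell_channel(cell_counts, cluster_channel_avg, cell_sizes):
    """Hypothetical helper: channel expression per cell, weighted by pixel clusters.

    cell_counts: cells x pixel clusters (raw pixel counts)
    cluster_channel_avg: pixel clusters x channels (average expression)
    cell_sizes: Series of cell sizes, indexed like cell_counts
    """
    # Put the cluster rows in the same order as the count columns.
    aligned = cluster_channel_avg.loc[cell_counts.columns]
    # (cells x clusters) . (clusters x channels) -> cells x channels
    weighted = cell_counts.dot(aligned)
    # Normalize by cell size so large cells do not dominate.
    return weighted.div(cell_sizes, axis=0)
```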

Member

@ngreenwald ngreenwald left a comment

Looks good to me, will wait for Candace to weigh in.

Once she's happy with this, I think it makes sense to open a separate branch off of this one to start building notebook 3b. As you put that notebook together, you will probably find that additional things need to change.

Contributor

@cliu72 cliu72 left a comment

Looks good!

* First draft of generic cell clustering process

* Ensure the None checks run properly for weighted cell channel validation

* Attempt to add tests to 3b notebook

* Ensure generic cell clustering integrated with notebooks

* Add description of generalized cell inputs to data_types.md, and ensure generalized inputs run without cell_size col

* Add generic cell clustering process to the README

* Remove 3b notebook from README for now

* Patch up remaining errors

* Move cell cluster summary file generation to separate functions

* Remove extraneous comments from notebook

* Relocate segmentation variable settings to end of generic cell clustering notebook

* Remove test for column mismatch in cluster_cells (needed to support generic cell clustering)

* Massive renaming to avoid confusion of counts referring to generic cell clustering

* PYCODESTYLE

* Normalize new naming conventions; remove more counts refs

* Fix parameter to test_generate_wc_avg_files

* Update cell cluster pipeline to save Pixie results at the end, and not at intermediate steps

* Remove comments from notebook 3

* Rename averaging functions to avoid ambiguity
@alex-l-kong
Contributor Author

We will definitely need to open a separate branch off of this one to address the case where the user wants to re-run generic cell clustering with new columns specified. This is similar to the issue that #903 addressed.

@alex-l-kong
Contributor Author

Per @cliu72 and @ngreenwald I've renamed the notebook so it doesn't interfere with any reviewers. We'll merge this in and open a PR for any issues that do pop up in the future.

Member

@ngreenwald ngreenwald left a comment

@JLrumberger will you try out the new notebook with the network output from a subset of the TONIC data to see how it looks? And also see if any issues come up? You can set up a time to meet with @alex-l-kong to go over it and make sure everything looks good.

Contributor

@cliu72 cliu72 left a comment

Looks good to me. Just one small comment thing.

@alex-l-kong
Contributor Author

@ngreenwald per our meeting today, we'll get this merged in for the time being, then have @JLrumberger open up any subsequent issues that come up with his runs.

@alex-l-kong alex-l-kong merged commit 3a24d44 into main Feb 17, 2023
@alex-l-kong alex-l-kong deleted the preprocess_pixie_cell branch February 17, 2023 06:27
@srivarra srivarra added the enhancement New feature or request label Mar 14, 2023
Development

Successfully merging this pull request may close these issues.

Ensure cell clustering process can be run with a variety of cluster types
Abstract Pixie to work with different inputs
4 participants