Integrate existing Pixie cell clustering process with generic cell clustering process #885
Tested that the updated pipeline works. The following changes have been made:
Looks good to me, will wait for Candace to weigh in.
After she's happy with this, I think opening a separate branch off of this one makes sense to begin constructing notebook 3b. As you put that notebook together, you'll probably realize that additional things need to change.
Looks good!
* First draft of generic cell clustering process
* Ensure the None checks run properly for weighted cell channel validation
* Attempt to add tests to 3b notebook
* Ensure generic cell clustering integrated with notebooks
* Add description of generalized cell inputs to data_types.md, and ensure generalized inputs run without cell_size col
* Add generic cell clustering process to the README
* Remove 3b notebook from README for now
* Patch up remaining errors
* Move cell cluster summary file generation to separate functions
* Remove extraneous comments from notebook
* Relocate segmentation variable settings to end of generic cell clustering notebook
* Remove test for column mismatch in cluster_cells (needed to support generic cell clustering)
* Massive renaming to avoid confusion of counts referring to generic cell clustering
* PYCODESTYLE
* Normalize new naming conventions; remove more counts refs
* Fix parameter to test_generate_wc_avg_files
* Update cell cluster pipeline to save Pixie results at the end, and not at intermediate steps
* Remove comments from notebook 3
* Rename averaging functions to avoid ambiguity
We will definitely need to open a separate branch off of this to address the case where the user wants to re-run generic cell clustering with new columns specified. This is similar to the issue that #903 addressed.
Per @cliu72 and @ngreenwald I've renamed the notebook so it doesn't interfere with any reviewers. We'll merge this in and open a PR for any issues that do pop up in the future.
@JLrumberger will you try out the new notebook with the network output from a subset of the TONIC data to see how it looks? And also see if any issues come up? You can set up a time to meet with @alex-l-kong to go over it and make sure everything looks good.
Looks good to me. Just one small comment thing.
… preprocess_pixie_cell
@ngreenwald per our meeting today, we'll get this merged in for the time being, then have @JLrumberger open up any subsequent issues that come up with his runs.
What is the purpose of this PR?
Closes #850. Closes #881. The existing Pixie pipeline will need some modifications to support generic cell clustering.
How did you implement your changes
The main change is the addition of a preprocessing step outside of `train_cell_som`. This generates the files specific to cell clustering (SOM input data based on pixel clusters, and weighted cell channel expression) independently of the eventual notebook for generic cell clustering.

Additionally, it's better to pass the full list of expression columns to use for cell clustering than to infer them from a pixel cluster prefix. This will also make integration with generic cell clustering easier.
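
To illustrate the column-selection point, here is a minimal sketch in plain Python. It is not the code from this PR: the helper names (`infer_columns_by_prefix`, `select_cluster_columns`) and all column names are hypothetical, chosen only to show why an explicit column list generalizes better than prefix matching.

```python
def infer_columns_by_prefix(columns, prefix):
    """Prefix-based selection: keep every column whose name starts with `prefix`."""
    return [col for col in columns if col.startswith(prefix)]


def select_cluster_columns(columns, cluster_cols):
    """Explicit selection: validate a caller-supplied list against the table's columns."""
    missing = [col for col in cluster_cols if col not in columns]
    if missing:
        raise ValueError(f"Columns not found in cell table: {missing}")
    return list(cluster_cols)


# Hypothetical column names for a Pixie-style table and a generic table.
pixie_columns = ["cell_size", "pixel_meta_cluster_1", "pixel_meta_cluster_2", "fov"]
generic_columns = ["CD4_positive", "CD8_positive", "fov"]

# Prefix matching works only when the columns follow the pixel cluster naming.
pixie_by_prefix = infer_columns_by_prefix(pixie_columns, "pixel_meta_cluster_")
generic_by_prefix = infer_columns_by_prefix(generic_columns, "pixel_meta_cluster_")

# An explicit list works for any table, and fails loudly on a typo.
generic_explicit = select_cluster_columns(
    generic_columns, ["CD4_positive", "CD8_positive"]
)
```

With prefix matching, the generic table yields no clustering columns at all, whereas the explicit list covers both the Pixie and generic cases and raises an error when a requested column is absent.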