Skip to content

Conversation

alex-l-kong
Copy link
Contributor

What is the purpose of this PR?

Closes #992. Because cell clustering now takes a considerable amount of time, chance of intermediate failure also increases. There is no current way to recover intermediate cell training data if this happens, so this needs to be added.

How did you implement your changes

  1. Add intermediate saving of the cell training data in the cell clustering notebook. This allows any data, along with any cluster labels, to be loaded in on restart.
  2. Add overwrite functionality to the cell SOM assignment and cell consensus clustering functions. This will allow the user to redo these if they use different parameters.

@alex-l-kong alex-l-kong self-assigned this May 17, 2023
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@alex-l-kong alex-l-kong marked this pull request as draft May 17, 2023 21:24
@alex-l-kong alex-l-kong marked this pull request as ready for review May 26, 2023 17:50
@@ -297,7 +297,9 @@
"cell_type": "markdown",
Copy link
Contributor

@srivarra srivarra May 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #15.            compression='uncompressed'

I looked at an issue you investigated a while back, #860 about feather alternatives. It seemed that you concluded that feather files are pretty solid, and they have efficient compression, but they it seems to lack on the actual storage savings side of things. Do you think it may be worth revisiting the compression aspect to see if zstd on the scale of a cohort would be beneficial?


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely worth looking into in the near future.

Copy link
Contributor

@srivarra srivarra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a question if compressing feather files yields anything useful.

Copy link
Contributor

@camisowers camisowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@alex-l-kong alex-l-kong requested a review from cliu72 May 30, 2023 18:14
@alex-l-kong alex-l-kong requested a review from cliu72 June 3, 2023 02:31
Copy link
Contributor

@cliu72 cliu72 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@alex-l-kong alex-l-kong added this pull request to the merge queue Jun 6, 2023
Merged via the queue into main with commit a49528b Jun 6, 2023
@alex-l-kong alex-l-kong deleted the pixie_cell_save branch June 6, 2023 18:53
@srivarra srivarra added the bug Something isn't working label Jun 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Save cluster_counts_size_norm as it is being updated
4 participants