-
Notifications
You must be signed in to change notification settings - Fork 30
Don't require user to have all pixel SOM clusters represented during assignment #1124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…o not require all SOM clusters to be represented
@alex-l-kong I don't think this is the best solution to the problem. The reason that this check is even in there in the first place is because when we have cohorts of like 2000 FOVs, it would take forever to generate the average file. So we decided to use a subset of all FOVs to generate the average file (the thought is that a subset is a representative sample of all clusters). However, when we take a subset of all FOVs, it's possible that some rare clusters are not included in that subset. For example, if there is a cluster that is in like 1000 FOVs, but not in the 100 subsetted FOVs, that cluster would not be in the average file. This would lead to downstream problems, since that cluster number does exist in some FOVs, but it didn't get a metacluster assignment. Therefore, we thought that adding a test to make sure all clusters are represented would be a good way to get around this issue. If not all clusters are represented in the average file, then the number of FOVs subsetted needs to be increased. The issue arising now is that if we specify 100 SOM clusters, maybe only 98 SOM clusters are actually assigned across ALL FOVs. So in the average file, there are 98 SOM clusters. This is a different situation than if there are truly 100 SOM clusters assigned across all FOVs, but only 98 of them are in the subset of FOVs used for the average file generation. So the fundamental issue is that when we are checking for the number of clusters, we are using the weights file. However, the weights file can specify 100 clusters, when across all FOVs there are only 98 clusters. That's why the solution that I suggested in the issue is to keep track of the total number of clusters during the mapping step. Open to other suggestions though. |
@cliu72 thanks, I just realized that on my end too and am in the process of modifying it. |
@cliu72 I still think the |
@alex-l-kong I think it's fine to have this as a temporary solution, but if we track the total number of SOM clusters seen and use that to make sure the average file is written correctly, this error should never come up. If it does, there is another problem. |
Ok I did a quick test by removing one SOM cluster, I think this does the trick now. |
…ly being written, then don't attempt to drop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't able to properly stress test it because I didn't have a dataset easily on-hand that returned less than the desired number of clusters, but the logic makes sense to me.
What is the purpose of this PR?
During SOM clustering, it is possible in some cases that not all unique clusters discovered during training will be present in the dataset. In these cases, it's too strict for the SOM cluster mapping function to require all the SOM clusters to be represented.
How did you implement your changes
Add an additional parameter to the SOM cluster assignment function that is defaulted to not require all the SOM clusters to be present. This can be explicitly set to revert back to the original format.
The underlying averaging function is also adjusted accordingly.