davidbenjamin
Contributor

@LeeTL1220 Small change for Permutect data here, usually irrelevant. In most use cases, it will be possible to label training data with a GIAB truth VCF. When no truth VCF is available, this PR lets users optionally specify a PoN as a hint that unlabeled data at PoN sites are probably artifacts.

This should not in any way be interpreted as Permutect using a PoN for variant calling. It has zero effect on our DREAM and Linseq evaluations, which use an NA12878 sample for training.
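
As a rough illustration of the idea described above (not the code in this PR), the sketch below shows how a PoN could act as a labeling hint when no truth VCF is available. The function name `hint_label`, the `(contig, position)` site representation, and the label strings are all hypothetical and chosen only for this example.

```python
from typing import Set, Tuple

# Hypothetical site representation: (contig, 1-based position).
Site = Tuple[str, int]


def hint_label(site: Site, pon_sites: Set[Site]) -> str:
    """Return a training-label hint for a datum that has no truth label.

    Assumption for illustration only: a site found in the PoN is treated
    as a probable artifact, and every other site stays unlabeled.
    """
    return "artifact" if site in pon_sites else "unlabeled"


# Toy PoN with two sites (made-up coordinates).
pon = {("chr1", 12345), ("chr2", 678)}
labels = [hint_label(s, pon) for s in [("chr1", 12345), ("chr3", 42)]]
```

The hint only affects how training data are labeled; it plays no part in deciding which variants are called.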

Contributor

LeeTL1220 left a comment

If you address these comments, then feel free to count this as an approval. No need for me to re-review.

davidbenjamin merged commit fdd4333 into master on Apr 4, 2025
20 checks passed
davidbenjamin deleted the db_permutect_training_blacklist branch on April 4, 2025 13:47
2 participants