Description
Hello! Thank you for creating chopper. However, when I tried to remove DCS reads from my FASTQ files, I noticed that a good portion of the contaminating reads remain. Here is an example of one such read aligned with BLAST against the DCS sequence:
(query) bad read: 3,800 bp (90% = 3,420 bp)
(target) DCS: 3,560 bp
Chopper kept these reads, so I manually ran `minimap2 -ax map-ont DCS.fasta read.fq` to inspect the alignment. My "match_len" was 3,510 bp. Please correct me if I'm misinterpreting the filter function, but I assume that because 3,510 bp > 3,420 bp, the read should be classified as a contaminant.
Alternatively, if I run `minimap2 -x map-ont DCS.fasta read.fq`, my "match_len" is 3,268 bp. Because that is not greater than 3,420 bp, the read would be retained. Could chopper be underestimating the match lengths because the `Aligner` setup in lines 178-184 is missing `.with_cigar()`? See lh3/minimap2#158.
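```rust
use minimap2::Aligner;

// Current setup: no `.with_cigar()`, so base-level alignment is not requested
// (roughly equivalent to running minimap2 without -c/-a).
fn setup_contamination_filter(contam_fasta: &str) -> Aligner {
    Aligner::builder()
        .with_threads(8)
        .map_ont()
        .with_index(contam_fasta, None)
        .expect("Unable to build index")
}
```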
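To make the threshold arithmetic above concrete, here is a small standalone sketch. It assumes the filter works the way I interpreted it (a read is a contaminant when its match length exceeds 90% of the read length); I have not confirmed this against chopper's source, and `is_contaminant` is a hypothetical helper, not a chopper function.

```rust
/// Hypothetical helper (not from chopper): assumes a read counts as
/// contamination when match_len exceeds 90% of the read length.
fn is_contaminant(read_len: u64, match_len: u64) -> bool {
    // Integer form of `match_len > 0.9 * read_len`, avoiding float rounding.
    match_len * 10 > read_len * 9
}

fn main() {
    let read_len: u64 = 3_800;
    // 90% of 3,800 bp = 3,420 bp
    let threshold = read_len * 9 / 10;
    println!("threshold = {threshold} bp");

    // match_len reported with base-level alignment (minimap2 -ax map-ont)
    println!(
        "3,510 bp match: contaminant = {}",
        is_contaminant(read_len, 3_510)
    );
    // match_len reported without base-level alignment (minimap2 -x map-ont)
    println!(
        "3,268 bp match: contaminant = {}",
        is_contaminant(read_len, 3_268)
    );
}
```

Under this assumption the same read flips from "contaminant" to "retained" depending only on whether base-level alignment was performed, which is the discrepancy described above.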