Closed
Hi guys,
I ran mash on a collection of reference 16S rRNA sequences (3.6Gb) and got this warning while sketching:
WARNING: For the k-mer size used (21), the random match probability (0.000726951) is above
the specified warning threshold (0.01) for the sequence "..rdp/current_Bacteria_unaligned.fa" of
size 18446744072614073515. Distances to this sequence may be underestimated as a result.
To meet the threshold of 0.01, a k-mer size of at least 20 is required.
It seems the message should not have been displayed (0.000726951 is below the 0.01 threshold), and/or an integer overflow happened when calculating the size of the reference.
My suspicion is that there may be an assumption that the reference FASTA will always contain a single sequence?
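
A quick arithmetic check seems to support the overflow reading. The Python sketch below is not taken from Mash's source; the function names are hypothetical, and the probability formula n / (n + 4^k) is an assumption about how the random match probability is computed. It reinterprets the reported size as a signed 64-bit value, recovers a candidate true length from the low 32 bits, and checks whether that candidate reproduces both numbers in the warning.

```python
# Values copied from the warning message.
REPORTED_SIZE = 18446744072614073515
REPORTED_PROB = 0.000726951
K = 21


def as_signed64(u):
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    return u - 2**64 if u >= 2**63 else u


def wrap_int32_to_uint64(n):
    """Store n in a signed 32-bit int, then sign-extend to unsigned 64-bit."""
    low = n & 0xFFFFFFFF
    signed32 = low - 2**32 if low >= 2**31 else low
    return signed32 & 0xFFFFFFFFFFFFFFFF


def random_match_probability(n, k, alphabet=4):
    """ASSUMED form n / (n + alphabet**k); not confirmed against Mash's code."""
    return n / (n + alphabet**k)


signed = as_signed64(REPORTED_SIZE)      # -1095478101
candidate = REPORTED_SIZE & 0xFFFFFFFF   # 3199489195, about 3.2 billion bases

print(signed, candidate)
# Does the candidate length wrap back to the reported size?
print(wrap_int32_to_uint64(candidate) == REPORTED_SIZE)
# Does the candidate length reproduce the reported probability?
print(random_match_probability(candidate, K))
```

Under these assumptions, a total length of 3199489195 bases (plausible for a 3.6Gb FASTA once headers and newlines are excluded) wraps to exactly the size printed in the warning, and gives a probability that agrees with 0.000726951 to the printed precision. That would suggest the probability itself was computed from the correct length, and that only the size used for the message (and possibly the threshold check) passed through a 32-bit signed integer.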