Hello,
We've recently updated from minimap2 version 2.17 to 2.26 and have noticed a discrepancy in how it handles sequences containing large deletions. Our project, MiCall, aligns various types of sequences, but the issue is particularly problematic for sequences with large deletions (~1200 bp).
Reproduction
We've written a Python test within our MiCall project to simulate this scenario. The test mutates a seed sequence and then joins two parts of it, leaving out the region in between to introduce a large deletion. Minimap2 is called through a Python wrapper in our pipeline, but the only option used is --preset=map-ont, which corresponds to minimap2's -x map-ont preset and would be equivalent to running the following shell command (-a requests SAM output):
minimap2 -a -x map-ont ref.fasta query.fastq > output.sam
Here's a relevant portion of the test, for clarity:
seed_name = 'HCV-1a'
seed_seq = projects.getReference(seed_name)
seed_seq = mutate_sequence(seq=seed_seq, rate=0.04)
consensus = seed_seq[290:983] + seed_seq[3000:9269]
# ... (Alignment logic here)
assert alignment == expected_alignment
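For anyone who wants to build a similar query without MiCall, here is a minimal, self-contained sketch of the construction above. It is not the code from our test: this `mutate_sequence` is a hypothetical substitution-only stand-in for the MiCall helper, `build_consensus` is a name made up for illustration, and a random 9,400-base sequence stands in for the HCV-1a seed.

```python
import random


def mutate_sequence(seq, rate, rng):
    """Hypothetical stand-in: substitute each base with a different one
    with probability `rate` (substitutions only, so length is preserved)."""
    bases = "ACGT"
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in bases if b != base]))
        else:
            out.append(base)
    return "".join(out)


def build_consensus(seed_seq, rate=0.04, seed=0):
    """Mutate the seed, then join two slices, dropping positions 983..2999."""
    rng = random.Random(seed)
    mutated = mutate_sequence(seed_seq, rate, rng)
    return mutated[290:983] + mutated[3000:9269]


# Random stand-in for the real seed sequence (assumption: any sequence
# of at least 9269 bases works for the slicing above).
seed_rng = random.Random(42)
stand_in_seed = "".join(seed_rng.choice("ACGT") for _ in range(9400))

consensus = build_consensus(stand_in_seed)
print(len(consensus))
```

The resulting string can be written to a FASTA or FASTQ file and used as the query, with the unmutated stand-in as the reference.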
The issue
With minimap2 v2.17, the test passes: two separate regions in the alignment are recognized, effectively capturing the large deletion. However, with v2.26, the software only recognizes the larger part of the sequence (seed_seq[3000:9269]) and ignores the smaller part (seed_seq[290:983]).
Interestingly, if the mutation rate is lowered or mutations are removed altogether, both versions align both parts correctly.
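To make "two separate regions" concrete, here is a rough sketch of how the reference regions covered by one alignment record can be read off its start position and CIGAR string. The helper name `aligned_ref_blocks`, the `min_gap` threshold, and the CIGAR strings in the usage lines are illustrative assumptions, not actual minimap2 output from either version.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_ref_blocks(ref_start, cigar, min_gap=100):
    """Return reference intervals (0-based, half-open) covered by one
    alignment, split wherever a deletion (D or N) is >= min_gap long."""
    blocks = []
    block_start = pos = ref_start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "DN" and length >= min_gap:
            # Close the current block and skip over the large deletion.
            if pos > block_start:
                blocks.append((block_start, pos))
            pos += length
            block_start = pos
        elif op in "M=XDN":
            # These operations consume reference bases.
            pos += length
        # I, S, H and P consume no reference bases, so pos is unchanged.
    if pos > block_start:
        blocks.append((block_start, pos))
    return blocks


# Idealized record spanning the deletion: both regions are reported.
both_parts = aligned_ref_blocks(290, "693M2017D6269M")
# Idealized record where the smaller part is soft-clipped instead.
larger_only = aligned_ref_blocks(3000, "693S6269M")
print(both_parts, larger_only)
```

The first case corresponds to the result we expect, and the second to the larger part alone being aligned. If the two parts are reported as separate records (primary plus supplementary), the same helper can be applied to each record in turn.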
Questions
- Is it expected that the newer version drops the smaller region next to a large deletion when mutations are present?
- Is there a way to configure v2.26 to behave like v2.17 for this specific case without downgrading?
Project context
This issue has arisen in the context of our project, MiCall, which is a pipeline for processing FASTQ data from an Illumina MiSeq to genotype human RNA viruses like HIV and hepatitis C. The specific test where this issue is observed can be found here.
We would greatly appreciate any insights or solutions to address this issue. Thank you in advance!