Minimap2 v2.26 Alters Alignment Behavior for Large Deletions #1111

@Donaim

Hello,

We've recently updated from minimap2 version 2.17 to 2.26 and have noticed a discrepancy in how it handles sequences containing large deletions. While our project, MiCall, aligns various types of sequences, this issue is particularly problematic for sequences with large deletions (~1200 bp).

Reproduction

We've written a Python test within our MiCall project to simulate this scenario. The test mutates a seed sequence and then joins two non-adjacent parts of it, which introduces a large deletion. Minimap2 is called through a Python wrapper in our pipeline, and the only option set is the map-ont preset, which should be equivalent to running the following shell command:

minimap2 -a -x map-ont ref.fasta query.fastq > output.sam

Here's a relevant portion of the test, for clarity:

seed_name = 'HCV-1a'
seed_seq = projects.getReference(seed_name)
seed_seq = mutate_sequence(seq=seed_seq, rate=0.04)
consensus = seed_seq[290:983] + seed_seq[3000:9269]

# ... (Alignment logic here)

assert alignment == expected_alignment
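For readers without MiCall at hand, the construction of the query can be sketched in a self-contained way. This is only an illustration under stated assumptions: the `mutate_sequence` below is a hypothetical stand-in that applies random substitutions (MiCall's own helper may behave differently), and a random 9,300 bp string replaces the real HCV-1a seed.

```python
import random


def mutate_sequence(seq, rate, rng):
    """Hypothetical stand-in: substitute each base with probability `rate`."""
    bases = "ACGT"
    out = []
    for base in seq:
        if rng.random() < rate:
            # Always pick a different base so every event is a real mismatch.
            out.append(rng.choice([b for b in bases if b != base]))
        else:
            out.append(base)
    return "".join(out)


rng = random.Random(42)

# Placeholder reference; the real test loads the HCV-1a seed instead.
reference = "".join(rng.choice("ACGT") for _ in range(9300))
mutated = mutate_sequence(reference, rate=0.04, rng=rng)

# Same slices as in the test: joining them drops the bases in between.
consensus = mutated[290:983] + mutated[3000:9269]

mismatches = sum(a != b for a, b in zip(reference, mutated))
print(len(consensus), mismatches)
```

The resulting `consensus` and `reference` can then be passed to whichever aligner is being compared.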

The issue

With minimap2 v2.17, the test passes: the alignment contains two separate regions, which together capture the large deletion. With v2.26, however, only the larger part of the sequence (seed_seq[3000:9269]) is aligned, and the smaller part (seed_seq[290:983]) is ignored.

Interestingly, if the mutation rate is lowered or mutations are removed altogether, both versions align both parts correctly.
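The difference between the two versions can be stated concretely by checking how much of each piece of the query is covered by the reported hits. The following is a minimal sketch, assuming each hit has been reduced to a (query_start, query_end) pair, for example from the q_st and q_en attributes of a mappy hit. The helper name `covered_fraction` and the hit lists are illustrative placeholders, not actual minimap2 output.

```python
def covered_fraction(segment, hits):
    """Fraction of the half-open query interval `segment` covered by any hit."""
    seg_start, seg_end = segment
    # Clip each overlapping hit to the segment, then sweep left to right.
    clipped = sorted(
        (max(start, seg_start), min(end, seg_end))
        for start, end in hits
        if start < seg_end and end > seg_start
    )
    covered = 0
    cursor = seg_start
    for start, end in clipped:
        start = max(start, cursor)  # skip anything already counted
        if end > start:
            covered += end - start
            cursor = end
    return covered / (seg_end - seg_start)


# Query coordinates of the two pieces of the consensus.
small_part = (0, 693)      # from seed_seq[290:983]
large_part = (693, 6962)   # from seed_seq[3000:9269]

# Placeholder hit lists describing the two outcomes reported above.
hits_both = [(0, 693), (693, 6962)]
hits_large_only = [(693, 6962)]

for label, hits in [("both parts", hits_both), ("large part only", hits_large_only)]:
    print(label, covered_fraction(small_part, hits), covered_fraction(large_part, hits))
```

A check like this makes a regression test less brittle than comparing full alignments, since it only asserts that both pieces were placed somewhere.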

Questions

  • Is this behavior of ignoring large deletions in the presence of mutations expected in the newer version?
  • Is there a way to configure v2.26 to behave like v2.17 for this specific case without downgrading?

Project context

This issue has arisen in the context of our project, MiCall, which is a pipeline for processing FASTQ data from an Illumina MiSeq to genotype human RNA viruses like HIV and hepatitis C. The specific test where this issue is observed can be found here.

We would greatly appreciate any insights or solutions to address this issue. Thank you in advance!
