--minFraction too stringent for fragmented genomes? #70

@wwood

Hi, thanks to a report by @apcamargo at wwood/galah#7, I came across an issue with --minFraction on these fragmented genomes. They seem to align well:

$ fastANI -q 1.fna -r 2.fna -o /dev/stdout --minFraction 0.2 2>/dev/null
1.fna	2.fna	97.4762	228	629
$ fastANI -r 1.fna -q 2.fna -o /dev/stdout 2>/dev/null
2.fna	1.fna	98.351	232	255

But when --minFraction is raised to 0.5 the hit goes away in both directions, even though 232/255 > 0.5. It only comes back when --fragLen is reduced to 1000:

$ fastANI -q 1.fna -r 2.fna -o /dev/stdout --minFraction 0.5 2>/dev/null
$ fastANI -r 1.fna -q 2.fna -o /dev/stdout --minFraction 0.5 2>/dev/null
$ fastANI -q 1.fna -r 2.fna -o /dev/stdout  --minFraction 0.5 --fragLen 1000 2>/dev/null
1.fna	2.fna	98.2643	1113	2276

(galah-dev) ben@u2:~/git/galah$ fastANI --version
version 1.31
(galah-dev) ben@u2:~/git/galah/antonio$ seqstat 1.fna
seqstat - show some simple statistics on a sequence file
SQUID 1.9g (January 2003)
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Format:              FASTA
Type (of 1st seq):   DNA
Number of sequences: 369
Total # residues:    2456354
Smallest:            2028
Largest:             32450
Average length:      6656.8
(galah-dev) ben@u2:~/git/galah/antonio$ seqstat 2.fna
seqstat - show some simple statistics on a sequence file
SQUID 1.9g (January 2003)
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Format:              FASTA
Type (of 1st seq):   DNA
Number of sequences: 409
Total # residues:    1468919
Smallest:            2005
Largest:             16940
Average length:      3591.5

Filtering out this alignment on minFraction seems incorrect to me. What is the actual definition of minFraction? Is it the fraction of the total genome length, the fraction of the genome that is long enough to be included as a fragment, or something else along those lines?
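
To make the question concrete, here is a rough sketch that evaluates two guessed definitions against the numbers above. Neither definition is taken from the fastANI source: the function names are made up for illustration, and the value 3000 for the default --fragLen is an assumption on my part.

```python
# Sketch: compare two guessed definitions of the aligned fraction.
# Neither is taken from the fastANI source; both are assumptions.

# Total residues of the smaller genome, from the seqstat output above.
SMALLER_GENOME_BP = min(2456354, 1468919)


def fraction_of_fragments(matched, total_fragments):
    """Guess A: matched fragments over query fragments (output columns 4 and 5)."""
    return matched / total_fragments


def fraction_of_genome(matched, frag_len, genome_bp=SMALLER_GENOME_BP):
    """Guess B: matched bases (matched * frag_len) over total genome length."""
    return matched * frag_len / genome_bp


# (label, matched fragments, total query fragments, fragLen)
# 3000 is assumed to be the default --fragLen in this fastANI version.
RUNS = [
    ("-q 1.fna -r 2.fna", 228, 629, 3000),
    ("-q 2.fna -r 1.fna", 232, 255, 3000),
    ("-q 1.fna -r 2.fna --fragLen 1000", 1113, 2276, 1000),
]

for label, matched, total, frag_len in RUNS:
    a = fraction_of_fragments(matched, total)
    b = fraction_of_genome(matched, frag_len)
    print(f"{label}: guess A = {a:.3f}, guess B = {b:.3f}")
```

If the default --fragLen really is 3000, guess B puts both default runs just under 0.5 and the --fragLen 1000 run above it, which would be consistent with the output above. That is only a guess at what --minFraction measures, though, so confirmation would be appreciated.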

Thanks, ben
