allowable characters in the genome fasta ID? #54

@mcmahon-uw

Hi!

I've been getting a consistent error with one batch of genome files when I run the "genome" command with a concatenated set of the genomes:

coverm genome \
    --reference ~/binned_dRepd/ \
    -s "-" \
    -m relative_abundance \
    --interleaved /home/GLBRCORG/trina.mcmahon/lake_data_general/Mendota_metaGs_renamed/*.fastq \
    --bam-file-cache-directory bam_cache \
    --min-read-aligned-percent 0.75 \
    --min-read-percent-identity 0.95 \
    --min-covered-fraction 0 \
    -x fasta -t 20 &> relative_abundance_output.txt &

The error is:

thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 4294967295', src/genome.rs:811:23
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

But if I run "genome" with the individual files, it works fine.

coverm genome \
    --genome-fasta-directory  ~/binned_dRepd/ \
    -m relative_abundance \
    --interleaved /home/GLBRCORG/trina.mcmahon/lake_data_general/Mendota_metaGs_renamed/*.fastq \
    --bam-file-cache-directory bam_cache \
    --min-read-aligned-percent 0.75 \
    --min-read-percent-identity 0.95 \
    --min-covered-fraction 0 \
    -x fa -t 20 &> relative_abundance_output.txt &

I am wondering if it is a formatting problem with my fasta ID (I've done some other debugging that makes me think this).
Are there any forbidden characters that we should not use to separate the genome name from the contig name? I suspect it's the "-" but want to confirm.

Example fasta ID:

ME_SAG_rebinned_metabat1_bin.1-2739367620_Contig_60
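A quick way to test the separator hypothesis would be to count how often "-" occurs in each FASTA ID of the concatenated reference. The following is a minimal Python sketch, not CoverM's own logic: the function name `summarize_separator` is hypothetical, and treating everything before the first separator as the genome name is an assumption made only for illustration.

```python
from collections import Counter


def summarize_separator(lines, separator="-"):
    """Summarize how FASTA IDs split on a separator character.

    Returns (counts, genomes):
      counts  maps "number of separators in the ID" -> number of IDs
      genomes is the set of prefixes found before the first separator
              (ASSUMPTION: genome name = text before the first separator).
    """
    counts = Counter()
    genomes = set()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        if not fields:
            continue  # header with an empty ID
        seq_id = fields[0]  # the ID is the text up to the first whitespace
        counts[seq_id.count(separator)] += 1
        genome, found, _contig = seq_id.partition(separator)
        if found:
            genomes.add(genome)
    return counts, genomes


if __name__ == "__main__":
    import sys

    # Usage: python check_separator.py concatenated_genomes.fasta
    with open(sys.argv[1]) as handle:
        counts, genomes = summarize_separator(handle)
    for n_separators, n_ids in sorted(counts.items()):
        print(f"{n_ids} IDs contain the separator {n_separators} time(s)")
    print(f"{len(genomes)} distinct genome prefixes")
```

IDs that contain the separator zero times, or more than once, would be the first ones to look at, as would a count of genome prefixes that differs from the number of genome files.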

thanks,
trina
