utgcns segfault at consensus step #247

@galicae

Description

(possibly similar to #53 or Canu issues #1061 and #1073).

Expected behaviour

Rule generateConsensus runs to completion and produces a final consensus FASTA.

Current behaviour

The rule fails. The log file in the .snakemake/log/ directory says:

log
Error in rule generateConsensus:
    jobid: 246
    input: 7-consensus/packages/part023.cnspack, 7-consensus/packages.tigName_to_ID.map, 7-consensus/packages.report, 7-consensus/ont_subset.id
    output: 7-consensus/packages/part023.fasta
    log: 7-consensus/packages/part023.err (check log file(s) for error details)
    shell:

cd 7-consensus

mkdir -p packages

cat > ./packages/part023.sh <<EOF
#!/bin/sh
set -e

/lisc/user/papadopoulos/.conda/envs/verkko/lib/verkko/bin/utgcns \\
    -threads 8 \\
    -import ../7-consensus/packages/part023.cnspack \\
    -A ../7-consensus/packages/part023.fasta.WORKING \\
    -C 2 -norealign \\
    -maxcoverage 50 \\
    -e  0.05 \\
    -em 0.20 \\
    -EM 0 \\
    -l 3000 \\
    -edlib \\
&& \\
mv ../7-consensus/packages/part023.fasta.WORKING ../7-consensus/packages/part023.fasta \\
&& \\
exit 0

echo ""
echo "Consensus did not finish successfully, exit code \$?."

echo ""
echo "Files in current directory:"
ls -ltr

echo ""
echo "Files in packages/:"
ls -ltr packages

exit 1
EOF

chmod +x ./packages/part023.sh

./packages/part023.sh > ../7-consensus/packages/part023.err 2>&1

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: 1307631

When I look at the offending part, this is what I see:

part023.err
-- Opening output FASTA file '../7-consensus/packages/part023.fasta.WORKING'.
--
-- Computing consensus for b=0 to e=4294967295 with errorRate 0.0500 (max 0.2000) and minimum overlap 3000
--
                           ----------CONTAINED READS----------  -DOVETAIL  READS-
  tigID    length   reads      used coverage  ignored coverage      used coverage
------- --------- -------  -------- -------- -------- --------  -------- --------
     17     26765      10         3    0.47x        0    0.00x         7    1.93x
     30     26057      11         5    1.05x        0    0.00x         6    1.91x
     40     30194      10         5    1.35x        0    0.00x         5    2.33x
     55     22888      14         8    2.38x        0    0.00x         6    2.15x
     95     28299      10         6    1.20x        0    0.00x         4    1.65x
    104     20451      15         9    2.37x        0    0.00x         6    2.55x
    110     23410      13         6    1.30x        0    0.00x         7    2.42x
    112     21371      12         9    2.24x        0    0.00x         3    1.91x
    115     21734      12         5    0.85x        0    0.00x         7    1.80x
    136     23121      13         7    1.62x        0    0.00x         6    2.26x
    223     16225      17        12    4.51x        0    0.00x         5    2.95x
    244     27242      11         4    1.00x        0    0.00x         7    2.49x
    247     23142      13        10    2.91x        0    0.00x         3    1.36x
    266     24272      12         5    0.77x        0    0.00x         7    2.35x
    272     34295       9         5    1.22x        0    0.00x         4    1.71x
    296     33650       8         3    0.46x        0    0.00x         5    1.47x
    324     29357      11         6    1.59x        0    0.00x         5    2.20x

(many more of these with the occasional warning...)

  19317     18452      15        13    5.30x        0    0.00x         2    1.75x
  19399     26921      10         5    1.18x        0    0.00x         5    1.62x
  19417     15140      20        16    5.99x        0    0.00x         4    2.45x
  19514     17712      15        12    5.08x        0    0.00x         3    1.61x
  19552     20374      13         8    2.71x        0    0.00x         5    2.25x
  19557     22000      14         8    2.15x        0    0.00x         6    alignEdLib()-- WARNING: tigbgn 21978 > tigend 21683 - tiglen 21683 utgpos 22228-27617 padding 250
alignEdLib()-- WARNING: updated tigbgn 0 > tigend 27617 - tiglen 21683 utgpos 22228-27617 padding 250
alignEdLib()-- WARNING: tigbgn 27874 > tigend 23534 - tiglen 23534 utgpos 28119-33009 padding 245
alignEdLib()-- WARNING: updated tigbgn 0 > tigend 33009 - tiglen 23534 utgpos 28119-33009 padding 245
utgcns: utility/src/align/edlib.C:438: void merylutil::align::edlib::v1::edlibAlignmentToStrings(const unsigned char*, int, int, int, int, int, const char*, const char*, char*, char*): Assertion `strlen(qry_aln_str) == alignmentLength && strlen(tgt_aln_str) == alignmentLength' failed.

Failed with 'Aborted'; backtrace (libbacktrace):

Failed with 'Segmentation fault'; backtrace (libbacktrace):
./packages/part023.sh: line 18: 704541 Segmentation fault      (core dumped) /lisc/user/papadopoulos/.conda/envs/verkko/lib/verkko/bin/utgcns -threads 8 -import ../7-consensus/packages/part023.cnspack -A ../7-consensus/packages/part023.fasta.WORKING -C 2 -norealign -maxcoverage 50 -e 0.05 -em 0.20 -EM 6025 -l 3000 -edlib

Consensus did not finish successfully, exit code 0.

Files in current directory:
(list of files)
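
The alignEdLib() warnings in the log are easier to compare once their numbers are pulled into named fields. The sketch below is one way to do that in Python. It assumes the warning format is exactly what appears in the log quoted above (it is not taken from the utgcns source), and the names `EdlibWarning`, `parse_edlib_warnings`, and `starts_past_tig_end` are made up for illustration. The pattern is applied to the whole text rather than line by line, so it tolerates a warning that starts mid-line or a hard line wrap inside the utgpos range, as can happen when a terminal log is pasted.

```python
import re
from typing import List, NamedTuple

# Assumed format, copied from the warnings quoted in this report.
WARNING_RE = re.compile(
    r"alignEdLib\(\)-- WARNING: (?P<updated>updated )?"
    r"tigbgn (?P<tigbgn>\d+) > tigend (?P<tigend>\d+)"
    r" - tiglen (?P<tiglen>\d+)"
    r" utgpos (?P<utg_bgn>\d+)-\s*(?P<utg_end>\d+)"
    r" padding (?P<padding>\d+)"
)


class EdlibWarning(NamedTuple):
    """One parsed alignEdLib() warning (hypothetical helper type)."""
    updated: bool
    tigbgn: int
    tigend: int
    tiglen: int
    utg_bgn: int
    utg_end: int
    padding: int


def parse_edlib_warnings(text: str) -> List[EdlibWarning]:
    """Return every alignEdLib() warning found anywhere in the text."""
    warnings = []
    for match in WARNING_RE.finditer(text):
        warnings.append(
            EdlibWarning(
                updated=match.group("updated") is not None,
                tigbgn=int(match.group("tigbgn")),
                tigend=int(match.group("tigend")),
                tiglen=int(match.group("tiglen")),
                utg_bgn=int(match.group("utg_bgn")),
                utg_end=int(match.group("utg_end")),
                padding=int(match.group("padding")),
            )
        )
    return warnings


def starts_past_tig_end(warning: EdlibWarning) -> bool:
    """True when the utgpos start lies beyond the reported tig length."""
    return warning.utg_bgn > warning.tiglen


# Two warnings from the log above, including a wrapped utgpos range.
SAMPLE = (
    "  19557     22000      14         8    2.15x        0    0.00x         6    "
    "alignEdLib()-- WARNING: tigbgn 21978 > tigend 21683 - tiglen 21683 utgpos 22228-\n"
    "27617 padding 250\n"
    "alignEdLib()-- WARNING: updated tigbgn 0 > tigend 27617 - tiglen 21683 "
    "utgpos 22228-27617 padding 250\n"
)

if __name__ == "__main__":
    # To scan the real log instead, read 7-consensus/packages/part023.err into a string.
    for warning in parse_edlib_warnings(SAMPLE):
        print(warning, starts_past_tig_end(warning))
```

In both non-updated warnings quoted in this report, the utgpos start (22228 and 28119) is larger than tiglen (21683 and 23534). Whether that is what trips the assertion in edlib.C is not established here.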

Possible solutions

Long sequences

In issue #53 it was suggested that a read longer than 2^11 might be present. However, the offending line in Canu has since been fixed, so this is unlikely to be the cause.

Abnormal FASTA

Some Canu issues suggested that unexpected characters in the FASTA file might be a problem. I scanned part023.fasta.WORKING with a short Python script and found nothing:

code
VALID_BASES = set("ACGT")

with open("./part023.fasta.WORKING", "r") as fasta:
    for line in fasta:
        # Header lines are not sequence data, so skip them.
        if line.startswith(">"):
            continue
        # Stop at the first sequence line with a character other than A, C, G, or T.
        if set(line.rstrip("\n")) - VALID_BASES:
            print(line)
            break

This runs through without any output. Manual inspection of (parts of) the FASTA file also did not show anything suspicious.
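
The check above stops at the first suspicious line. A tally of every character in the sequence lines gives a fuller picture, for example whether lowercase bases, N, or carriage returns occur at all. This is a minimal sketch that assumes a plain, uncompressed FASTA file; `sequence_character_counts` and `unexpected_characters` are illustrative names, not part of any tool mentioned in this report.

```python
from collections import Counter
from typing import Dict, Iterable


def sequence_character_counts(lines: Iterable[str]) -> Counter:
    """Count every character on non-header lines, excluding the trailing newline."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith(">"):
            continue
        # Only "\n" is stripped, so a stray "\r" from CRLF line endings is still counted.
        counts.update(line.rstrip("\n"))
    return counts


def unexpected_characters(counts: Counter, allowed: str = "ACGT") -> Dict[str, int]:
    """Return the characters (with counts) that are not in the allowed alphabet."""
    return {char: n for char, n in counts.items() if char not in allowed}


# Small in-memory example; pass an open file handle for a real FASTA instead,
# e.g. open("./part023.fasta.WORKING").
example = [">tig1\n", "ACGTAC\n", "GGNTa\r\n"]
example_counts = sequence_character_counts(example)
example_unexpected = unexpected_characters(example_counts)

if __name__ == "__main__":
    print(example_unexpected)
```

An empty result for the real file would agree with the script above finding nothing.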

Steps to reproduce

I uploaded part023.cnspack and part023.fasta.WORKING to Dropbox.

Environment

HPC environment running Oracle Linux 9 (kernel release 5.14.0-362.24.1.0.1.el9_3.x86_64); verkko was installed via conda. The verkko call was put in a wrapper script and run on the head node of the cluster in --slurm mode.

verkko script
#!/usr/bin/env bash

module load conda
conda activate verkko

# long read input - ONT
NANOPORE="/path/to/nanopore.fastq.gz"
# long read input - PacBio CCS
PACBIO="/path/to/ccs.fastq.gz"

# output directory
OUTPUT="/path/to/verkko/output/"
mkdir -p "$OUTPUT" || exit 1

verkko -d "$OUTPUT" --hifi "$PACBIO" --nano "$NANOPORE" --no-correction --local-cpus 1 --slurm --snakeopts "--cores 64"
