Description
(possibly similar to #53 or Canu issues #1061 and #1073).
Expected behaviour
Rule generateConsensus runs through and produces a final consensus FASTA.
Current behaviour
The rule fails with an error. The log file in the .snakemake/log/ directory says:
log
Error in rule generateConsensus:
jobid: 246
input: 7-consensus/packages/part023.cnspack, 7-consensus/packages.tigName_to_ID.map, 7-consensus/packages.report, 7-consensus/ont_subset.id
output: 7-consensus/packages/part023.fasta
log: 7-consensus/packages/part023.err (check log file(s) for error details)
shell:
cd 7-consensus
mkdir -p packages
cat > ./packages/part023.sh <<EOF
#!/bin/sh
set -e
/lisc/user/papadopoulos/.conda/envs/verkko/lib/verkko/bin/utgcns \\
-threads 8 \\
-import ../7-consensus/packages/part023.cnspack \\
-A ../7-consensus/packages/part023.fasta.WORKING \\
-C 2 -norealign \\
-maxcoverage 50 \\
-e 0.05 \\
-em 0.20 \\
-EM 0 \\
-l 3000 \\
-edlib \\
&& \\
mv ../7-consensus/packages/part023.fasta.WORKING ../7-consensus/packages/part023.fasta \\
&& \\
exit 0
echo ""
echo "Consensus did not finish successfully, exit code \$?."
echo ""
echo "Files in current directory:"
ls -ltr
echo ""
echo "Files in packages/:"
ls -ltr packages
exit 1
EOF
chmod +x ./packages/part023.sh
./packages/part023.sh > ../7-consensus/packages/part023.err 2>&1
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
cluster_jobid: 1307631
When I look at the offending part, this is what I see:
part023.err
-- Opening output FASTA file '../7-consensus/packages/part023.fasta.WORKING'.
--
-- Computing consensus for b=0 to e=4294967295 with errorRate 0.0500 (max 0.2000) and minimum overlap 3000
--
----------CONTAINED READS---------- -DOVETAIL READS-
tigID length reads used coverage ignored coverage used coverage
------- --------- ------- -------- -------- -------- -------- -------- --------
17 26765 10 3 0.47x 0 0.00x 7 1.93x
30 26057 11 5 1.05x 0 0.00x 6 1.91x
40 30194 10 5 1.35x 0 0.00x 5 2.33x
55 22888 14 8 2.38x 0 0.00x 6 2.15x
95 28299 10 6 1.20x 0 0.00x 4 1.65x
104 20451 15 9 2.37x 0 0.00x 6 2.55x
110 23410 13 6 1.30x 0 0.00x 7 2.42x
112 21371 12 9 2.24x 0 0.00x 3 1.91x
115 21734 12 5 0.85x 0 0.00x 7 1.80x
136 23121 13 7 1.62x 0 0.00x 6 2.26x
223 16225 17 12 4.51x 0 0.00x 5 2.95x
244 27242 11 4 1.00x 0 0.00x 7 2.49x
247 23142 13 10 2.91x 0 0.00x 3 1.36x
266 24272 12 5 0.77x 0 0.00x 7 2.35x
272 34295 9 5 1.22x 0 0.00x 4 1.71x
296 33650 8 3 0.46x 0 0.00x 5 1.47x
324 29357 11 6 1.59x 0 0.00x 5 2.20x
(many more of these with the occasional warning...)
19317 18452 15 13 5.30x 0 0.00x 2 1.75x
19399 26921 10 5 1.18x 0 0.00x 5 1.62x
19417 15140 20 16 5.99x 0 0.00x 4 2.45x
19514 17712 15 12 5.08x 0 0.00x 3 1.61x
19552 20374 13 8 2.71x 0 0.00x 5 2.25x
19557 22000 14 8 2.15x 0 0.00x 6 alignEdLib()-- WARNING: tigbgn 21978 > tigend 21683 - tiglen 21683 utgpos 22228-27617 padding 250
alignEdLib()-- WARNING: updated tigbgn 0 > tigend 27617 - tiglen 21683 utgpos 22228-27617 padding 250
alignEdLib()-- WARNING: tigbgn 27874 > tigend 23534 - tiglen 23534 utgpos 28119-33009 padding 245
alignEdLib()-- WARNING: updated tigbgn 0 > tigend 33009 - tiglen 23534 utgpos 28119-33009 padding 245
utgcns: utility/src/align/edlib.C:438: void merylutil::align::edlib::v1::edlibAlignmentToStrings(const unsigned char*, int, int, int, int, int, const char*, const char*, char*, char*): Assertion `strlen(qry_aln_str) == alignmentLength && strlen(tgt_aln_str) == alignmentLength' failed.
Failed with 'Aborted'; backtrace (libbacktrace):
Failed with 'Segmentation fault'; backtrace (libbacktrace):
./packages/part023.sh: line 18: 704541 Segmentation fault (core dumped) /lisc/user/papadopoulos/.conda/envs/verkko/lib/verkko/bin/utgcns -threads 8 -import ../7-consensus/packages/part023.cnspack -A ../7-consensus/packages/part023.fasta.WORKING -C 2 -norealign -maxcoverage 50 -e 0.05 -em 0.20 -EM 6025 -l 3000 -edlib
Consensus did not finish successfully, exit code 0.
Files in current directory:
(list of files)
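To narrow down which reads trigger the assertion, the alignEdLib() warnings can be pulled out of part023.err with a short script. The following is a minimal sketch: the regular expression is modelled only on the warning lines shown above, not on the utgcns source, and the function name find_out_of_range_reads is made up for illustration. In both warnings the utgpos start lies beyond tiglen, and tigbgn equals the utgpos start minus the padding, so the sketch flags exactly that condition.

```python
import re

# Pattern inferred from the warning lines in the log above (an assumption,
# not taken from the utgcns source code).
WARNING_RE = re.compile(
    r"alignEdLib\(\)-- WARNING: (?P<updated>updated )?"
    r"tigbgn (?P<tigbgn>\d+) > tigend (?P<tigend>\d+)"
    r" - tiglen (?P<tiglen>\d+)"
    r" utgpos (?P<bgn>\d+)-\s*(?P<end>\d+)"
    r" padding (?P<padding>\d+)"
)

FIELDS = ("tigbgn", "tigend", "tiglen", "bgn", "end", "padding")


def find_out_of_range_reads(log_text):
    """Return one dict per initial (non-'updated') warning in the log text."""
    hits = []
    for match in WARNING_RE.finditer(log_text):
        if match.group("updated"):
            continue  # skip the follow-up 'updated' line for the same read
        rec = {key: int(match.group(key)) for key in FIELDS}
        # True when the read is placed entirely past the end of the tig
        rec["starts_past_tig"] = rec["bgn"] > rec["tiglen"]
        hits.append(rec)
    return hits


# The four warning lines from part023.err, copied from the log above.
SAMPLE = (
    "alignEdLib()-- WARNING: tigbgn 21978 > tigend 21683 - tiglen 21683 utgpos 22228-27617 padding 250\n"
    "alignEdLib()-- WARNING: updated tigbgn 0 > tigend 27617 - tiglen 21683 utgpos 22228-27617 padding 250\n"
    "alignEdLib()-- WARNING: tigbgn 27874 > tigend 23534 - tiglen 23534 utgpos 28119-33009 padding 245\n"
    "alignEdLib()-- WARNING: updated tigbgn 0 > tigend 33009 - tiglen 23534 utgpos 28119-33009 padding 245\n"
)

for rec in find_out_of_range_reads(SAMPLE):
    print(rec)
```

To run it against the real log, replace SAMPLE with the contents of part023.err. The regular expression searches anywhere in a line, so warnings fused onto the end of a table row are still found.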
Possible solutions
long sequences
In issue #53 it was suggested that a read longer than 2^11 might be present. However, the offending line in Canu has since been fixed, so this cannot be the cause.
abnormal FASTA
Some Canu issues suggested that unexpected characters in the FASTA file might be the problem. I looked through part023.fasta.WORKING with a short Python script and couldn't find anything:
code
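found = False
with open("./part023.fasta.WORKING", "r") as fasta:
    for line in fasta:
        # skip FASTA header lines
        if line.startswith(">"):
            continue
        # flag any character other than an uppercase base or a newline
        for c in line:
            if c not in ["A", "C", "G", "T", "\n"]:
                found = True
        if found:
            # print the first offending line and stop
            print(line)
            break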
This runs through without any output. Manual inspection of (parts of) the FASTA file also did not show anything suspicious.
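A slightly more informative version of the same check reports, per record, which unexpected characters occur and how often. This is only a sketch: the function name scan_fasta and the allowed parameter are made up for illustration, and it assumes a plain FASTA layout with ">" header lines. Only "\n" is stripped, so a stray carriage return from Windows line endings would be reported as well.

```python
from collections import Counter


def scan_fasta(lines, allowed="ACGT"):
    """Map record name -> Counter of characters that are not in `allowed`."""
    problems = {}
    name = None
    for raw in lines:
        line = raw.rstrip("\n")  # keep "\r" so that it gets reported
        if line.startswith(">"):
            # record name = first whitespace-delimited token of the header
            parts = line[1:].split()
            name = parts[0] if parts else ""
            continue
        bad = Counter(c for c in line if c not in allowed)
        if bad:
            problems.setdefault(name, Counter()).update(bad)
    return problems


# Small in-memory demonstration; for the real file, pass the open file handle:
#     with open("./part023.fasta.WORKING") as fasta:
#         report = scan_fasta(fasta)
demo = [">tig1\n", "ACGTACGT\n", ">tig2 len=5\n", "ACNNg\n"]
for record, counts in scan_fasta(demo).items():
    print(record, dict(counts))
```

An empty result for part023.fasta.WORKING would match the outcome of the simpler script above.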
Steps to reproduce
I uploaded part023.cnspack and part023.fasta.WORKING to Dropbox.
Environment
HPC environment running Oracle Linux 9 (kernel release 5.14.0-362.24.1.0.1.el9_3.x86_64); verkko was installed via conda. The verkko call was put in a wrapper script and run on the head node of the cluster in --slurm mode.
verkko script
#!/usr/bin/env bash
module load conda
conda activate verkko
# long read input - ONT
NANOPORE="/path/to/nanopore.fastq.gz"
# long read input - PacBio CCS
PACBIO="/path/to/ccs.fastq.gz"
# output directory
OUTPUT="/path/to/verkko/output/"
mkdir -p "$OUTPUT" || exit 1
verkko -d "$OUTPUT" --hifi "$PACBIO" --nano "$NANOPORE" --no-correction --local-cpus 1 --slurm --snakeopts "--cores 64"