Human reads are actually outputted by nohuman #2

@cpauvert

Hi @mbhall88,
thanks for developing this tool, this approach is indeed very fast!
However, when trying it out on bacterial genome assembly data (Nanopore and Illumina), I was puzzled that nohuman was throwing the baby out with the bathwater. I therefore investigated with the manual approach described in https://github.com/mbhall88/classification_benchmark, using kraken2, and got:

277081 sequences (1116.92 Mbp) processed in 16.112s (1031.8 Kseq/m, 4159.33 Mbp/m).
  2814 sequences classified (1.02%).
  274267 sequences unclassified (98.98%)   

But the output file from nohuman contained 2814 sequences, i.e. the classified reads rather than the unclassified ones. I traced this to what looks like a typo in the following line:

kraken_cmd.extend(&["--classified-out", &outfile]);
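The change I have in mind can be sketched roughly as below. This is only an illustration under assumptions, not the actual nohuman source: `build_kraken_args` and its parameters are hypothetical names I made up, and the sketch only builds the argument list without running kraken2. The one real piece is kraken2's `--unclassified-out` option, which writes the reads that did not match the database.

```rust
/// Hypothetical helper (not taken from the nohuman source): builds the
/// kraken2 arguments so that reads NOT classified against the database
/// are written to `outfile`.
fn build_kraken_args(db: &str, outfile: &str, input: &str) -> Vec<String> {
    let mut kraken_cmd: Vec<String> = vec!["--db".to_string(), db.to_string()];
    // Previously `--classified-out`, which writes the classified reads instead.
    kraken_cmd.extend(["--unclassified-out".to_string(), outfile.to_string()]);
    kraken_cmd.push(input.to_string());
    kraken_cmd
}

fn main() {
    // Placeholder paths, for illustration only.
    let args = build_kraken_args("db", "clean.fq", "reads.fq");
    println!("kraken2 {}", args.join(" "));
}
```

With this change, the output file should hold the 274267 unclassified sequences from the run above instead of the 2814 classified ones.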

I submitted a PR so that the unclassified reads are written out instead, but I'm not familiar with Rust, so I could not recompile or test it properly. Let me know if there is something I can try out.

Best,
Charlie