Ebedthan/sabreur

sabreur

🔎 About

sabreur is a command-line tool designed to demultiplex barcoded sequencing reads into separate files. It supports:

  • FASTA and FASTQ formats
  • Compressed inputs and outputs: gzip, bzip2, xz, and zstd
  • Paired-end and single-end reads

It uses a barcode file to match reads and dispatches each to the corresponding output. Reads with unknown barcodes go into a separate file.
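
The dispatch step described above can be pictured roughly as follows. This is a minimal Python sketch, not sabreur's actual implementation (which is written in Rust). It assumes the barcode sits at the start of each read and must match exactly; the function name `demultiplex` and the `"unknown"` key are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group (name, sequence) reads by the barcode found at the start of the sequence.

    Assumption: barcodes are read prefixes and must match exactly.
    Reads that match no barcode are collected under the "unknown" key.
    """
    buckets = defaultdict(list)
    for name, seq in reads:
        for barcode in barcodes:
            if seq.startswith(barcode):
                buckets[barcode].append((name, seq))
                break
        else:
            # No barcode matched this read.
            buckets["unknown"].append((name, seq))
    return dict(buckets)


reads = [("r1", "ACGTTTGA"), ("r2", "TTAGCCGA"), ("r3", "GGGGCCGA")]
result = demultiplex(reads, ["ACGT", "TTAG"])
for key, group in result.items():
    print(key, [name for name, _ in group])
```

In the real tool each bucket corresponds to an output file named in the barcode file rather than an in-memory list.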

Powered by niffler for seamless compression support.

🚀 Usage

▶️ Paired-end mode

sabreur barcode.txt input_R1.fq.gz input_R2.fq.gz

▶️ Single-end mode

sabreur barcode.txt input.fq

sabreur automatically detects the format and compression. Just provide the inputs!
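
For intuition, automatic detection of this kind usually relies on the first bytes of a file. The Python sketch below is illustrative only: sabreur delegates compression detection to niffler, and the helper names `sniff_compression` and `sniff_format` are hypothetical. The magic numbers themselves are the standard ones for gzip, bzip2, xz, and zstd.

```python
import bz2
import gzip
import lzma

# Standard magic numbers that open each compressed stream.
MAGIC = {
    b"\x1f\x8b": "gz",
    b"BZh": "bz2",
    b"\xfd\x37\x7a\x58\x5a\x00": "xz",
    b"\x28\xb5\x2f\xfd": "zst",
}


def sniff_compression(head: bytes) -> str:
    """Return the compression name guessed from the leading bytes, or "none"."""
    for magic, name in MAGIC.items():
        if head.startswith(magic):
            return name
    return "none"


def sniff_format(first_char: str) -> str:
    """FASTA records start with '>', FASTQ records start with '@'."""
    return {">": "fasta", "@": "fastq"}.get(first_char, "unknown")


record = b"@read1\nACGT\n+\nIIII\n"
print(sniff_compression(gzip.compress(record)), sniff_format("@"))
print(sniff_compression(record), sniff_format(">"))
```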

⚙️ Command-Line Options

USAGE:
    sabreur [options] <BARCODE> <FORWARD FILE> [<REVERSE FILE>]

ARGS:
    <BARCODE>    input barcode file
    <FORWARD>    input forward fastx file
    <REVERSE>    input reverse fastx file

OPTIONS:
    -m, --mismatch <INT>    maximum number of mismatches [default: 0]
    -o, --out <DIR>         output directory [default: sabreur_out]
    -f, --format <STR>      output files compression format
    -l, --level <INT>       compression level [default: 1]
        --force             force reuse of output directory
    -q, --quiet             decrease program verbosity
    -h, --help              Print help information
    -V, --version           Print version information
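
To illustrate what a mismatch budget such as `--mismatch` means in practice, here is a small Python sketch. It assumes mismatches are counted as substitutions (Hamming distance) against the start of the read and that the first barcode within the budget wins; how sabreur itself counts mismatches and breaks ties is not stated here, so treat both as assumptions. The names `hamming` and `match_barcode` are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def match_barcode(seq, barcodes, max_mismatch=0):
    """Return the first barcode within max_mismatch of the read prefix, or None."""
    for barcode in barcodes:
        prefix = seq[: len(barcode)]
        # Skip reads shorter than the barcode instead of comparing partial prefixes.
        if len(prefix) == len(barcode) and hamming(prefix, barcode) <= max_mismatch:
            return barcode
    return None


barcodes = ["ACGT", "TTAG"]
print(match_barcode("ACGATTT", barcodes, max_mismatch=0))
print(match_barcode("ACGATTT", barcodes, max_mismatch=1))
```

With a budget of 0 the read above goes to the unknown output; with a budget of 1 it is assigned to `ACGT`, since `ACGA` differs from it at one position.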

📦 Installation

Requirements

  • Rust (stable channel)
  • libgz for gz file support
  • liblzma for xz file support
  • libbzip2 for bzip2 file support
  • zstd for zstd file support

🛠️ From Source (via Cargo)

git clone https://github.com/Ebedthan/sabreur.git
cd sabreur
cargo install --path . --root ~/.cargo
sabreur --help

📁 Prebuilt Binaries

Download a binary for your platform from the releases page.

Benchmark

Benchmarks were run with hyperfine.

| Tool | Single-end uncompressed output | Single-end compressed output | Paired-end uncompressed output | Paired-end compressed output |
|---|---|---|---|---|
| idemp | - | 211.571 ± 3.718 | - | 366.247 ± 10.482 |
| sabre | 32.911 ± 2.411 | - | 109.470 ± 49.909 | - |
| sabreur | 10.843 ± 0.531 | 93.840 ± 0.446 | 40.878 ± 13.743 | 187.533 ± 0.572 |

🗜️ Compression format performance

A simple benchmark of the different output compression formats (sabreur tests/bc_pe_fq.txt tests/input_R1.fastq.gz tests/input_R2.fastq.gz), with zst being the fastest.

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| --format zst | 43.096 ± 1.547 | 41.179 | 46.878 | 1.00 |
| --format bz2 | 94.049 ± 4.762 | 87.984 | 101.140 | 2.18 ± 0.14 |
| --format gz | 123.107 ± 1.748 | 120.529 | 125.166 | 2.86 ± 0.11 |
| --format xz | 285.692 ± 18.625 | 264.960 | 325.750 | 6.63 ± 0.49 |
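
The Relative column is each format's mean runtime divided by the fastest mean (zst here). As a quick check of the figures above, the following Python snippet recomputes those ratios from the reported means; the variable names are only for illustration.

```python
# Mean runtimes in seconds, copied from the benchmark above.
means = {"zst": 43.096, "bz2": 94.049, "gz": 123.107, "xz": 285.692}

fastest = min(means.values())
relative = {fmt: round(mean / fastest, 2) for fmt, mean in means.items()}

for fmt, ratio in relative.items():
    print(f"--format {fmt}: {ratio:.2f}x the fastest run")
```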

📄 Barcode File Format

The barcode file must be tab-delimited in the format:

barcode1    barcode1_file1.fq   barcode1_file2.fq
barcode2    barcode2_file1.fq   barcode2_file2.fq
...

Output filenames must be unique. You can use .fq, .fastq, .fa, or .fasta as extensions.
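
Before running sabreur, it can help to sanity-check a barcode file against the rules above. The Python sketch below is a hypothetical pre-flight check, not part of sabreur: the function name `parse_barcode_file` is made up, and accepting a two-column line (barcode plus a single output file) for single-end data is an assumption, since only the three-column form is shown above.

```python
import io

ALLOWED_EXTENSIONS = (".fq", ".fastq", ".fa", ".fasta")


def parse_barcode_file(handle):
    """Parse tab-delimited lines into {barcode: [output files]}.

    Assumption: 3 columns for paired-end data, 2 columns for single-end data.
    Raises ValueError on a malformed line, a bad extension, or a reused filename.
    """
    entries = {}
    seen = set()
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) not in (2, 3):
            raise ValueError(f"line {lineno}: expected 2 or 3 tab-separated fields")
        barcode, *outputs = fields
        for out in outputs:
            if not out.endswith(ALLOWED_EXTENSIONS):
                raise ValueError(f"line {lineno}: unsupported extension in {out!r}")
            if out in seen:
                raise ValueError(f"line {lineno}: duplicate output file {out!r}")
            seen.add(out)
        entries[barcode] = outputs
    return entries


example = io.StringIO("ACGT\ts1_R1.fq\ts1_R2.fq\nTTAG\ts2_R1.fq\ts2_R2.fq\n")
print(parse_barcode_file(example))
```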

Minimum supported Rust version

The minimum supported Rust version for sabreur is 1.78.0.

🤝 Contributing

🐛 Bugs & Support

Found a bug or have a feature request? → Open an issue.

📜 License

This project is licensed under the MIT License.
