A collection of genomics software tools written in Rust
- alignoth : Creating alignment plots from bam files
- bamrescue : Utility to check Binary Sequence Alignment / Map (BAM) files for corruption and repair them
- best : Bam Error Stats Tool (best): analysis of error types in aligned reads
- modkit : A bioinformatics tool for working with modified bases
- mapAD : An aDNA aware short-read mapper
- perbase : Per-base per-nucleotide depth analysis
- rustybam : bioinformatics toolkit in rust
- csview : Pretty and fast csv viewer for cli with cjk/emoji support
- csvlens : csvlens is a command line CSV file viewer. It is like less but made for CSV.
- madato : Markdown Cmd Line, Rust and JS library for Excel to Markdown Tables
- qsv : Blazing-fast Data-Wrangling toolkit
- rsv : A command-line tool written in Rust for analyzing CSV, TXT, and Excel files.
- tabiew : A lightweight TUI app to view and query CSV files
- tv : (tv) Tidy Viewer is a cross-platform CLI csv pretty printer that uses column styling to maximize viewer enjoyment.
- xan : The CSV magician
- xsv : A fast CSV command line toolkit written in Rust.
- xtab : CSV command line utilities
- biotools : Command line bioinformatics functions
- darwin : Create (rapid) neighbor-joining tree from sequences using mash distance
- fakit : fakit: a simple program for fasta file manipulation
- filterx : process any file in tabular format. Fasta/fastq/GTF/GFF/VCF/SAM/BED
- fq : Command line utility for manipulating Illumina-generated FASTQ files.
- gsearch : Approximate nearest neighbour search for microbial genomes based on hash metric
- Hyper-Gen : HyGen: Compact and Efficient Genome Sketching using Hyperdimensional Vectors
- kanpig : Kmer Analysis of Pileups for Genotyping
- kfc : KFC (K-mer Fast Counter) is a fast and space-efficient k-mer counter based on hyper-k-mers.
- ngs : Command line utility for working with next-generation sequencing files.
- nail : Nail is an Alignment Inference tooL
- palindrome-finder : A bioinformatics tool written in Rust to find palindromic sequences in DNA
- poasta : Fast and exact gap-affine partial order alignment
- psdm : Compute a pairwise SNP distance matrix from one or two alignment(s)
- rust-bio-tools : A set of command line utilities based on Rust-Bio.
- sigalign : A Similarity-Guided Alignment Algorithm
- ska : Split k-mer analysis, version 2
- skc : Shared k-mer content between two genomes
- sketchy : Genomic neighbor typing of bacterial pathogens using MinHash
- tidk : Identify and find telomeres, or telomeric repeats in a genome.
- transanno : accurate LiftOver tool for new genome assemblies
- xgt : Efficient and fast querying and parsing of GTDB's data
- deacon : Fast (host) DNA sequence filtering
- fasten : Fasten toolkit, for streaming operations on fastq files
- faster : A (very) fast program for getting statistics about a fastq file, the way I need them, written in Rust
- fqgrep : Grep for FASTQ files
- fqkit : Fqkit: A simple and cross-platform program for fastq file manipulation
- fqtk : Fast FASTQ sample demultiplexing in Rust.
- grepq : quickly filter fastq files by matching sequences to a set of regex patterns
- guide-counter : A better, faster way to count guides in CRISPR screens.
- K2Rmini : K2Rmini (or K-mer to Reads mini) is a tool to filter the reads contained in a FASTA/Q file based on a set of k-mers of interest.
- kractor : Rapidly extract reads from a FASTQ file based on taxonomic classification via Kraken2.
- rasusa : Randomly subsample sequencing reads
- SeqSizzle : SeqSizzle is a pager for viewing FASTQ files with fuzzy matching, allowing different adaptors to be colored differently.
- sabreur : fast, reliable and handy demultiplexing tool for fastx files
- atlas : Enables storing, querying, transforming, and visualizing of multidimensional count data.
- bigtools : A high-performance BigWig and BigBed library in Rust
- biotest : Generate random test data for bioinformatics
- bqtools : A command line utility for working with BINSEQ files
- cigzip : A tool for compression and decompression of alignment CIGAR strings using tracepoints.
- d4tools : The D4 Quantitative Data Format
- gfa2bin : Convert various graph-related data to PLINK file. In addition, we offer multiple commands for filtering or modifying the generated PLINK files.
- gia : gia: Genomic Interval Arithmetic
- granges : A Rust library and command line tool for working with genomic ranges and their data.
- intspan : Command line tools for IntSpan related bioinformatics operations
- nuc2bit : A rust crate that provides methods for rapidly encoding and decoding nucleotides in 2-bit representation.
- recmap : A command line tool and Rust library for working with recombination maps.
- thirdkind : Drawing reconciled phylogenetic trees allowing 1, 2 or 3 reconciliation levels
- xsra : An efficient CLI to extract sequences from the SRA
- atg : A Rust library and CLI tool to handle genomic transcripts
- gffkit : a simple program for gff3 file manipulation
- Autocycler : A tool for generating consensus long-read assemblies for bacterial genomes
- chopper : Rust implementation of NanoFilt+NanoLyse, both originally written in Python. This tool, intended for long read sequencing such as PacBio or ONT, filters and trims a fastq file.
- DeepChopper : Language models identify chimeric artificial reads in NanoPore direct-RNA sequencing data.
- fpa : Filter of Pairwise Alignment
- herro : HERRO is a highly-accurate, haplotype-aware, deep-learning tool for error correction of Nanopore R10.4.1 or R9.4.1 reads (read length of >= 10 kbps is recommended).
- HiPhase : Small variant, structural variant, and short tandem repeat phasing tool for PacBio HiFi reads
- isONclust3 : De novo clustering of long transcript reads into genes
- longshot : diploid SNV caller for error-prone reads
- lrge : Genome size estimation from long read overlaps
- myloasm : A new high-resolution long-read metagenome assembler for even noisy reads
- Polypolish : a short-read polishing tool for long-read assemblies
- nextpolish2 : Repeat-aware polishing genomes assembled using HiFi long reads
- nanoq : Minimal but speedy quality control for nanopore reads in Rust
- smrest : Tumour-only somatic mutation calling using long reads
- trgt : Tandem repeat genotyping and visualization from PacBio HiFi data
- yacrd : Yet Another Chimeric Read Detector
- coverm : Read coverage calculator for metagenomics
- galah : More scalable dereplication for metagenome assembled genomes
- hyperex : Hypervariable region primer-based extractor for 16S rRNA and other SSU/LSU sequences.
- kun_peng : Kun-peng: an ultra-fast, low-memory footprint and accurate taxonomy classifier for all
- kmertools : kmer based feature extraction tool for bioinformatics, metagenomics, AI/ML and more
- kmerutils : Kmer generating, counting hashing and related
- Lorikeet : Strain resolver for metagenomics
- nohuman : Remove human reads from a sequencing run
- rosella : Metagenomic Binning Algorithm
- skani : Fast, robust ANI and aligned fraction for (metagenomic) genomes and contigs.
- sourmash : Quickly search, compare, and analyze genomic and metagenomic data sets.
- sylph : ultrafast genome querying and taxonomic profiling for metagenomic samples by abundance-corrected minhash.
- vircov : Viral genome coverage evaluation for metagenomic diagnostics
- impg : implicit pangenome graph
- panacus : Panacus is a tool for computing statistics for GFA-formatted pangenome graphs
- nextclade : Viral genome alignment, mutation calling, clade assignment, quality checks and phylogenetic placement
- nwr : nwr is a command line tool for working with NCBI taxonomy, Newick files and assembly reports
- unicore : Universal and efficient core gene phylogeny with Foldseek and ProstT5
- segul : An ultrafast and memory efficient tool for phylogenomics
- align-cli : A CLI for pairwise alignment of sequences, using both normal and mass based alignment.
- daedalus : Protein and molecule viewer
- folddisco : Fast indexing and search of discontinuous motifs in protein structures
- foldmason : Foldmason builds multiple alignments of large structure sets.
- sage : Proteomics search & quantification so fast that it feels like magic
- oarfish : long read RNA-seq quantification
- rnapkin : drawing RNA secondary structure with style; instantly
- R2Dtool : R2Dtool is a set of genomics utilities for handling, integrating, and visualising isoform-mapped RNA feature data.
- squab : Alignment-based gene expression quantification
- adview : Adata Viewer: Head/Less/Shape h5ad file in terminal
- alevin-fry : alevin-fry is an efficient and flexible tool for processing single-cell sequencing data, currently focused on single-cell transcriptomics and feature barcoding.
- cellranger : 10x Genomics Single Cell Analysis
- precellar : Single-cell genomics preprocessing package
- proseg : Probabilistic cell segmentation for in situ spatial transcriptomics
- SnapATAC2 : Single-cell epigenomics analysis tools
- ssubmit : Submit slurm sbatch jobs without the need to create a script
- wdl : Rust crates for working with Workflow Description Language (WDL) documents.
- echtvar : using all the bits for echt rapid variant annotation and filtering
- gvcf_norm : gVCF allele normalizer
- mehari : VEP-like tool for sequence ontology and HGVS annotation of VCF files
- vcf2parquet : Convert VCF to Parquet
- vcfexpress : expressions on VCFs
- vcf-reformatter : High-performance VCF file parser and reformatter with VEP annotation support. Converts complex VCF files to analyzable TSV format with intelligent transcript handling.
- plascad : Design software for plasmid (vector) and primer creation and validation. Edit plasmids, perform PCR-based cloning, digest and ligate DNA fragments, and display details about expressed proteins. Integrates with online resources like NCBI and PDB.
- biobear : Work with bioinformatic files using Arrow, Polars, and/or DuckDB
- binseq : A high efficiency binary format for sequencing data
- exon : Exon is an OLAP query engine specifically for biology and life science applications.
- ggetrs : Efficient querying of biological databases
- htsget-rs : A server implementation of the htsget protocol for bioinformatics in Rust
- ibu : a rust library for high throughput binary encoding of genomic sequences
- scidataflow : Command line scientific data management tool
- sufr : Parallel Construction of Suffix Arrays in Rust