GitHub - sharkLoc/rust-in-bioinformatics: A collection of genomics software tools written in Rust

rust in bioinformatics

A collection of genomics software tools written in Rust

index section

bam

alignoth : Creating alignment plots from bam files
bamrescue : Utility to check Binary Sequence Alignment / Map (BAM) files for corruption and repair them
best : Bam Error Stats Tool (best): analysis of error types in aligned reads
modkit : A bioinformatics tool for working with modified bases
mapAD : An aDNA aware short-read mapper
perbase : Per-base per-nucleotide depth analysis
rustybam : bioinformatics toolkit in Rust

csv

csview : 📠 Pretty and fast csv viewer for cli with cjk/emoji support
csvlens : csvlens is a command line CSV file viewer. It is like less but made for CSV.
madato : Markdown Cmd Line, Rust and JS library for Excel to Markdown Tables
qsv : Blazing-fast Data-Wrangling toolkit
rsv : A command-line tool written in Rust for analyzing CSV, TXT, and Excel files.
tabiew : A lightweight TUI app to view and query CSV files
tv : 📺(tv) Tidy Viewer is a cross-platform CLI csv pretty printer that uses column styling to maximize viewer enjoyment.
xan : The CSV magician
xsv : A fast CSV command line toolkit written in Rust.
xtab : CSV command line utilities

dna

biotools : Command line bioinformatics functions
darwin : Create (rapid) neighbor-joining tree from sequences using mash distance
fakit : fakit: a simple program for fasta file manipulation
filterx : process any file in tabular format. Fasta/fastq/GTF/GFF/VCF/SAM/BED
fq : Command line utility for manipulating Illumina-generated FASTQ files.
gsearch : Approximate nearest neighbour search for microbial genomes based on hash metric
Hyper-Gen : HyGen: Compact and Efficient Genome Sketching using Hyperdimensional Vectors
kanpig : Kmer Analysis of Pileups for Genotyping
kfc : KFC (K-mer Fast Counter) is a fast and space-efficient k-mer counter based on hyper-k-mers.
ngs : Command line utility for working with next-generation sequencing files.
nail : Nail is an Alignment Inference tooL
palindrome-finder : A bioinformatics tool written in Rust to find palindromic sequences in DNA
poasta : Fast and exact gap-affine partial order alignment
psdm : Compute a pairwise SNP distance matrix from one or two alignment(s)
rust-bio-tools : A set of command line utilities based on Rust-Bio.
sigalign : A Similarity-Guided Alignment Algorithm
ska : Split k-mer analysis – version 2
skc : Shared k-mer content between two genomes
sketchy : Genomic neighbor typing of bacterial pathogens using MinHash 🐀
tidk : Identify and find telomeres, or telomeric repeats in a genome.
transanno : accurate LiftOver tool for new genome assemblies
xgt : Efficient and fast querying and parsing of GTDB's data
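
Several tools above (for example kfc, skc and ska) are built on k-mer analysis. The sketch below is a rough illustration of the basic idea only. It is not the method of any listed tool (kfc, for instance, is based on hyper-k-mers). It counts canonical k-mers, where a canonical k-mer is the lexicographically smaller of a k-mer and its reverse complement, so both strands are counted together. The function names `revcomp` and `count_canonical_kmers` are hypothetical.

```rust
use std::collections::HashMap;

/// Reverse complement of an uppercase ACGT byte slice.
/// Returns None if any other symbol (e.g. N) is present.
fn revcomp(seq: &[u8]) -> Option<Vec<u8>> {
    seq.iter()
        .rev()
        .map(|&b| match b {
            b'A' => Some(b'T'),
            b'C' => Some(b'G'),
            b'G' => Some(b'C'),
            b'T' => Some(b'A'),
            _ => None,
        })
        .collect()
}

/// Count canonical k-mers in `seq` (hypothetical helper), skipping any
/// window that contains a non-ACGT symbol.
fn count_canonical_kmers(seq: &[u8], k: usize) -> HashMap<Vec<u8>, usize> {
    let mut counts = HashMap::new();
    if k == 0 || seq.len() < k {
        return counts;
    }
    for window in seq.windows(k) {
        if let Some(rc) = revcomp(window) {
            // Keep whichever strand sorts first.
            let canonical = if window <= rc.as_slice() { window.to_vec() } else { rc };
            *counts.entry(canonical).or_insert(0) += 1;
        }
    }
    counts
}
```

For `ACGT` with k = 2, `AC` and `GT` are reverse complements of each other, so both collapse to `AC` (count 2), while `CG` is its own reverse complement (count 1). Real counters replace the `HashMap<Vec<u8>, _>` with packed integer k-mers and far more compact tables.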

fastq

deacon : Fast (host) DNA sequence filtering
fasten : 👷 Fasten toolkit, for streaming operations on fastq files
faster : A (very) fast program for getting statistics about a fastq file, the way I need them, written in Rust
fqgrep : Grep for FASTQ files
fqkit : 🦀 Fqkit: A simple and cross-platform program for fastq file manipulation
fqtk : Fast FASTQ sample demultiplexing in Rust.
grepq : quickly filter fastq files by matching sequences to a set of regex patterns
guide-counter : A better, faster way to count guides in CRISPR screens.
K2Rmini : K2Rmini (or K-mer to Reads mini) is a tool to filter the reads contained in a FASTA/Q file based on a set of k-mers of interest.
kractor : Rapidly extract reads from a FASTQ file based on taxonomic classification via Kraken2.
rasusa : Randomly subsample sequencing reads
SeqSizzle : SeqSizzle is a pager for viewing FASTQ files with fuzzy matching, allowing different adaptors to be colored differently.
sabreur : fast, reliable and handy demultiplexing tool for fastx files

format

atlas : Enables storing, querying, transforming, and visualizing of multidimensional count data.
bigtools : A high-performance BigWig and BigBed library in Rust
biotest : Generate random test data for bioinformatics
bqtools : A command line utility for working with BINSEQ files
cigzip : A tool for compression and decompression of alignment CIGAR strings using tracepoints.
d4tools : The D4 Quantitative Data Format
gfa2bin : Convert various graph-related data to PLINK file. In addition, we offer multiple commands for filtering or modifying the generated PLINK files.
gia : gia: Genomic Interval Arithmetic
granges : A Rust library and command line tool for working with genomic ranges and their data.
intspan : Command line tools for IntSpan related bioinformatics operations
nuc2bit : A rust crate that provides methods for rapidly encoding and decoding nucleotides in 2-bit representation.
recmap : A command line tool and Rust library for working with recombination maps.
transanno : accurate LiftOver tool for new genome assemblies
thirdkind : Drawing reconciled phylogenetic trees allowing 1, 2 or 3 reconciliation levels
xsra : An efficient CLI to extract sequences from the SRA
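
nuc2bit (and ibu, listed under "other") store nucleotides in a 2-bit representation. As a minimal sketch of that general idea, assuming the common mapping A=00, C=01, G=10, T=11 (the crates themselves may use a different mapping or layout), up to 32 bases fit in one `u64`. The names `encode_2bit` and `decode_2bit` are hypothetical and are not the API of either crate.

```rust
/// Pack up to 32 ACGT bases into a u64, two bits per base, with the first
/// base in the most significant occupied bits. Returns None for input
/// longer than 32 bases or containing any non-ACGT symbol.
fn encode_2bit(seq: &[u8]) -> Option<u64> {
    if seq.len() > 32 {
        return None;
    }
    let mut packed: u64 = 0;
    for &b in seq {
        let code = match b {
            b'A' => 0b00,
            b'C' => 0b01,
            b'G' => 0b10,
            b'T' => 0b11,
            _ => return None,
        };
        packed = (packed << 2) | code;
    }
    Some(packed)
}

/// Inverse of `encode_2bit`. The length must be passed separately because
/// leading A's (code 00) are indistinguishable from zero padding.
fn decode_2bit(packed: u64, len: usize) -> Vec<u8> {
    assert!(len <= 32, "a u64 holds at most 32 bases");
    const BASES: [u8; 4] = [b'A', b'C', b'G', b'T'];
    (0..len)
        .rev()
        .map(|i| BASES[((packed >> (2 * i)) & 0b11) as usize])
        .collect()
}
```

With this mapping `ACGT` packs to `0b00011011`, i.e. 27. Storing the length alongside the packed word is the usual price for making A encode as zero.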

gff3

atg : A Rust library and CLI tool to handle genomic transcripts
gffkit : a simple program for gff3 file manipulation

longreads

Autocycler : A tool for generating consensus long-read assemblies for bacterial genomes
chopper : Rust implementation of NanoFilt+NanoLyse, both originally written in Python. This tool, intended for long read sequencing such as PacBio or ONT, filters and trims a fastq file.
DeepChopper : Language models identify chimeric artificial reads in Nanopore direct-RNA sequencing data.
fpa : Filter of Pairwise Alignment
herro : HERRO is a highly-accurate, haplotype-aware, deep-learning tool for error correction of Nanopore R10.4.1 or R9.4.1 reads (read length of >= 10 kb is recommended).
HiPhase : Small variant, structural variant, and short tandem repeat phasing tool for PacBio HiFi reads
isONclust3 : De novo clustering of long transcript reads into genes
longshot : diploid SNV caller for error-prone reads
lrge : Genome size estimation from long read overlaps
myloasm : A new high-resolution long-read metagenome assembler for even noisy reads
Polypolish : a short-read polishing tool for long-read assemblies
nextpolish2 : Repeat-aware polishing genomes assembled using HiFi long reads
nanoq : Minimal but speedy quality control for nanopore reads in Rust 🐻
smrest : Tumour-only somatic mutation calling using long reads
trgt : Tandem repeat genotyping and visualization from PacBio HiFi data
yacrd : Yet Another Chimeric Read Detector
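
Quality filters such as chopper and nanoq keep or drop reads by mean quality. The sketch below assumes Phred+33 quality strings. It also assumes the approach common in long-read QC of averaging in error-probability space, using P = 10^(-Q/10), rather than averaging the raw Phred scores. Check each tool's documentation for its exact definition. The function names are hypothetical.

```rust
/// Mean read quality from a Phred+33 quality string, averaged in
/// error-probability space. Returns None for an empty string or for a
/// byte below '!' (ASCII 33), which is not a valid Phred+33 score.
fn mean_phred_quality(qual: &[u8]) -> Option<f64> {
    if qual.is_empty() {
        return None;
    }
    let mut sum_err = 0.0;
    for &q in qual {
        let phred = q.checked_sub(33)? as f64;
        // Q = -10 * log10(P), so P = 10^(-Q / 10).
        sum_err += 10f64.powf(-phred / 10.0);
    }
    let mean_err = sum_err / qual.len() as f64;
    // Convert the mean error probability back to the Phred scale.
    Some(-10.0 * mean_err.log10())
}

/// Keep a read only if its mean quality reaches `min_q` (hypothetical).
fn passes_quality_filter(qual: &[u8], min_q: f64) -> bool {
    mean_phred_quality(qual).map_or(false, |q| q >= min_q)
}
```

For a read with one Q10 base (`+`) and one Q20 base (`5`), the mean error probability is 0.055, giving about Q12.6 rather than the naive Q15. Low-quality bases therefore weigh more heavily, which is the usual reason for this choice.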

metagenomics

coverm : Read coverage calculator for metagenomics
galah : More scalable dereplication for metagenome assembled genomes
hyperex : Hypervariable region primer-based extractor for 16S rRNA and other SSU/LSU sequences.
kun_peng : Kun-peng: an ultra-fast, low-memory footprint and accurate taxonomy classifier for all
kmertools : kmer based feature extraction tool for bioinformatics, metagenomics, AI/ML and more
kmerutils : Kmer generating, counting, hashing and related
Lorikeet : Strain resolver for metagenomics
nohuman : Remove human reads from a sequencing run
rosella : Metagenomic Binning Algorithm
skani : Fast, robust ANI and aligned fraction for (metagenomic) genomes and contigs.
sourmash : Quickly search, compare, and analyze genomic and metagenomic data sets.
sylph : ultrafast genome querying and taxonomic profiling for metagenomic samples by abundance-corrected minhash.
vircov : Viral genome coverage evaluation for metagenomic diagnostics 🩸
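
sourmash and sylph compare samples using MinHash-style sketches. The code below is a simplified bottom-k MinHash, not their implementations: sourmash, for example, uses FracMinHash with MurmurHash. It keeps the `s` smallest k-mer hashes per sequence and estimates Jaccard similarity from the `s` smallest hashes of the union of two sketches. Hashing with `std`'s `DefaultHasher` is an assumption made to stay dependency-free. Its algorithm is not guaranteed to be stable across Rust releases, so sketches saved to disk should use a fixed hash function instead. All function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::hash::{Hash, Hasher};

/// Hash one k-mer (deterministic within a single build).
fn hash_kmer(kmer: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    kmer.hash(&mut h);
    h.finish()
}

/// Bottom-k sketch: the `s` smallest distinct k-mer hashes of `seq`.
fn bottom_k_sketch(seq: &[u8], k: usize, s: usize) -> BTreeSet<u64> {
    let mut sketch = BTreeSet::new();
    if k == 0 {
        return sketch;
    }
    for kmer in seq.windows(k) {
        sketch.insert(hash_kmer(kmer));
        if sketch.len() > s {
            // Drop the largest hash so only the s smallest remain.
            let largest = *sketch.iter().next_back().unwrap();
            sketch.remove(&largest);
        }
    }
    sketch
}

/// Jaccard estimate: among the s smallest hashes of the union of both
/// sketches, the fraction present in both.
fn jaccard_estimate(a: &BTreeSet<u64>, b: &BTreeSet<u64>, s: usize) -> f64 {
    // BTreeSet::union yields hashes in ascending order.
    let union_bottom: Vec<u64> = a.union(b).copied().take(s).collect();
    if union_bottom.is_empty() {
        return 0.0;
    }
    let shared = union_bottom
        .iter()
        .filter(|&&h| a.contains(&h) && b.contains(&h))
        .count();
    shared as f64 / union_bottom.len() as f64
}
```

This works because any hash among the `s` smallest of A ∪ B that also belongs to A must be among the `s` smallest of A. Checking membership in the two sketches is therefore equivalent to checking membership in the full k-mer sets.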

pangenomics

impg : implicit pangenome graph
panacus : Panacus is a tool for computing statistics for GFA-formatted pangenome graphs

phylogenomics

nextclade : Viral genome alignment, mutation calling, clade assignment, quality checks and phylogenetic placement
nwr : nwr is a command line tool for working with NCBI taxonomy, Newick files and assembly reports
unicore : Universal and efficient core gene phylogeny with Foldseek and ProstT5
segul : An ultrafast and memory efficient tool for phylogenomics

proteomics

align-cli : A CLI for pairwise alignment of sequences, using both normal and mass-based alignment.
daedalus : Protein and molecule viewer
folddisco : Fast indexing and search of discontinuous motifs in protein structures
foldmason : Foldmason builds multiple alignments of large structure sets.
sage : Proteomics search & quantification so fast that it feels like magic

rna

oarfish : long read RNA-seq quantification
rnapkin : drawing RNA secondary structure with style; instantly
R2Dtool : R2Dtool is a set of genomics utilities for handling, integrating, and visualising isoform-mapped RNA feature data.
squab : Alignment-based gene expression quantification

singlecell

adview : Adata Viewer: Head/Less/Shape h5ad file in terminal
alevin-fry : 🐟 🔬🦀 alevin-fry is an efficient and flexible tool for processing single-cell sequencing data, currently focused on single-cell transcriptomics and feature barcoding.
cellranger : 10x Genomics Single Cell Analysis
precellar : Single-cell genomics preprocessing package
proseg : Probabilistic cell segmentation for in situ spatial transcriptomics
SnapATAC2 : Single-cell epigenomics analysis tools

slurm

ssubmit : Submit slurm sbatch jobs without the need to create a script
wdl : Rust crates for working with Workflow Description Language (WDL) documents.

vcf

echtvar : using all the bits for echt rapid variant annotation and filtering
gvcf_norm : gVCF allele normalizer
mehari : VEP-like tool for sequence ontology and HGVS annotation of VCF files
vcf2parquet : Convert VCF to Parquet
vcfexpress : expressions on VCFs
vcf-reformatter : 🧬 High-performance VCF file parser and reformatter with VEP annotation support. Converts complex VCF files to analyzable TSV format with intelligent transcript handling.

Gui

plascad : Design software for plasmid (vector) and primer creation and validation. Edit plasmids, perform PCR-based cloning, digest and ligate DNA fragments, and display details about expressed proteins. Integrates with online resources like NCBI and PDB.

other

biobear : Work with bioinformatic files using Arrow, Polars, and/or DuckDB
binseq : A high efficiency binary format for sequencing data
exon : Exon is an OLAP query engine specifically for biology and life science applications.
ggetrs : Efficient querying of biological databases
htsget-rs : A server implementation of the htsget protocol for bioinformatics in Rust
ibu : a rust library for high throughput binary encoding of genomic sequences
scidataflow : Command line scientific data management tool
sufr : Parallel Construction of Suffix Arrays in Rust
