Home
- What Is xgt?
- What Is xgt For?
- What Problem Does xgt Solve?
- What Design Principles Underlie xgt?
- How Does xgt Accomplish Its Goals?
In genomic research, access to genomes with reliable, up-to-date taxonomy is essential. xgt is a command-line tool that simplifies access to the Genome Taxonomy Database (GTDB), a comprehensive, high-quality resource of microbial genome taxonomies.
xgt enables fast, reliable, and scriptable access to GTDB's web API while adding flexible querying and built-in parsing capabilities. Whether you're automating workflows or conducting taxonomic comparisons, xgt provides a lightweight and efficient interface.
xgt was developed to:
- 🧠 Facilitate programmatic access to GTDB's public API.
- 🛠️ Add powerful and flexible parsing capabilities on top of raw GTDB responses.
- ⚡ Accelerate extraction and interpretation of genome metadata and taxonomic classification.
- 📊 Support comparative analysis of genome classification history and taxonomy revisions.
The Genome Taxonomy Database (GTDB) is a foundational resource for microbial systematics, containing over 500,000 curated bacterial and archaeal genomes. Despite its importance, its public API:
- Does not provide parsed data.
- Requires manual work or custom scripts to extract meaningful information.
- Can be tedious to use in pipelines or large-scale comparative workflows.
xgt solves these limitations by offering:
- 🧩 A simple interface for querying GTDB.
- 🔍 Built-in parsing of metadata, taxonomic history, and classification cards.
- 🧪 Tools for comparing changes across GTDB releases.
- ⚙️ Clean integration into larger genomic workflows via CLI or library.
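To illustrate the kind of parsing that otherwise has to be hand-written, here is a minimal Python sketch that flattens a nested JSON response into a single flat record. The response shape, the field names, and the placeholder accession are assumptions made for illustration; they are not taken from GTDB or from xgt's actual implementation.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical raw response; the real API payload may look different.
raw = (
    '{"accession": "GCA_000000000.1", '
    '"taxonomy": {"domain": "d__Bacteria", "phylum": "p__Example"}}'
)
row = flatten(json.loads(raw))
print(row)
```

A flat record like this maps directly onto one row of a table, which is what makes downstream filtering and comparison straightforward.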
xgt is built on the following design principles:
- Simplicity: Clear CLI syntax and consistent output formats (JSON, CSV, TSV).
- Performance: Fast HTTP client and streaming-friendly response handling.
- Modularity: Structured commands (search, genome, taxon) allow granular control.
- Extensibility: New features or formats (e.g. yaml, gzip) can be added with minimal effort.
- Robustness: Graceful handling of API errors and malformed responses.
- Transparency: Output is designed to be human-readable and pipeline-friendly.
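As a rough illustration of the "consistent output formats" principle, the sketch below serializes the same list of records as JSON, CSV, or TSV using only the Python standard library. The function name `render` and the sample field names are hypothetical; this is not xgt's own code, only one way such a formatter could be written.

```python
import csv
import io
import json


def render(rows, fmt="json"):
    """Serialize a list of flat dicts as JSON, CSV, or TSV text."""
    if fmt == "json":
        return json.dumps(rows, indent=2)
    if fmt not in ("csv", "tsv"):
        raise ValueError(f"unsupported format: {fmt}")
    # Use the union of all keys so every row shares the same columns.
    fieldnames = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        delimiter="\t" if fmt == "tsv" else ",",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty cells
    return buffer.getvalue()


# Hypothetical records, for illustration only.
records = [{"accession": "GCA_000000000.1", "genus": "g__Example"}]
print(render(records, "tsv"))
```

Keeping the records independent of the output format means a new format only needs one more branch in the formatter, which is the spirit of the extensibility principle above.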
xgt interacts with GTDB's API endpoints such as:
- /search: to search for genome or taxon names.
- /genome/card: to retrieve comprehensive genome cards.
- /genome/metadata: to fetch metadata for one or multiple accessions.
- /genome/taxon-history: to trace taxonomic reclassifications across GTDB versions.
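The endpoints listed above can be turned into request URLs with a small helper. The following Python sketch is illustrative only: the host `gtdb-api.example.org` is a placeholder, the query parameter name `q` is hypothetical, and the paths are copied from the list above in their abbreviated form, so the real API may expect a different layout (for example, an accession embedded in the path).

```python
from urllib.parse import urlencode

# Endpoint paths as named in the list above (abbreviated forms).
ENDPOINTS = {
    "search": "/search",
    "card": "/genome/card",
    "metadata": "/genome/metadata",
    "taxon_history": "/genome/taxon-history",
}


def build_url(base, path, params=None):
    """Join a base URL and a path, appending URL-encoded query parameters."""
    url = base.rstrip("/") + "/" + path.lstrip("/")
    if params:
        url += "?" + urlencode(params)
    return url


# Placeholder host and hypothetical parameter name, for illustration only.
BASE_URL = "https://gtdb-api.example.org"
print(build_url(BASE_URL, ENDPOINTS["search"], {"q": "Escherichia coli"}))
```

Centralizing URL construction in one function keeps encoding rules (such as escaping spaces in taxon names) in a single place rather than scattered across each command.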
It provides:
- Structured output (JSON/TSV) ready for further parsing or machine learning workflows.
- Comparison features to trace taxonomic evolution.
- Integrated error handling and validation to ensure data integrity.
- Support for batch queries, redirection of results to files, and flexible filtering.
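To make "tracing taxonomic evolution" concrete, the sketch below compares consecutive entries of a taxon history and reports every rank whose value changed. The record layout, the release labels (`R1`, `R2`, `R3`), and the taxon names are invented for illustration; xgt's actual output fields may differ.

```python
RANKS = ("domain", "phylum", "class", "order", "family", "genus", "species")


def taxonomy_changes(history):
    """Return (old_release, new_release, rank, old_value, new_value) tuples
    for each rank that differs between consecutive releases."""
    changes = []
    for prev, curr in zip(history, history[1:]):
        for rank in RANKS:
            if prev.get(rank) != curr.get(rank):
                changes.append(
                    (prev["release"], curr["release"], rank,
                     prev.get(rank), curr.get(rank))
                )
    return changes


# Hypothetical history for one genome, ordered oldest to newest.
history = [
    {"release": "R1", "genus": "g__Alpha", "species": "s__Alpha one"},
    {"release": "R2", "genus": "g__Beta", "species": "s__Beta one"},
    {"release": "R3", "genus": "g__Beta", "species": "s__Beta one"},
]
for change in taxonomy_changes(history):
    print("\t".join(str(field) for field in change))
```

Running the same function over a list of histories, one per accession, gives a simple form of batch comparison whose tab-separated output can be redirected to a file.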