Anicet Ebou edited this page Jun 23, 2025 · 11 revisions

  1. What Is xgt?
  2. What Is xgt For?
  3. What Problem Does xgt Solve?
  4. What Design Principles Underlie xgt?
  5. How Does xgt Accomplish Its Goals?

What Is xgt?

In genomic research, access to genomes with reliable, up-to-date taxonomy is essential. xgt is a command-line tool that simplifies access to the Genome Taxonomy Database (GTDB), a comprehensive, high-quality resource of microbial genome taxonomies.

xgt enables fast, reliable, and scriptable access to GTDB's web API while adding flexible querying and built-in parsing capabilities. Whether you're automating workflows or conducting taxonomic comparisons, xgt provides a lightweight and efficient interface.

🎯 What Is xgt For?

xgt was developed to:

  • 🧠 Facilitate programmatic access to GTDB's public API.

  • 🛠️ Add powerful and flexible parsing capabilities on top of raw GTDB responses.

  • ⚡ Accelerate extraction and interpretation of genome metadata and taxonomic classification.

  • 📊 Support comparative analysis of genome classification history and taxonomy revisions.

What Problem Does xgt Solve?

The Genome Taxonomy Database (GTDB) is a foundational resource for microbial systematics, containing over 500,000 curated bacterial and archaeal genomes. Despite its importance, its public API:

  • Returns raw responses rather than parsed data.

  • Requires manual work or custom scripts to extract meaningful information.

  • Can be tedious to use in pipelines or large-scale comparative workflows.

xgt solves these limitations by offering:

  • 🧩 A simple interface for querying GTDB.

  • 🔍 Built-in parsing of metadata, taxonomic history, and classification cards.

  • 🧪 Tools for comparing changes across GTDB releases.

  • ⚙️ Clean integration into larger genomic workflows via CLI or library.

🧱 What Design Principles Underlie xgt?

xgt is built on the following design principles:

  • Simplicity: Clear CLI syntax and consistent output formats (JSON, CSV, TSV).

  • Performance: Fast HTTP client and streaming-friendly response handling.

  • Modularity: Structured commands (search, genome, taxon) allow granular control.

  • Extensibility: New features or formats (e.g. YAML, gzip) can be added with minimal effort.

  • Robustness: Graceful handling of API errors and malformed responses.

  • Transparency: Output is designed to be human-readable and pipeline-friendly.
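The robustness principle can be illustrated with a small sketch of defensive response handling. This is not xgt's implementation; the function name `parse_response` and the shape of the returned dictionary are hypothetical, chosen only to show one way that API errors and malformed bodies might be reported without crashing a pipeline.

```python
import json


def parse_response(status: int, body: str) -> dict:
    """Turn an HTTP status and body into a result dict (hypothetical shape).

    Returns {'ok': True, 'data': ...} on success, or
    {'ok': False, 'error': ...} for API errors and malformed JSON.
    """
    if status != 200:
        # API-level failure: report the status instead of raising.
        return {"ok": False, "error": f"HTTP {status}"}
    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:
        # Malformed response: keep the parser's message for diagnostics.
        return {"ok": False, "error": f"malformed JSON: {exc.msg}"}
    return {"ok": True, "data": data}
```

Returning a result value rather than raising lets a batch job record the failure for one accession and continue with the rest.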

🚀 How Does xgt Accomplish Its Goals?

xgt interacts with GTDB’s API endpoints such as:

  • /search: to search for genome or taxon names.

  • /genome/card: to retrieve comprehensive genome cards.

  • /genome/metadata: to fetch metadata for one or more accessions.

  • /genome/taxon-history: to trace taxonomic reclassifications across GTDB versions.
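As a rough illustration of how the endpoints listed above might be addressed from a script, the sketch below builds request URLs without sending any requests. The base URL, the position of the accession in the path, the `search` query parameter, and the helper names `genome_url` and `search_url` are all assumptions made for illustration; check them against the GTDB API documentation before relying on them.

```python
from urllib.parse import quote, urlencode

# Assumption: public GTDB API base URL; verify against the GTDB API docs.
BASE_URL = "https://gtdb-api.ecogenomic.org"

# Genome resources named in the endpoint list above.
GENOME_RESOURCES = {"card", "metadata", "taxon-history"}


def genome_url(accession: str, resource: str, base: str = BASE_URL) -> str:
    """Build a URL for one genome resource (hypothetical path layout)."""
    if resource not in GENOME_RESOURCES:
        raise ValueError(f"unknown genome resource: {resource!r}")
    # Assumption: the accession sits between /genome/ and the resource name.
    return f"{base}/genome/{quote(accession, safe='')}/{resource}"


def search_url(term: str, base: str = BASE_URL) -> str:
    """Build a search URL (the 'search' parameter name is an assumption)."""
    return f"{base}/search?{urlencode({'search': term})}"
```

Keeping URL construction in pure functions like these makes the path layout easy to adjust and test separately from any network code.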

It provides:

  • Structured output (JSON, CSV, TSV) ready for further parsing or machine learning workflows.

  • Comparison features to trace taxonomic evolution.

  • Integrated error handling and validation to ensure data integrity.

  • Support for batch queries, redirection of results to files, and flexible filtering.
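To make the idea of tracing taxonomic changes concrete, here is a minimal sketch that compares two GTDB-style lineage strings rank by rank and writes the differences as TSV. The function names, the column headers, and the sample lineages are illustrative assumptions; this is not xgt's implementation or its output format.

```python
import csv
import io

# GTDB-style rank prefixes mapped to readable rank names.
RANK_NAMES = {"d": "domain", "p": "phylum", "c": "class", "o": "order",
              "f": "family", "g": "genus", "s": "species"}


def parse_lineage(lineage: str) -> dict:
    """Split 'd__X;p__Y;...' into {'domain': 'X', 'phylum': 'Y', ...}."""
    ranks = {}
    for part in lineage.split(";"):
        prefix, _, name = part.strip().partition("__")
        if prefix in RANK_NAMES:
            ranks[RANK_NAMES[prefix]] = name
    return ranks


def diff_lineages(old: str, new: str) -> list:
    """Return (rank, old_name, new_name) for every rank that differs."""
    old_ranks, new_ranks = parse_lineage(old), parse_lineage(new)
    return [(rank, old_ranks.get(rank, ""), new_ranks.get(rank, ""))
            for rank in RANK_NAMES.values()
            if old_ranks.get(rank, "") != new_ranks.get(rank, "")]


def to_tsv(rows: list) -> str:
    """Serialize the diff rows as TSV with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["rank", "old", "new"])
    writer.writerows(rows)
    return buf.getvalue()


# Illustrative lineages for two releases, not real API output.
older = "d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria"
newer = "d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria"
print(to_tsv(diff_lineages(older, newer)), end="")
```

The TSV text can be redirected to a file or piped into another tool, which is the kind of pipeline use described above.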
