MobiCT is an analysis pipeline designed for detecting SNVs (Single Nucleotide Variants) and small InDels in circulating tumor DNA (ctDNA) obtained through non-invasive liquid biopsy. The pipeline serves diagnostic, prognostic, and therapeutic purposes in precision oncology.
The pipeline is built with Nextflow, a workflow manager that runs tasks portably across multiple compute environments. It uses Docker/Singularity containers, which simplifies installation and makes results reproducible.
MobiCT performs the following key steps:
- Quality Control: Raw read quality assessment using FastP
- Alignment: Read mapping to reference genome using BWA-MEM
- Deduplication: PCR duplicate removal using Picard/fgbio
- Variant Calling: SNV and InDel detection using VarDict
- Annotation: Variant annotation using Ensembl VEP
- Quality Metrics: Comprehensive QC metrics generation
- Reporting: MultiQC report generation
- Install Nextflow (>=20.04.0)
- Create a conda environment with the required tools:
  conda create -n mobict -c conda-forge -c bioconda \
    gatk4 fgbio bwa fastp samtools picard vardict ensembl-vep
- Download the pipeline:
  git clone https://github.com/SimCab-CHU/MobiCT.git
  cd MobiCT
- Test the pipeline with a minimal dataset:
  conda activate mobict
  nextflow run MobiCT.nf -c nextflow.config --input test_data
Run the pipeline, writing the Nextflow log to a file of your choice:
nextflow -log /path/to/output/my.log run MobiCT.nf -c nextflow.config
Before running the pipeline, edit the nextflow.config file to specify:
- Input FASTQ files paths
- Output directory
- Reference genome path
- Target intervals/BED files
- Resource allocation
The pipeline expects:
- FASTQ files: Paired-end sequencing data from ctDNA samples
- Reference genome: Human reference genome (e.g., GRCh38)
- Target intervals: BED file defining regions of interest
- VEP database: Pre-downloaded VEP cache and databases
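The inputs listed above can be checked before launching a run. The following is a minimal sketch, not part of MobiCT: the function name `check_inputs` is hypothetical, and the `*_R1.fastq.gz` / `*_R2.fastq.gz` naming scheme is an assumption that may not match your files.

```shell
#!/usr/bin/env bash
# Pre-flight check for pipeline inputs (illustrative sketch, not part of MobiCT).
# Assumption: mates are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.

check_inputs() {
    local fastq_dir="$1" genome="$2" intervals="$3" vep_cache="$4"
    local status=0 found=0 r1 r2

    # Reference genome and BED file must exist and be non-empty.
    [ -s "$genome" ]    || { echo "missing reference genome: $genome" >&2; status=1; }
    [ -s "$intervals" ] || { echo "missing intervals BED: $intervals" >&2; status=1; }
    # The VEP cache is expected to be a directory.
    [ -d "$vep_cache" ] || { echo "missing VEP cache directory: $vep_cache" >&2; status=1; }

    # Every R1 file needs a matching R2 mate.
    for r1 in "$fastq_dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue    # the glob matched nothing
        found=1
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        [ -e "$r2" ] || { echo "missing mate for: $r1" >&2; status=1; }
    done
    [ "$found" -eq 1 ] || { echo "no FASTQ pairs found in: $fastq_dir" >&2; status=1; }

    return "$status"
}
```

Call it as `check_inputs <fastq_dir> <genome.fa> <targets.bed> <vep_cache_dir>`; it returns a non-zero status and lists every problem on stderr rather than stopping at the first one.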
- Download reference genome (GRCh38 recommended):
  wget http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
- Download VEP databases (see VEP documentation):
  vep_install -a cf -s homo_sapiens -y GRCh38 -c /path/to/vep/cache
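BWA-MEM needs a BWA index of the reference, and GATK/Picard tools generally expect a `.fai` index and a `.dict` sequence dictionary next to the FASTA. Whether MobiCT builds these itself is not stated here, so the sketch below is an assumption about preparation you may need to do by hand. The function name `prepare_reference` and the `DRY_RUN` switch are hypothetical, and the FASTA is assumed to be already decompressed.

```shell
#!/usr/bin/env bash
# Index a decompressed reference FASTA (illustrative sketch, not part of MobiCT).
# Set DRY_RUN=1 to print the commands instead of executing them.

prepare_reference() {
    local ref="$1"
    local run=""
    if [ -n "${DRY_RUN:-}" ]; then
        run="echo"
    fi

    # BWA index used by BWA-MEM during alignment.
    $run bwa index "$ref"
    # FASTA index (<ref>.fai).
    $run samtools faidx "$ref"
    # Sequence dictionary named after the FASTA without its extension.
    $run samtools dict -o "${ref%.*}.dict" "$ref"
}
```

Running `DRY_RUN=1 prepare_reference /path/to/genome.fa` first lets you review the three commands before committing to the indexing, which can take a while on a full human genome.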
Results are organized in the specified output directory:
outdir/
├── sample1/
│ ├── sample1.deduplicated.bam
│ ├── sample1.deduplicated.bam.bai
│ ├── sample1.annotated.vcf
│ ├── sample1.HsMetrics.1.txt
│ ├── sample1.HsMetrics.3.txt
│ └── sample1.QC.bcftools_stats.stats
├── sample2/
│ └── ...
└── multiqc/
    └── multiqc_report.html
- .deduplicated.bam: Aligned, deduplicated BAM file
- .annotated.vcf: Variant calls with functional annotations
- .HsMetrics.*.txt: Hybrid selection metrics from Picard
- .QC.bcftools_stats.stats: Variant calling statistics
- multiqc_report.html: Comprehensive quality control report
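Given the layout above, a quick way to confirm that a run finished is to look for each sample's annotated VCF and count its variant records. This is a rough sketch under the assumption that every sample directory contains `<sample>.annotated.vcf` as shown in the tree; the function name `summarize_outputs` is hypothetical.

```shell
#!/usr/bin/env bash
# Print "<sample><TAB><variant record count>" for each sample directory,
# or "missing" when the annotated VCF is absent (illustrative sketch).

summarize_outputs() {
    local outdir="$1" dir sample vcf n
    for dir in "$outdir"/*/; do
        [ -d "$dir" ] || continue             # the glob matched nothing
        sample=$(basename "$dir")
        if [ "$sample" = "multiqc" ]; then    # report directory, not a sample
            continue
        fi
        vcf="${dir}${sample}.annotated.vcf"
        if [ -f "$vcf" ]; then
            # Count non-header lines; header lines start with '#'.
            n=$(awk '!/^#/ { n++ } END { print n + 0 }' "$vcf")
            printf '%s\t%s\n' "$sample" "$n"
        else
            printf '%s\t%s\n' "$sample" "missing"
        fi
    done
}
```

For example, `summarize_outputs outdir` prints one line per sample. A count of 0 means the VCF exists but holds no variant records, which is distinct from a missing file.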
Parameter | Description | Default |
---|---|---|
--input | Path to input FASTQ files | Required |
--outdir | Output directory | ./results |
--genome | Reference genome path | Required |
--intervals | Target intervals BED file | Required |
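The parameters in this table can also be passed on the command line, where Nextflow gives them precedence over values set in the configuration file. The wrapper below is a sketch that enforces the three required parameters before launching; `run_mobict` and the `NEXTFLOW` override variable are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Launch MobiCT with the required parameters checked up front (illustrative sketch).
# NEXTFLOW can point to another executable; it defaults to "nextflow".

run_mobict() {
    local input="${1:-}" genome="${2:-}" intervals="${3:-}" outdir="${4:-./results}"

    # --input, --genome and --intervals have no default, so refuse to start without them.
    if [ -z "$input" ] || [ -z "$genome" ] || [ -z "$intervals" ]; then
        echo "usage: run_mobict <fastq_dir> <genome.fa> <targets.bed> [outdir]" >&2
        return 2
    fi

    "${NEXTFLOW:-nextflow}" run MobiCT.nf -c nextflow.config \
        --input "$input" --outdir "$outdir" \
        --genome "$genome" --intervals "$intervals"
}
```

Setting `NEXTFLOW=echo` prints the arguments that would be passed without starting a run, which is handy for checking paths.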
Parameter | Description | Default |
---|---|---|
--max_cpus | Maximum number of CPUs | 16 |
--max_memory | Maximum memory allocation | '128.GB' |
--max_time | Maximum time per job | '240.h' |
The pipeline supports different execution profiles:
- conda: Use Conda for dependency management
- docker: Use Docker containers
- singularity: Use Singularity containers
- test: Run with test dataset
Example:
nextflow run MobiCT.nf -profile conda,test
Raw sequencing data (FASTQ files) for the commercial controls used in the study are available from the NCBI SRA under BioProject PRJNA1209006.
If you use MobiCT for your analysis, please cite:
MobiCT: ctDNA Analysis Pipeline
[Publication DOI will be added]
This pipeline uses several bioinformatics tools. Please also cite:
- Nextflow: Di Tommaso P, et al. Nextflow enables reproducible computational workflows. Nature Biotechnology 35, 316–319 (2017)
- BWA: Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009)
- VarDict: Lai Z, et al. VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research. Nucleic Acids Research 44, e108 (2016)
- VEP: McLaren W, et al. The Ensembl Variant Effect Predictor. Genome Biology 17, 122 (2016)
- MultiQC: Ewels P, et al. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048 (2016)
MobiCT was developed by the SimCab team at CHU.
This project is licensed under the MIT License - see the LICENSE file for details.
For questions and support:
- Create an issue on GitHub
- Contact the development team
- Initial release
- Support for SNV and small InDel detection in ctDNA
- Integrated quality control and reporting
- VEP-based variant annotation