rwk-unil/shapeit4

SHAPEIT4 fork with xSqueezeIt file format support

  • This fork was made to add xSqueezeIt support to SHAPEIT4.
  • It explores what API xSqueezeIt would need to provide in order to be integrated.
  • It shows a real-world example of xSqueezeIt usage.

# Compile with:
make XSI_SUPPORT=YES
# Run with the variant BCF file produced by xSqueezeIt (not the binary file):
./bin/shapeit4.2 --input test/unphased.xsi_var.bcf --map test/chr20.b37.gmap.gz --region 20 --output xsi_phased2.bcf

The fork also accepts xSqueezeIt files for the reference panel and the scaffold. Phase sets are not supported, because xSqueezeIt does not currently support the PS field.
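As a rough sketch of a run that uses an xSqueezeIt-compressed reference panel, the script below prints a command line rather than executing it. The `--reference` option is the standard SHAPEIT4 option for a reference panel; the `is_xsi_var_bcf` helper and the output name `xsi_phased_ref.bcf` are hypothetical and only serve the illustration. The guard applies only to xSqueezeIt inputs, since plain VCF/BCF inputs remain valid for SHAPEIT4.

```shell
#!/bin/sh
# Hypothetical guard: for xSqueezeIt inputs, accept only the variant BCF
# (name ending in "_var.bcf"), since the fork expects that file and not
# the binary one.
is_xsi_var_bcf() {
    case "$1" in
        *_var.bcf) return 0 ;;
        *)         return 1 ;;
    esac
}

# File names taken from the examples in this README.
input=test/unphased.xsi_var.bcf
reference=test/reference.xsi_var.bcf

if is_xsi_var_bcf "$input" && is_xsi_var_bcf "$reference"; then
    # Dry run: print the command instead of executing it.
    # The output file name is an assumption made for this sketch.
    echo ./bin/shapeit4.2 --input "$input" --reference "$reference" \
        --map test/chr20.b37.gmap.gz --region 20 --output xsi_phased_ref.bcf
else
    echo "pass the *_var.bcf file, not the binary file" >&2
fi
```

Removing the `echo` turns the dry run into a real invocation, provided both variant BCF files have been indexed and their binary counterparts are present.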

xSqueezeIt

BCF Genotype Data Compressor

GitHub link: https://github.com/rwk-unil/xSqueezeIt

xSqueezeIt compresses the GT data of a BCF file into a binary file (for example .xsi or .bin) but keeps the variant information in BCF, so it generates two output files. The variant BCF file is named after the chosen binary output filename with a _var.bcf suffix appended (see the example below). Pass the variant BCF file, not the binary file, as the argument to SHAPEIT.
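The naming rule can be sketched in a few lines of shell. Both helper functions below are hypothetical conveniences, not part of xSqueezeIt or SHAPEIT4; they only encode the convention described above (binary output name plus `_var.bcf`).

```shell
#!/bin/sh
# Hypothetical helper: derive the name of the variant BCF that sits next to
# a given binary output file (binary name + "_var.bcf").
xsi_var_path() {
    printf '%s_var.bcf\n' "$1"
}

# Hypothetical helper: succeed only if both files of the pair are present.
xsi_pair_exists() {
    [ -f "$1" ] && [ -f "$(xsi_var_path "$1")" ]
}

# Prints: test/unphased.xsi_var.bcf
xsi_var_path test/unphased.xsi
```

A check such as `xsi_pair_exists test/unphased.xsi` before phasing may help catch the case where one of the two files was moved or deleted.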

Compress bcf files

# Compress the test file with xSqueezeIt (see the build instructions at the GitHub link above)
xsqueezeit -c -f test/unphased.bcf -o test/unphased.xsi --zstd
# This generates two files: unphased.xsi and unphased.xsi_var.bcf
# Both files are required. Pass the BCF file as input to SHAPEIT; it must also be indexed:
bcftools index test/unphased.xsi_var.bcf
...
xsqueezeit -c -f test/reference.bcf -o test/reference.xsi --zstd
# This generates two files: reference.xsi and reference.xsi_var.bcf
# Both files are required. Pass the BCF file as input to SHAPEIT; it must also be indexed:
bcftools index test/reference.xsi_var.bcf
...
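When several files need the same treatment, the two steps can be generated in a loop. The sketch below only prints the commands, using exactly the flags shown above; the `prepare_cmds` function is a hypothetical helper, and it assumes input names end in `.bcf` and contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the two commands needed to prepare
# one BCF file. Flags are copied from the examples above.
prepare_cmds() {
    in_bcf=$1
    out_xsi="${in_bcf%.bcf}.xsi"   # test/unphased.bcf -> test/unphased.xsi
    printf 'xsqueezeit -c -f %s -o %s --zstd\n' "$in_bcf" "$out_xsi"
    printf 'bcftools index %s_var.bcf\n' "$out_xsi"
}

for f in test/unphased.bcf test/reference.bcf; do
    prepare_cmds "$f"
done
```

Piping the output to `sh` would execute the commands, assuming `xsqueezeit` and `bcftools` are on the `PATH`.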

Segmented HAPlotype Estimation and Imputation Tools version 4 (SHAPEIT4)

SHAPEIT4 is a fast and accurate method for estimation of haplotypes (aka phasing) for SNP array and sequencing data. Version 4 is a refactored and improved version of the SHAPEIT algorithm with multiple key additional features:

  • It includes a Positional Burrows-Wheeler Transform (PBWT) based approach to quickly select a small set of informative conditioning haplotypes to be used when updating the phase of an individual.
  • We have changed the way in which phase information in sequencing reads is input into the model. We now recommend using the WhatsHap tool as a pre-processing step to extract phase information from a BAM file.
  • It accounts for sets of pre-phased genotypes (i.e. a haplotype scaffold). The scaffold can be derived either from family data or from large reference panels.
  • It reads and writes files in either VCF or BCF format using HTSlib for better I/O performance.
  • The genotype graph and HMM routines have been re-implemented for better hardware usage and performance.
  • The source code is provided as open source (MIT license) on GitHub.

If you use SHAPEIT4 in your research work, please cite the following paper:

Delaneau O., et al. Accurate, scalable and integrative haplotype estimation. Nature Communications volume 10, Article number: 5436 (2019). https://www.nature.com/articles/s41467-019-13225-y

Documentation

https://odelaneau.github.io/shapeit4/

License

This project is licensed under the MIT License; see the LICENSE file for details.
