Description
Hi,
I need help annotating overlapping records on a per-allele basis when AC/AF is not available
in the source VCF and multiple records in the source VCF can match a single record of the
query VCF.
I want to annotate my VCF file with dbSNP information from NCBI (All_*.vcf.gz), which
contains the RS and dbSNPBuildID fields. Both fields are declared as Number=1 and Type=Integer.
Multiple records in All_*.vcf.gz can match a single VCF record, and such records are currently
annotated as
dbsnp_id=375757231,60722469;dbsnp_build=138,129
even though the corresponding destination header says Number=1,Type=Integer, inherited from
the source file.
First, I think it is safer to use a separator other than a comma ',' for multiple hits, so that
the comma stays reserved for multiple alleles/genotypes (Number=A,R,G).
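To illustrate the ambiguity: under the VCF spec, a Number=A value holds one comma-separated entry per ALT allele, so a reader splits on ',' and expects exactly one entry per ALT. The sketch below is hypothetical (it is not vcfanno code, and `split_per_allele` is an illustrative name). It shows how two dbSNP hits joined with ',' at a 2-ALT site would be misread as one value per allele, while '|' keeps them together in a single allele's slot.

```python
def split_per_allele(value: str, n_alt: int) -> list:
    """Hypothetical helper: split a Number=A INFO value into one entry per ALT allele.

    Per the VCF spec, Number=A values are comma-separated, one per ALT allele.
    """
    parts = value.split(",")
    if len(parts) != n_alt:
        raise ValueError(f"expected {n_alt} values for Number=A, got {len(parts)}")
    return parts

# Two hits joined with ',' look like one value per ALT allele.
print(split_per_allele("138,129", 2))    # ['138', '129']
# Joining multiple hits with '|' keeps them inside the second allele's slot.
print(split_per_allele(".,138|129", 2))  # ['.', '138|129']
```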
Second, I want to set the destination annotation as Number=A,Type=String so that we can
tell which allele matches the dbSNP record. With 2 ALT alleles, the expected annotation
would be
dbsnp_build=.,138|129
when only the second allele matches the dbSNP records. Is this possible within the
framework of vcfanno? At a minimum, can we specify the destination type as Type=String,
similar to the typecasting (between float and int) explained in the README?
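A minimal sketch of the requested per-allele behavior, for discussion only. It is not vcfanno's implementation or API. The function name, the simplified source-record model (position, REF, list of ALTs, value), and the exact string matching without allele normalization are all assumptions. Each query ALT gets one slot. All source records whose ALTs contain that allele are joined with '|', and '.' marks an allele with no hit.

```python
from typing import List, Tuple

# Hypothetical simplified source record: (pos, ref, alts, value).
# Real dbSNP records carry more fields, and a real tool would also need
# allele normalization (left-alignment, trimming) before comparing.
SourceRecord = Tuple[int, str, List[str], object]

def annotate_per_allele(pos: int, ref: str, alts: List[str],
                        sources: List[SourceRecord]) -> str:
    """Build a Number=A, Type=String value for one query record.

    One comma-separated slot per query ALT allele; multiple matching
    source records within a slot are joined with '|'; '.' means no match.
    """
    slots = []
    for alt in alts:
        hits = [str(value) for (s_pos, s_ref, s_alts, value) in sources
                if s_pos == pos and s_ref == ref and alt in s_alts]
        slots.append("|".join(hits) if hits else ".")
    return ",".join(slots)

# Query site with ALTs C and T; two dbSNP records match only T.
sources = [(100, "A", ["T"], 138), (100, "A", ["G", "T"], 129)]
print("dbsnp_build=" + annotate_per_allele(100, "A", ["C", "T"], sources))
# dbsnp_build=.,138|129
```

The same function would produce the dbsnp_id value if each record's RS number were passed as `value` instead of dbSNPBuildID.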
Thanks
Katsuhito