Separator of multiple hits and allele specific flag #68

@kyasuno

Hi,

I need help annotating overlapping records on a per-allele basis when AC/AF is not available
in the source VCF and multiple records in the source VCF can match a single record in the
query VCF.

I want to annotate my VCF file with dbSNP information from NCBI (All_*.vcf.gz), which
contains the RS and dbSNPBuildID fields. Both fields are declared as Number=1, Type=Integer.

Multiple records in All_*.vcf.gz can match a single record in my VCF, and they are
currently annotated as

dbsnp_id=375757231,60722469;dbsnp_build=138,129

even though the corresponding destination header, inherited from the source file, still says
Number=1,Type=Integer.

First, I think it would be safer to separate multiple hits with something other than a comma
',', so that the comma stays reserved for per-allele and per-genotype values (Number=A,R,G).

Second, I would like to declare the destination annotation as Number=A,Type=String so that
we can tell which allele matches the dbSNP record. For example, with 2 ALT alleles, the
expected annotation would be
dbsnp_build=.,138|129
when only the second allele matches dbSNP records. Is this possible within the
framework of vcfanno? At a minimum, can we specify the destination type as Type=String,
similar to the typecasting (between float and int) explained in the README?
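
To make the expected output concrete, here is a minimal Python sketch of the per-allele layout described above. It is not vcfanno code and does not reflect how vcfanno works internally. The function name `per_allele_annotation`, the `(alt, value)` representation of a dbSNP hit, and the example alleles are hypothetical and only for illustration. It assumes '|' separates multiple hits on one allele, ',' separates alleles, and '.' marks an allele with no hit.

```python
from typing import List, Tuple

MISSING = "."   # VCF missing-value marker for an allele with no hit
HIT_SEP = "|"   # assumed separator for multiple hits on the same allele
ALLELE_SEP = ","  # reserved for per-allele values (Number=A)


def per_allele_annotation(query_alts: List[str],
                          hits: List[Tuple[str, int]]) -> str:
    """Build a Number=A,Type=String value.

    query_alts: ALT alleles of the query record, in order.
    hits: (alt_allele, value) pairs taken from overlapping source
          records (hypothetical representation of dbSNP matches).
    """
    fields = []
    for alt in query_alts:
        # Collect every source value whose ALT allele equals this one.
        values = [str(value) for hit_alt, value in hits if hit_alt == alt]
        fields.append(HIT_SEP.join(values) if values else MISSING)
    return ALLELE_SEP.join(fields)


# Two ALT alleles; only the second one matches two dbSNP records.
query_alts = ["A", "T"]
hits = [("T", 138), ("T", 129)]
print("dbsnp_build=" + per_allele_annotation(query_alts, hits))
```

With this layout, splitting the value on ',' always yields one field per ALT allele, regardless of how many source records matched.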

Thanks
Katsuhito
