
vcfanno gzip IO-related errors (race condition with multiple threads sharing a bgzf.Reader) #64

@chapmanb

Brent;
We've incorporated vcfanno into bcbio with a ton of success. It's been awesome to have such a general, flexible approach to annotation. Now that we're starting to test at scale, we've been seeing intermittent failures when reading VCF files. From the error messages, these appear to be IO-related, and they aren't reproducible: the files themselves are fine, and re-running the same command succeeds.

I've been collecting error cases, and the failures are reported after this output:

vcfanno version 0.2.4 [built with go1.8]

vcfanno.go:115: found 1 sources from 1 files
vcfanno.go:143: using 2 worker threads to decompress query file
api.go:670: vcfanno: using ~2 workers per file

The command then fails with one of these errors:

parallel.go:151: gzip: invalid header

or

parallel.go:151: short buffer

I know this isn't a great report, but I don't have much more to go on from my side. Do you know of ways we could make vcfanno more resilient to these IO/read issues? Thanks for any pointers or ideas for tackling this.
