BEDPE
Keiran Raine edited this page Aug 13, 2018 · 4 revisions
BEDPE file format:
Heading | Type | Description |
---|---|---|
chr1 | String | Chromosome of lower coordinate |
start1 | 0-based int | Start coordinate of lower coordinate |
end1 | 1-based int | End coordinate of lower coordinate |
chr2 | String | Chromosome of high coordinate |
start2 | 0-based int | Start coordinate of high coordinate |
end2 | 1-based int | End coordinate of high coordinate |
id/name | String | ID of event, correlates with VCF |
brass_score | int | Number of aberrant pairs contributing to the rearrangement group. |
strand1 | [+-] | Strand of end in 'genomic' context - see table |
strand2 | [+-] | Strand of end in 'genomic' context - see table |
sample | String | Name of sample as found in the BAM RG header SM field; each contributing sample is listed. |
svclass | String | Basic event type: deletion, inversion, tandem-duplication, translocation |
bkdist | int | Distance between inner edges of breakpoints (-1 if on different chromosomes) |
assembly_score | int | 0-100, A "niceness" score for the Velvet assembly graph. A score of 100 indicates a perfect graph with five vertices forming a quintet. Points are deducted for isolated vertices, cruft hanging off the quintet, and major points for extra cycles and large-scale graph cruftiness. |
readpair names | String | CSV of read pair names found in aberrant pair grouping (reads with mapping quality MPQ >= 6 ) |
readpair count | int | Count of read pairs found in aberrant pair grouping (analogous to brass_score) |
bal_trans | String | ID of event that describes reciprocal balanced translocation event |
inv | String | ID of event that describes reciprocal inversion event |
occL | int | Count of events that share lower coordinate (chr/start/end1) within 500 bp window |
occH | int | Count of events that share higher coordinate (chr/start/end2) within 500 bp window |
copynumber_flag | char | Indicate presence of copynumber change point from ASCAT NGS result |
range_blat | int | Flag indicating the degree of homology between the two sides of the breakpoint. For small distances this will always be high, since a region is being compared against itself, so range_blat filtering should not be used in isolation. If needed, range_blat can be required to be less than 100, as long as the distance is greater than 1000. |
Brass Notation | String | A string containing the brass description of the break point - see table at end |
non-template | String | Non templated sequence |
micro-homology | String | |
assembled readnames | String | CSV of all individual reads that formed part of the ALT assembly path. |
gene | | Gene affected by disruption of the coding direction |
gene_id | | ID of gene affected by disruption of the coding direction |
transcript_id | | Transcript ID affected by disruption of the coding direction |
strand | | Coding strand of affected feature |
phase | | Phase of breakpoint in affected feature |
region | | |
region_number | | |
total_region_count | | |
first/last | | First or last element of exon structure. |
fusion_flag | | See BEDPE - Fusion-flag. |
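
The coordinate convention in the table above (0-based start, 1-based end) and the range_blat guidance can be illustrated with a minimal Python sketch. It assumes the file is tab-delimited with the six coordinate columns first, as in standard BEDPE, and it reads "distance" in the range_blat description as `bkdist`. The names `BreakEnd`, `parse_core_fields` and `passes_range_blat`, and the treatment of `bkdist == -1`, are hypothetical choices for illustration and are not part of BRASS.

```python
from typing import NamedTuple, Tuple


class BreakEnd(NamedTuple):
    """One side of an event: chr, 0-based start, 1-based end."""
    chrom: str
    start: int  # 0-based start
    end: int    # 1-based end

    @property
    def length(self) -> int:
        # With a 0-based start and 1-based end, the span is end - start.
        return self.end - self.start


def parse_core_fields(line: str) -> Tuple[BreakEnd, BreakEnd]:
    """Parse chr1/start1/end1 and chr2/start2/end2 from one record.

    Assumption: fields are tab-separated and appear in this order.
    """
    f = line.rstrip("\n").split("\t")
    low = BreakEnd(f[0], int(f[1]), int(f[2]))
    high = BreakEnd(f[3], int(f[4]), int(f[5]))
    return low, high


def passes_range_blat(range_blat: int, bkdist: int,
                      max_blat: int = 100, min_dist: int = 1000) -> bool:
    """Apply the range_blat < 100 requirement only when distance > 1000.

    Assumption: bkdist == -1 (different chromosomes) is treated as a
    large distance, so the range_blat requirement is applied.
    """
    if 0 <= bkdist <= min_dist:
        return True  # breakpoints too close; range_blat is uninformative
    return range_blat < max_blat


# Made-up record, for illustration only.
low, high = parse_core_fields("1\t99\t100\t1\t4999\t5000\n")
print(low.length, passes_range_blat(range_blat=100, bkdist=4900))
```

Because range_blat should not be used in isolation, a check like this would normally be combined with other fields rather than used as the only filter.
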
SV Type | Strand (1/2) |
---|---|
Inversion | +/- |
Inversion | -/+ |
Deletion | +/+ |
Tandem Dup. | -/- |
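
The strand table above amounts to a simple lookup, sketched here in Python. The returned labels reuse the `svclass` values from the column table. The function name `sv_type_from_strands` is hypothetical, and treating any pair on different chromosomes as a translocation is an assumption that this page does not state explicitly.

```python
# Strand pair (strand1, strand2) -> basic event type, per the table above.
STRAND_TO_SV = {
    ("+", "-"): "inversion",
    ("-", "+"): "inversion",
    ("+", "+"): "deletion",
    ("-", "-"): "tandem-duplication",
}


def sv_type_from_strands(chr1: str, chr2: str,
                         strand1: str, strand2: str) -> str:
    """Return the basic event type implied by a strand pair.

    Assumption: breakpoints on different chromosomes are reported as
    'translocation' regardless of strand.
    """
    if chr1 != chr2:
        return "translocation"
    return STRAND_TO_SV[(strand1, strand2)]


print(sv_type_from_strands("4", "4", "+", "+"))  # deletion
```
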
Paired-flag | Lower chr | Lower breakpoint | Higher chr | Higher breakpoint | NTS | Microhomology | Phase II output |
---|---|---|---|---|---|---|---|
4 | 4 | 114974732 | 4 | 115115727 | - | G | Chr.4- 114974733(32)--G--115115728(27) Chr.4- (score 94) |
8 | 6 | 126370389 | 6 | 134006924 | GGCAAATATACTCTT | - | Chr.6- 126370389] GGCAAATATACTCTT [134006924 Chr.6 (score 95) |
32 | 1 | 241559717 | 12 | 132126269 | - | - | Chr.1- 241559717][132126269 Chr.12 (score 90) |
When the assembled reads do not completely match the ends of the reference-based target, additional notation is added to the Chr.X elements of the 'Phase II output':
Chr.17 290000(200)--NNN--1350000(44) Chr.17[@17] (score 75)
Chr.3 186000000] NNN [11600000 Chr.16-[@651] (score 94)
The `[@N]` suffix denotes where the divergence occurs. If the end is considered the low end (internally), the value is the point at which the query starts to match; if it is the high end (internally), it is the point at which the query starts to diverge. Scores are adjusted accordingly.
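
As a rough illustration of the notation above, the following Python sketch pulls the Chr.X elements, any `[@N]` divergence value and the score out of a 'Phase II output' string. The regular expressions are inferred only from the examples on this page, so they are an assumption rather than a specification of the BRASS notation. The names `ChrElement` and `parse_phase2` are hypothetical, and the trailing `-` on a Chr.X element is recorded without interpretation because its meaning is not stated here.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class ChrElement(NamedTuple):
    name: str                  # e.g. "17" from "Chr.17"
    has_minus: bool            # True if written as "Chr.17-"
    divergence: Optional[int]  # N from "[@N]", or None if absent


# Patterns inferred from the examples on this page (assumption).
CHR_RE = re.compile(r"Chr\.([0-9A-Za-z]+)(-?)(?:\[@(\d+)\])?")
SCORE_RE = re.compile(r"\(score (\d+)\)")


def parse_phase2(text: str) -> Tuple[List[ChrElement], Optional[int]]:
    """Extract Chr.X elements and the score from a Phase II string."""
    elements = [
        ChrElement(m.group(1), m.group(2) == "-",
                   int(m.group(3)) if m.group(3) else None)
        for m in CHR_RE.finditer(text)
    ]
    score_match = SCORE_RE.search(text)
    score = int(score_match.group(1)) if score_match else None
    return elements, score


elements, score = parse_phase2(
    "Chr.3 186000000] NNN [11600000 Chr.16-[@651] (score 94)")
print(elements, score)
```
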