Creating .hic and .assembly for editing in juicebox #4

@Astahlke

Hi Chenxi,

It looks like yahs will really speed up our scaffolding efforts. So far the scaffolded FASTAs are looking great. Awesome work! However, I'm having trouble creating the correct input files for our manual curation phase, in which we edit the scaffolds in Juicebox. Our main goal is to relate the underlying contigs (especially when we use the yahs flag --no-contig-ec) to the assembled scaffolds and the Hi-C map.

When I use your provided juicebox_pre program and the Juicebox juicer pre, the resulting .hic and .assembly files are not correctly editable in Juicebox. I think SALSA users are having a similar issue (marbl/SALSA#154).

I think a draft assembly for editing can only be created with run-assembly-visualizer.sh (https://github.com/aidenlab/3d-dna/blob/master/visualize/run-asm-visualizer.sh).
Normally our workflow looks like the commands below. makeAgpFromFasta.py and agp2assembly.py are 3d-dna scripts. Matlock is provided by Phase Genomics and is similar in concept to your juicebox_pre: it converts the alignments into a links file, the counterpart of alignments_sorted.txt.

FA=yahs.out_scaffolds_final.fa
makeAgpFromFasta.py $FA genome.agp
agp2assembly.py genome.agp genome.assembly
bwa index $FA
bwa mem -5SP $FA *R1*fastq.gz *R2*fastq.gz | samblaster | samtools view -S -h -b -F 2316 > phasehic.aligned.bam
matlock bam2 juicer phasehic.aligned.bam phasehic.links.txt
sort -k2,2 -k6,6 phasehic.links.txt > phasehic.sorted.links.txt
run-assembly-visualizer.sh -p false genome.assembly phasehic.sorted.links.txt

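It seems like I should be able to substitute alignments_sorted.txt for phasehic.sorted.links.txt in our workflow, but alignments_sorted.txt is missing some columns: it has only the first 8 columns, while the matlock output has additional trailing columns (see the head output of both files below). Maybe we just need to figure out how to fill these columns?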

[amanda.stahlke@ceres yahs]$ head alignments_sorted.txt
0	scaffold_1	100002074	0	1	scaffold_1	143542799	1
0	scaffold_1	1000042	0	1	scaffold_1	1000223	1
0	scaffold_1	100004260	0	1	scaffold_1	100004229	1
0	scaffold_1	100004310	0	1	scaffold_1	100004310	1
[amanda.stahlke@ceres yahs]$ head phasehic.sorted.links
0 scaffold_1 100002220 0 16 scaffold_1 100002439 1 1 - - 1  - - -
0 scaffold_1 100002430 0 16 scaffold_1 100002441 1 1 - - 1  - - -
0 scaffold_1 100002827 0 16 scaffold_1 96197983 1 1 - - 1  - - -
0 scaffold_1 100002871 0 16 scaffold_1 100003104 1 1 - - 1  - - -
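
One way to test the "fill the columns" idea is sketched below. It is only a sketch under assumptions I have not verified: that the trailing columns follow the Juicer long format (mapq1, cigar1, sequence1, mapq2, cigar2, sequence2, readname1, readname2), and that run-assembly-visualizer.sh accepts placeholder values there, as the matlock output above suggests. The function name pad_links is hypothetical, and the mapq placeholder of 1 simply mirrors the matlock lines.

```shell
#!/bin/sh
# Pad 8-column link records (as in alignments_sorted.txt) to 16 fields.
# ASSUMPTION: the extra fields are mapq1 cigar1 seq1 mapq2 cigar2 seq2
# readname1 readname2, and placeholders are acceptable downstream.
pad_links() {
  awk '
    NF == 8 {
      # Keep the eight original fields, then append placeholder values.
      # Output fields are joined with single spaces (awk default OFS).
      print $1, $2, $3, $4, $5, $6, $7, $8, 1, "-", "-", 1, "-", "-", "-", "-"
      next
    }
    # Lines with any other column count are passed through unchanged.
    { print }
  '
}
```

For example, `pad_links < alignments_sorted.txt > alignments_padded.txt` (output file name made up here) would give a file to try in place of phasehic.sorted.links.txt. Whether the visualizer accepts it is exactly the open question.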

I'm not sure whether this is a very clear question. In short, can you provide any guidance on creating the .assembly and .hic files for the Juicebox run-assembly-visualizer.sh tool?

Thank you!
Amanda
