Hi Chenxi,
It looks like yahs will really speed up our scaffolding efforts - so far the scaffolded fastas are looking great. Awesome work! However, I'm having trouble creating the correct input files for our manual curation phase, in which we edit the scaffolds in Juicebox. Our main goal is to relate the underlying contigs (especially when we use the yahs flag --no-contig-ec) to the assembled scaffolds and Hi-C map.
Using your provided juicebox_pre program and the Juicebox juicer pre, the resulting .hic and .assembly files are not correctly editable in Juicebox. I think SALSA users are having a similar issue (marbl/SALSA#154).
I think you can only create a draft assembly for editing with run-assembly-visualizer.sh (https://github.com/aidenlab/3d-dna/blob/master/visualize/run-asm-visualizer.sh).
Normally our workflow looks like this. makeAgpFromFasta.py and agp2assembly.py are 3d-dna scripts. Matlock is provided by Phase Genomics and is similar in concept to your juicebox_pre, which converts the alignments to alignments_sorted.txt.
FA=yahs.out_scaffolds_final.fa
makeAgpFromFasta.py $FA genome.agp
agp2assembly.py genome.agp genome.assembly
bwa index $FA
bwa mem -5SP $FA *R1*fastq.gz *R2*fastq.gz | samblaster | samtools view -S -h -b -F 2316 > phasehic.aligned.bam
matlock bam2 juicer phasehic.aligned.bam phasehic.links.txt
sort -k2,2 -k6,6 phasehic.links.txt > phasehic.sorted.links.txt
run-assembly-visualizer.sh -p false genome.assembly phasehic.sorted.links.txt
It seems like I should be able to substitute the alignments_sorted.txt for phasehic.sorted.links.txt in our workflow, but alignments_sorted.txt is missing some columns. Maybe we just need to figure out how to fill these columns?
[amanda.stahlke@ceres yahs]$ head alignments_sorted.txt
0 scaffold_1 100002074 0 1 scaffold_1 143542799 1
0 scaffold_1 1000042 0 1 scaffold_1 1000223 1
0 scaffold_1 100004260 0 1 scaffold_1 100004229 1
0 scaffold_1 100004310 0 1 scaffold_1 100004310 1
[amanda.stahlke@ceres yahs]$ head phasehic.sorted.links
0 scaffold_1 100002220 0 16 scaffold_1 100002439 1 1 - - 1 - - -
0 scaffold_1 100002430 0 16 scaffold_1 100002441 1 1 - - 1 - - -
0 scaffold_1 100002827 0 16 scaffold_1 96197983 1 1 - - 1 - - -
0 scaffold_1 100002871 0 16 scaffold_1 100003104 1 1 - - 1 - - -
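To make the "fill these columns" idea concrete, here is a rough sketch of what I have in mind. It pads each 8-column record with the same seven trailing values that appear in the matlock output above (1 - - 1 - - -). The helper name pad_links and the output file name are made up for illustration, and I am only assuming that run-assembly-visualizer.sh would accept these placeholder values; I have not confirmed that.

```shell
# Hypothetical helper: pad 8-column records (as in alignments_sorted.txt)
# to the 15-column layout seen in the matlock links file.
# Assumption: the trailing placeholders "1 - - 1 - - -" are simply copied
# from the matlock output; it is untested whether the visualizer needs
# real values in these columns.
pad_links() {
  awk '
    NF == 8 { print $0, 1, "-", "-", 1, "-", "-", "-"; next }  # pad short records
    { print }                                                   # pass others through unchanged
  '
}

# Example usage (output file name is illustrative):
# pad_links < alignments_sorted.txt > alignments_sorted.padded.txt
```

The padded file could then be sorted and passed to run-assembly-visualizer.sh in place of phasehic.sorted.links.txt, if this approach is valid at all.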
Not sure if this is a very clear question. In short, can you provide any guidance on creating a .assembly and .hic file for the Juicebox run-assembly-visualizer.sh tool?
Thank you!
Amanda