A small library for processing N5 datasets in parallel using an Apache Spark cluster.
Supported operations:
- downsampling (isotropic/non-isotropic)
- max intensity projection
- conversion to TIFF series
- parallel remove
Clone the repository with submodules:

```bash
git clone --recursive https://github.com/saalfeldlab/stitching-spark.git
```

If you have already cloned the repository without submodules, run this to fetch them:

```bash
git submodule update --init --recursive
```
To use as a standalone tool, compile the package for the desired execution environment.

Compile for running on the Janelia cluster:

```bash
python build.py
```

Compile for running on a local machine:

```bash
python build-spark-local.py
```
The scripts for starting the application are located under `startup-scripts/spark-janelia` and `startup-scripts/spark-local`, and their usage is explained below.
If running locally, you can access the Spark job tracker at http://localhost:4040/ to monitor the progress of the tasks.
Run on Janelia cluster:

```bash
spark-janelia/n5-downsample.py <number of cluster nodes> -n <path to n5 root> -i <input dataset> [-r <pixel resolution>]
```

Run on local machine:

```bash
spark-local/n5-downsample.py -n <path to n5 root> -i <input dataset> [-r <pixel resolution>]
```
The tool generates lower resolution datasets in the same group as the input dataset until the resulting volume fits into a single block. The naming scheme for the lower resolution datasets is `s1`, `s2`, `s3`, and so on.

By default the downsampling factors are powers of two (`[2,2,2]`, `[4,4,4]`, `[8,8,8]`, ...). If the optional pixel resolution parameter is passed (e.g. `-r 0.097,0.097,0.18`), the downsampling factors in Z are adjusted with respect to it to make the lower resolutions as close to isotropic as possible.

The block size of the input dataset is reused, or adjusted with respect to the pixel resolution if the optional parameter is supplied. The downsampling factors that were used are written into the attributes metadata of the lower resolution datasets.
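As an illustration of this adjustment, here is a minimal sketch (not the library's actual code) of how near-isotropic downsampling factors could be derived from the pixel resolution; the number of scale levels and the rounding strategy are assumptions made for the example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// A minimal sketch (not the library's actual code) of deriving
// near-isotropic downsampling factors from the pixel resolution.
public class IsotropicFactorsSketch {
    public static void main(final String[] args) {
        final double[] resolution = {0.097, 0.097, 0.18}; // the -r value
        final List<long[]> factors = new ArrayList<>();
        for (int scale = 1; scale <= 4; ++scale) { // four levels, for illustration
            final long xyFactor = 1L << scale; // 2, 4, 8, 16 in X and Y
            // Choose the Z factor so that zFactor * res_z stays as close as
            // possible to xyFactor * res_x, i.e. near-isotropic voxels.
            final long zFactor = Math.max(1, Math.round(xyFactor * resolution[0] / resolution[2]));
            factors.add(new long[] {xyFactor, xyFactor, zFactor});
        }
        // Prints [2, 2, 1], [4, 4, 2], [8, 8, 4], [16, 16, 9]
        factors.forEach(f -> System.out.println(Arrays.toString(f)));
    }
}
```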
Run on Janelia cluster:

```bash
spark-janelia/n5-slice-tiff.py <number of cluster nodes> -n <path to n5 root> -i <input dataset> -o <output path> [-c <tiff compression>]
```

Run on local machine:

```bash
spark-local/n5-slice-tiff.py -n <path to n5 root> -i <input dataset> -o <output path> [-c <tiff compression>]
```
The tool converts a given dataset into a slice TIFF series and saves them in the specified output folder. The following TIFF compression modes are supported: `-c lzw` and `-c none`.
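The conversion parallelizes naturally over slices. Below is a minimal sketch of that pattern, assuming one Spark task per Z slice; the slice count is hypothetical and the actual N5 read and TIFF write are stubbed out:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// A minimal sketch (not the tool's actual code): one Spark task per Z slice.
public class SliceTiffSketch {
    public static void main(final String[] args) {
        final int dimZ = 100; // hypothetical number of slices in the dataset
        final SparkConf conf = new SparkConf().setAppName("SliceTiffSketch").setMaster("local[*]");
        try (final JavaSparkContext sparkContext = new JavaSparkContext(conf)) {
            final List<Integer> sliceIndices = new ArrayList<>();
            for (int z = 0; z < dimZ; ++z)
                sliceIndices.add(z);
            sparkContext.parallelize(sliceIndices).foreach(z -> {
                // Each worker would read slice z from the N5 dataset here
                // and save it as e.g. <output path>/<z>.tif.
                System.out.println("exporting slice " + z);
            });
        }
    }
}
```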
Run on Janelia cluster:

```bash
spark-janelia/n5-mips.py <number of cluster nodes> -n <path to n5 root> -i <input dataset> -o <output path> [-c <tiff compression>] [-m <mip step>]
```

Run on local machine:

```bash
spark-local/n5-mips.py -n <path to n5 root> -i <input dataset> -o <output path> [-c <tiff compression>] [-m <mip step>]
```
The tool generates max intensity projections (MIPs) in the X/Y/Z directions and saves them as TIFF images in the specified output folder. By default the entire volume is used to create a single MIP in each of X/Y/Z. You can specify the MIP step as the number of cells to include in a single MIP (e.g. `-m 5,5,3`). The following TIFF compression modes are supported: `-c lzw` and `-c none`.
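For reference, here is a minimal self-contained sketch (not the tool's actual code) of what a MIP computes: a max intensity projection along Z for a volume stored as a flat array.

```java
import java.util.Arrays;

// A minimal sketch (not the tool's actual code) of a max intensity
// projection along Z for a volume stored as a flat XYZ array.
public class MipSketch {
    static float[] mipZ(final float[] volume, final int dimX, final int dimY, final int dimZ) {
        final float[] mip = new float[dimX * dimY];
        Arrays.fill(mip, Float.NEGATIVE_INFINITY);
        for (int z = 0; z < dimZ; ++z)
            for (int i = 0; i < dimX * dimY; ++i)
                mip[i] = Math.max(mip[i], volume[z * dimX * dimY + i]);
        return mip;
    }

    public static void main(final String[] args) {
        final float[] volume = {1, 5, 2, 8, 3, 7, 4, 6}; // toy 2x2x2 volume
        System.out.println(Arrays.toString(mipZ(volume, 2, 2, 2))); // [3.0, 7.0, 4.0, 8.0]
    }
}
```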
Run on Janelia cluster:

```bash
spark-janelia/n5-remove.py <number of cluster nodes> -n <path to n5 root> -i <input dataset or group>
```

Run on local machine:

```bash
spark-local/n5-remove.py -n <path to n5 root> -i <input dataset or group>
```
The tool removes a group or dataset, parallelizing over the inner groups. This is typically much faster than deleting the group on a single machine, in particular for groups with many nested subgroups and/or N5 blocks.
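A minimal sketch of this strategy, assuming the standard N5 Java API (`N5FSWriter#list` and `N5FSWriter#remove`); the paths are hypothetical and this is not the tool's actual implementation:

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.janelia.saalfeldlab.n5.N5FSWriter;

// A minimal sketch (not the tool's actual code) of a parallel remove:
// the driver lists the inner groups and each Spark task removes one subtree.
public class ParallelRemoveSketch {
    public static void main(final String[] args) throws Exception {
        final String n5Path = "/path/to/n5/root"; // hypothetical path
        final String group = "group/to/remove";   // hypothetical group

        final SparkConf conf = new SparkConf().setAppName("ParallelRemoveSketch").setMaster("local[*]");
        try (final JavaSparkContext sparkContext = new JavaSparkContext(conf)) {
            final String[] innerGroups = new N5FSWriter(n5Path).list(group);
            sparkContext.parallelize(Arrays.asList(innerGroups)).foreach(innerGroup ->
                    // Each worker opens its own writer and removes one subtree.
                    new N5FSWriter(n5Path).remove(group + "/" + innerGroup));
            // Finally remove the (now empty) group itself.
            new N5FSWriter(n5Path).remove(group);
        }
    }
}
```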
Alternatively, you can use the library in your own Spark-based project. Add it as a Maven dependency and make sure that your application is compiled as a fat jar.
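A minimal driver skeleton under those assumptions; the class name is made up for the example and the library entry point is only indicated in a comment, so consult the library sources for the actual API:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// A hypothetical driver skeleton; the entry point described in the comment
// below is illustrative, not the library's confirmed API.
public class MyN5SparkJob {
    public static void main(final String[] args) {
        final SparkConf conf = new SparkConf().setAppName("MyN5SparkJob");
        try (final JavaSparkContext sparkContext = new JavaSparkContext(conf)) {
            // Call into the library here, e.g. a downsampling entry point
            // that takes the Spark context, the N5 root, and dataset paths.
        }
    }
}
```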