This repository describes the structure and format of the Skoltech3D dataset and provides utilities for working with the data.
Multi-sensor large-scale dataset for multi-view 3D reconstruction
Oleg Voynov, Gleb Bobrovskikh, Pavel Karpyshev, Saveliy Galochkin, Andrei-Timotei Ardelean, Arseniy Bozhenko, Ekaterina Karmanova, Pavel Kopanev, Yaroslav Labutin-Rymsho, Ruslan Rakhimov, Aleksandr Safin, Valerii Serpiva, Alexey Artemov, Evgeny Burnaev, Dzmitry Tsetserukou, Denis Zorin
CVPR 2023
Project page | Paper (6 MB) | Supplementary text (2 MB) | Supplementary results (101 MB) | arXiv | Publisher version
- Overview
- Structured-light data
- Image data
- Calibration
- Addons
- ScenePaths
- Download
- Evaluation
- Changelog
## Overview

The dataset consists of 107 scenes scanned with
- RangeVision Spectrum structured-light scanner,
- 2x The Imaging Source DFK 33UX250 industrial RGB cameras,
- 2x Huawei Mate 30 Pro phones with time-of-flight depth sensors,
- Intel RealSense D435 stereo RGB-D camera,
- and Microsoft Kinect v2 ToF RGB-D camera.
Previews of the scenes are available in the supplementary results for the paper, and the list of scene names can be found in `scenes.py:1`.
The data for each scene consists of
- a highly accurate reference structured-light scan in the form of mesh and depth maps,
- RGB and infra-red images and lower-quality depth maps from the cameras, captured from 100 points of view, under 14 lighting conditions,
- intrinsic camera parameters and camera positions,
- and additional data, such as occluding surface, as described further.
A detailed overview of the provided data is available in the "Skoltech3D data" spreadsheet and is explained below.
## Structured-light data

The structured-light (SL) data for each scene consists of
- the reference merged and cleaned full structured-light scan,
- 27 raw partial scans,
- 5 validation partial scans, if the temporary coating was used for the scene,
- and the occluding surface, described in paragraph "Rendering SL scans. Occluding surface" in the paper.
The paths to the files are
Data | Path
------------------|-----
Full scan | dataset/{scene}/stl/reconstruction/cleaned.ply
Partial scans | dataset/{scene}/stl/partial/aligned/{scan_i:04}.ply
Validation scans | dataset/{scene}/stl/validation/aligned/{val_scan_i:04}.ply
Occluding surface | dataset/{scene}/stl/occluded_space.ply
where
- `scene` is the name of the scene from `scenes.py:1`,
- `scan_i` is the id of the partial scan from `0..26`,
- and `val_scan_i` is the id of the validation partial scan from `0..4`.
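The placeholders in these path templates follow Python `str.format` syntax, so the paths can be expanded directly; a minimal sketch (`my_scene` is a hypothetical placeholder, real names are listed in `scenes.py`):

```python
# Expand the path templates with str.format.
# "my_scene" is a hypothetical placeholder scene name.
full_scan = "dataset/{scene}/stl/reconstruction/cleaned.ply".format(scene="my_scene")
partial = "dataset/{scene}/stl/partial/aligned/{scan_i:04}.ply".format(scene="my_scene", scan_i=3)

print(full_scan)  # dataset/my_scene/stl/reconstruction/cleaned.ply
print(partial)    # dataset/my_scene/stl/partial/aligned/0003.ply
```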
The structured-light data is stored as triangle meshes in binary PLY format, which can be opened in MeshLab, or loaded using Open3D.
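For instance, a scan can be loaded with `open3d.io.read_triangle_mesh(path)`. As a dependency-free illustration, the PLY header (format and element counts) can also be inspected with the standard library alone; a minimal sketch run on a synthetic header:

```python
import io

def read_ply_header(fp):
    """Parse a PLY header; return (format string, {element name: count})."""
    assert fp.readline().strip() == b"ply"
    fmt, counts = None, {}
    for raw in fp:
        parts = raw.strip().split()
        if parts[0] == b"format":
            fmt = parts[1].decode()
        elif parts[0] == b"element":
            counts[parts[1].decode()] = int(parts[2])
        elif parts[0] == b"end_header":
            break
    return fmt, counts

# Tiny synthetic header, for illustration only.
header = (b"ply\nformat binary_little_endian 1.0\n"
          b"element vertex 8\nproperty float x\n"
          b"element face 12\nproperty list uchar int vertex_indices\n"
          b"end_header\n")
fmt, counts = read_ply_header(io.BytesIO(header))
print(fmt, counts)  # binary_little_endian {'vertex': 8, 'face': 12}
```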
## Image data

The image data for each scene consists of the images captured from 100 viewpoints with the right and left The Imaging Source (TIS) cameras, the right and left phone cameras, and the RealSense and Kinect cameras.
The paths to the files are
Data | Path
---------------------------------------- |-----
Undistorted images, lighting-dependent | dataset/{scene}/{cam}/{mode}/undistorted/{light}/{pos_i:04}.{ext}
Undistorted images, lighting-independent | dataset/{scene}/{cam}/{mode}/undistorted/{pos_i:04}.{ext}
Raw images, lighting-dependent | raw/{scene}/{cam}/{mode}/raw/{light}/{pos_i:04}.{ext}
Raw images, lighting-independent | raw/{scene}/{cam}/{mode}/raw/{pos_i:04}.png
where
- `scene` is the name of the scene from `scenes.py:1`,
- `cam` is the name of the camera from `tis_right`, `tis_left`, `kinect_v2`, `real_sense`, `phone_right`, and `phone_left`,
- `mode` is the name of the modality from `rgb`, `depth`, `ir`, and `ir_right`,
- `light` is the name of the lighting condition, as explained below,
- `pos_i` is the id of the viewpoint from `0..99`,
- and `ext` is `png`, except for the RGB images from `phone_right` and `phone_left`, for which it is `jpg`.
We captured RGB images with all the devices, depth images with the two phones, the Kinect, and the RealSense, and infra-red (IR) images with the two phones, the Kinect, and the left and right IR sensors of the RealSense.
We captured RGB images with all devices, and all modalities with the RealSense, under 12 lighting conditions, denoted in `params.py:1` as `{lighting_condition}@best`.
For two of the lighting conditions, denoted as `ambient_low@fast` and `flash@fast`, we additionally captured the images with camera settings corresponding to a high level of noise, as explained in paragraph "Sensor exposure/gain selection" in the paper.
Since controlling the camera settings of the Kinect was not possible, the two additional lighting conditions are not available for it, and its lighting conditions are labeled without the suffix `best` or `fast`, as in `params.py:6`.
The time-of-flight depth sensors of the Kinect and the phones are unaffected by the lighting conditions, so we captured the depth and IR images for these cameras once per viewpoint.
For each image, we provide an undistorted version and the raw unprocessed version.
The images are stored either as regular PNG with lossless compression, as lossy JPEG in the case of RGB images from the phones, or as PNG with each RGBA pixel representing a little-endian 32-bit floating point value.
Please refer to `imgio.py:5` for more details on the image formats, and to `load_imgs.ipynb` for a code example of loading the images.
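The authoritative encoding and decoding logic lives in `imgio.py`; the core byte reinterpretation for the float-in-RGBA PNGs reduces to the following stdlib sketch (assuming the pixel bytes have already been extracted in row-major order):

```python
import struct

def rgba_bytes_to_floats(raw):
    """Reinterpret raw RGBA pixel bytes (4 bytes per pixel) as little-endian float32 values."""
    assert len(raw) % 4 == 0
    return list(struct.unpack("<%df" % (len(raw) // 4), raw))

def floats_to_rgba_bytes(values):
    """Inverse operation: pack float32 values into 4-byte-per-pixel RGBA data."""
    return struct.pack("<%df" % len(values), *values)

# Round-trip a couple of (hypothetical) depth values in meters.
depths = [0.75, 1.5]
assert rgba_bytes_to_floats(floats_to_rgba_bytes(depths)) == depths
```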
## Calibration

The calibration data consists of
- camera poses in the global space of the SL data for each scene, after the refinement described in paragraph "Refinement of camera poses" in the paper,
- pinhole camera models, corresponding to the undistorted images,
- central generic camera models, corresponding to the raw images,
- and depth undistortion models, described in paragraph "Refinement of depth camera calibration" in the paper.
The paths to the files are
Data | Path
-------------------------- |-----
Camera poses | dataset/{scene}/{cam}/{mode}/images.txt
Pinhole camera models | dataset/calibration/{cam}/{mode}/cameras.txt
Central generic cam models | dataset/calibration/{cam}/{mode}/intrinsics.yaml
Depth undistortion models | dataset/calibration/{cam}/{mode}/undistortion.pt
where
- `scene` is the name of the scene from `scenes.py:1`,
- `cam` is the name of the camera from `tis_right`, `tis_left`, `kinect_v2`, `real_sense`, `phone_right`, and `phone_left`,
- and `mode` is the name of the modality from `rgb`, `ir`, and `ir_right`.
The camera parameters of the `ir` sensors correspond to both the IR and the depth images.
The camera poses and the pinhole camera models are stored in the COLMAP text format. The central generic camera models are stored in the format of Thomas Schöps' camera calibration toolbox.
## Addons

For the undistorted RGB images from the `tis_right` camera,
we provide the camera parameters in two additional formats:
- in MVSNet format,
- and in IDR format, commonly used in works on NeRF-based 3D reconstruction.
The paths to the files are
Data | Path
----------------------- |-----
MVSNet camera parameters | addons/{scene}/{cam}/{mode}/mvsnet_input/{pos_i:08}_cam.txt
MVSNet view selection scores | addons/{scene}/{cam}/{mode}/mvsnet_input/pair.txt
IDR camera parameters | addons/{scene}/{cam}/{mode}/idr_input/cameras.npz
where
- `scene` is the name of the scene from `scenes.py:1`,
- `cam` is the name of the camera, `tis_right`,
- `mode` is the name of the modality, `rgb`,
- and `pos_i` is the id of the viewpoint from `0..99`.
In addition to the standard fields in the IDR `cameras.npz`, we provide the field `cam_dict['roi_box_{pos_i}'] = left, right, top, bottom`, which represents the bounding box of the object in the image in pixels (precisely, the bounding box of the projection of the SL scan).
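In principle, such a box can be recomputed by projecting the SL scan vertices with the pinhole camera and taking the bounds; a simplified sketch with hypothetical intrinsics, an identity pose, and no distortion:

```python
def project_bbox(points, K, R, t):
    """Project 3D world points with a pinhole camera x = K (R p + t);
    return (left, right, top, bottom) of the projections in pixels."""
    us, vs = [], []
    for p in points:
        # Transform to camera space.
        c = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
        # Perspective projection with focal lengths and principal point from K.
        us.append(K[0][0] * c[0] / c[2] + K[0][2])
        vs.append(K[1][1] * c[1] / c[2] + K[1][2])
    return min(us), max(us), min(vs), max(vs)

# Hypothetical intrinsics and an identity world-to-camera pose.
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
box = project_bbox([(-0.125, -0.125, 1.0), (0.125, 0.125, 1.0)], K, R, t)
print(box)  # (257.5, 382.5, 177.5, 302.5)
```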
Also note that the MVSNet view selection scores in `pair.txt`, and consequently the neighboring source views for multi-view stereo matching, can be calculated in various ways.
We calculated the scores following the algorithm described in paragraph
"View Selection" in the MVSNet paper,
with the original parameters and using the vertices of the SL scan instead of the SfM reconstruction.
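For reference, our reading of that rule: the score of a view pair sums, over the points visible in both views, a piecewise Gaussian of the baseline angle θ in degrees, with θ0 = 5, σ1 = 1 for θ ≤ θ0, and σ2 = 10 otherwise. A sketch of this computation (not the code used to produce `pair.txt`):

```python
import math

def baseline_angle_deg(p, c1, c2):
    """Angle at point p between the rays to camera centers c1 and c2, in degrees."""
    v1 = [c - q for c, q in zip(c1, p)]
    v2 = [c - q for c, q in zip(c2, p)]
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(a * a for a in v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (n1 * n2)))))

def view_score(points, c1, c2, theta0=5.0, sigma1=1.0, sigma2=10.0):
    """Sum of piecewise-Gaussian weights of the baseline angle over the common points."""
    score = 0.0
    for p in points:
        theta = baseline_angle_deg(p, c1, c2)
        sigma = sigma1 if theta <= theta0 else sigma2
        score += math.exp(-((theta - theta0) ** 2) / (2 * sigma * sigma))
    return score
```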
In addition to the SL scans as triangle meshes, we provide the depth maps rendered from the scans at the image spaces of the undistorted RGB images from both TIS cameras, and the undistorted depth maps from each camera.
The paths to the files are
Data | Path
------------------------- |-----
Rendered w/o antialiasing | addons/{scene}/proj_depth/stl.clean_rec@{sensor}.undist/{pos_i:04}.png
Rendered with antialiasing | addons/{scene}/proj_depth/stl.clean_rec.aa@{sensor}.undist/{pos_i:04}.png
where
- `scene` is the name of the scene from `scenes.py:1`,
- `sensor` is the name of the sensor from `tis_right`, `tis_left`, `kinect_v2_ir`, `real_sense_ir`, `phone_right_ir`, and `phone_left_ir`,
- and `pos_i` is the id of the viewpoint from `0..99`.
For each depth map we provide a version rendered without antialiasing in `stl.clean_rec@...`, and a version rendered with antialiasing in `stl.clean_rec.aa@...`, which may better correspond to the low-quality sensor depth maps.
The images are stored as PNG files with each RGBA pixel representing a little-endian 32-bit floating point value.
Please refer to `imgio.py:75` for more details on the format, and to `load_sl_depth.ipynb` for a code example of loading the depth maps.
You can also find code examples for rendering an SL scan to an arbitrary image space, and for reprojecting a low-quality sensor depth map to an arbitrary image space, in `render_sl_depth.ipynb` and `reproject_depth.ipynb`, respectively.
## ScenePaths

For convenience, we provide a Python helper class for retrieving the paths to the dataset files, defined in `scene_paths.py:1`.
Please refer to `load_imgs.ipynb` or `load_sl_depth.ipynb` for examples of its usage, and to the column "ScenePaths code" in the "Skoltech3D data" spreadsheet for the correspondence between the pieces of the dataset and the methods of this class.
## Download

The dataset is split into ZIP archives of up to 25 GB each, based on the sensor, data modality, scene, etc. Please refer to the column "Chunk file" in the "Skoltech3D data" spreadsheet (or the corresponding column for addons) for the correspondence between the pieces of the dataset and the names of the ZIP files.
To download specific parts of the dataset, you can generate the list of download links using `make_links.py` with a config file, and then download the files, for example using `wget`:
```shell
export PYTHONPATH="../sk3d_data/src:$PYTHONPATH"  # where ../sk3d_data is the path to the root of the repository

# Prints the list of links to stdout
src/sk3d/data/make_links.py --conf conf.tsv

# Downloads the files using wget
src/sk3d/data/make_links.py --conf conf.tsv | xargs -n 1 sh -c 'wget --no-check-certificate $0'
```
See an example of the config file and the description of its format in `conf.tsv`.
You can select a server to download the data from with the `--server` parameter of `make_links.py`; see `make_links.py -h` for the available options.
We recommend using the default server for the best download speed.
Alternatively, you can download the files manually using a slower server from the skoltech3d.appliedai.tech/data page.
## Evaluation

To evaluate a 3D reconstruction method on Skoltech3D, please follow the instructions on the evaluation page.
## Changelog

- Added the script for generation of download links.
- Initial release of the documentation.