This repository describes the structure and format of the Skoltech3D dataset and provides utilities for working with the data.
Multi-sensor large-scale dataset for multi-view 3D reconstruction
Oleg Voynov, Gleb Bobrovskikh, Pavel Karpyshev, Saveliy Galochkin, Andrei-Timotei Ardelean, Arseniy Bozhenko, Ekaterina Karmanova, Pavel Kopanev, Yaroslav Labutin-Rymsho, Ruslan Rakhimov, Aleksandr Safin, Valerii Serpiva, Alexey Artemov, Evgeny Burnaev, Dzmitry Tsetserukou, Denis Zorin
CVPR 2023
Project page | Paper (6 MB) | Supplementary text (2 MB) | Supplementary results (101 MB) | arXiv | Publisher version
- Overview
- Structured-light data
- Image data
- Calibration
- Addons
- ScenePaths
- Download
- Evaluation
- Changelog
## Overview

The dataset consists of 107 scenes scanned with
- RangeVision Spectrum structured-light scanner,
- 2x The Imaging Source DFK 33UX250 industrial RGB cameras,
- 2x Huawei Mate 30 Pro phones with time-of-flight depth sensors,
- Intel RealSense D435 stereo RGB-D camera,
- and Microsoft Kinect v2 ToF RGB-D camera.
Previews of the scenes are available in the supplementary results for the paper, and the list of scene names can be found in `scenes.py:1`.
The data for each scene consists of
- a highly accurate reference structured-light scan in the form of mesh and depth maps,
- RGB and infra-red images and lower-quality depth maps from the cameras, captured from 100 points of view, under 14 lighting conditions,
- intrinsic camera parameters and camera positions,
- and additional data, such as occluding surface, as described further.
A detailed overview of the provided data is available in the "Skoltech3D data" spreadsheet and is explained below.
## Structured-light data

The structured-light (SL) data for each scene consists of
- the reference merged and cleaned full structured-light scan,
- 27 raw partial scans,
- 5 validation partial scans, if the temporary coating was used for the scene,
- and the occluding surface, described in paragraph "Rendering SL scans. Occluding surface" in the paper.
The paths to the files are
Data | Path
------------------|-----
Full scan | dataset/{scene}/stl/reconstruction/cleaned.ply
Partial scans | dataset/{scene}/stl/partial/aligned/{scan_i:04}.ply
Validation scans | dataset/{scene}/stl/validation/aligned/{val_scan_i:04}.ply
Occluding surface | dataset/{scene}/stl/occluded_space.ply
where
- `scene` is the name of the scene from `scenes.py:1`,
- `scan_i` is the id of the partial scan from `0..26`,
- and `val_scan_i` is the id of the validation partial scan from `0..4`.
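The placeholders in these path templates follow Python `str.format` syntax, so the paths can be expanded directly; a minimal sketch (`my_scene` is a hypothetical placeholder, real names are listed in `scenes.py`):

```python
# Expand the path templates with str.format.
# "my_scene" is a hypothetical placeholder scene name.
full_scan = "dataset/{scene}/stl/reconstruction/cleaned.ply".format(scene="my_scene")
partial = "dataset/{scene}/stl/partial/aligned/{scan_i:04}.ply".format(scene="my_scene", scan_i=3)

print(full_scan)  # dataset/my_scene/stl/reconstruction/cleaned.ply
print(partial)    # dataset/my_scene/stl/partial/aligned/0003.ply
```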
The structured-light data is stored as triangle meshes in binary PLY format, which can be opened in MeshLab, or loaded using Open3D.
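For instance, a scan can be loaded with `open3d.io.read_triangle_mesh(path)`. As a dependency-free illustration, the PLY header (format and element counts) can also be inspected with the standard library alone; a minimal sketch run on a synthetic header:

```python
import io

def read_ply_header(fp):
    """Parse a PLY header; return (format string, {element name: count})."""
    assert fp.readline().strip() == b"ply"
    fmt, counts = None, {}
    for raw in fp:
        parts = raw.strip().split()
        if parts[0] == b"format":
            fmt = parts[1].decode()
        elif parts[0] == b"element":
            counts[parts[1].decode()] = int(parts[2])
        elif parts[0] == b"end_header":
            break
    return fmt, counts

# Tiny synthetic header, for illustration only.
header = (b"ply\nformat binary_little_endian 1.0\n"
          b"element vertex 8\nproperty float x\n"
          b"element face 12\nproperty list uchar int vertex_indices\n"
          b"end_header\n")
fmt, counts = read_ply_header(io.BytesIO(header))
print(fmt, counts)  # binary_little_endian {'vertex': 8, 'face': 12}
```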
## Image data

The image data for each scene consists of the images captured from 100 viewpoints with the right and left The Imaging Source (TIS) cameras, the right and left phone cameras, and the RealSense and Kinect cameras.
The paths to the files are
Data | Path
---------------------------------------- |-----
Undistorted images, lighting-dependent | dataset/{scene}/{cam}/{mode}/undistorted/{light}/{pos_i:04}.{ext}
Undistorted images, lighting-independent | dataset/{scene}/{cam}/{mode}/undistorted/{pos_i:04}.{ext}
Raw images, lighting-dependent | raw/{scene}/{cam}/{mode}/raw/{light}/{pos_i:04}.{ext}
Raw images, lighting-independent | raw/{scene}/{cam}/{mode}/raw/{pos_i:04}.png
where
- `scene` is the name of the scene from `scenes.py:1`,
- `cam` is the name of the camera from `tis_right`, `tis_left`, `kinect_v2`, `real_sense`, `phone_right`, and `phone_left`,
- `mode` is the name of the modality from `rgb`, `depth`, `ir`, and `ir_right`,
- `light` is the name of the lighting condition, as explained below,
- `pos_i` is the id of the viewpoint from `0..99`,
- and `ext` is `png`, except for the RGB images from `phone_right` and `phone_left`, for which it is `jpg`.
We captured RGB images with all the devices, depth images with the two phones, the Kinect, and the RealSense, and infra-red (IR) images with the two phones, the Kinect, and the left and right IR sensors of the RealSense.
We captured RGB images with all devices, and all modalities with the RealSense, under 12 lighting conditions, denoted in `params.py:1` as `{lighting_condition}@best`.
For two of the lighting conditions, denoted as `ambient_low@fast` and `flash@fast`, we additionally captured the images with camera settings corresponding to a high level of noise, as explained in paragraph "Sensor exposure/gain selection" in the paper.
Since controlling the camera settings of the Kinect was not possible, the two additional lighting conditions are not available for it, and its lighting conditions are labeled without the suffix `best` or `fast`, as in `params.py:6`.
The time-of-flight depth sensors of the Kinect and the phones are unaffected by the lighting conditions, so we captured the depth and IR images for these cameras once per viewpoint.
For each image, we provide an undistorted version and the raw unprocessed version.
The images are stored either as regular PNG with lossless compression, as lossy JPEG in the case of RGB images from the phones, or as PNG with each RGBA pixel representing a little-endian 32-bit floating point value.
Please refer to `imgio.py:5` for more details on the image formats, and to `load_imgs.ipynb` for a code example of loading the images.
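The authoritative encoding and decoding logic lives in `imgio.py`; the core byte reinterpretation for the float-in-RGBA PNGs reduces to the following stdlib sketch (assuming the pixel bytes have already been extracted in row-major order):

```python
import struct

def rgba_bytes_to_floats(raw):
    """Reinterpret raw RGBA pixel bytes (4 bytes per pixel) as little-endian float32 values."""
    assert len(raw) % 4 == 0
    return list(struct.unpack("<%df" % (len(raw) // 4), raw))

def floats_to_rgba_bytes(values):
    """Inverse operation: pack float32 values into 4-byte-per-pixel RGBA data."""
    return struct.pack("<%df" % len(values), *values)

# Round-trip a couple of (hypothetical) depth values in meters.
depths = [0.75, 1.5]
assert rgba_bytes_to_floats(floats_to_rgba_bytes(depths)) == depths
```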
## Calibration

The calibration data consists of
- camera poses in the global space of the SL data for each scene, after the refinement described in paragraph "Refinement of camera poses" in the paper,
- pinhole camera models, corresponding to the undistorted images,
- central generic camera models, corresponding to the raw images,
- and depth undistortion models, described in paragraph "Refinement of depth camera calibration" in the paper.
The paths to the files are
Data | Path
-------------------------- |-----
Camera poses | dataset/{scene}/{cam}/{mode}/images.txt
Pinhole camera models | dataset/calibration/{cam}/{mode}/cameras.txt
Central generic cam models | dataset/calibration/{cam}/{mode}/intrinsics.yaml
Depth undistortion models | dataset/calibration/{cam}/{mode}/undistortion.pt
where
- `scene` is the name of the scene from `scenes.py:1`,
- `cam` is the name of the camera from `tis_right`, `tis_left`, `kinect_v2`, `real_sense`, `phone_right`, and `phone_left`,
- and `mode` is the name of the modality from `rgb`, `ir`, and `ir_right`.
The camera parameters of the `ir` sensors correspond to both the IR and the depth images.
The camera poses and the pinhole camera models are stored in the COLMAP text format. The central generic camera models are stored in the format of Thomas Schöps' camera calibration toolbox.
## Addons

For the undistorted RGB images from the `tis_right` camera,
we provide the camera parameters in two additional formats:
- in MVSNet format,
- and in IDR format, commonly used in works on NeRF-based 3D reconstruction.
The paths to the files are
Data | Path
----------------------- |-----
MVSNet camera parameters | addons/{scene}/{cam}/{mode}/mvsnet_input/{pos_i:08}_cam.txt
MVSNet view selection scores | addons/{scene}/{cam}/{mode}/mvsnet_input/pair.txt
IDR camera parameters | addons/{scene}/{cam}/{mode}/idr_input/cameras.npz
where
- `scene` is the name of the scene from `scenes.py:1`,
- `cam` is the name of the camera, `tis_right`,
- `mode` is the name of the modality, `rgb`,
- and `pos_i` is the id of the viewpoint from `0..99`.
In addition to the standard fields in the IDR `cameras.npz`, we provide the field `cam_dict['roi_box_{pos_i}'] = left, right, top, bottom`, which represents the bounding box of the object in the image in pixels (precisely, the bounding box of the projection of the SL scan).
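In principle, such a box can be recomputed by projecting the SL scan vertices with the pinhole camera and taking the bounds; a simplified sketch with hypothetical intrinsics, an identity pose, and no distortion:

```python
def project_bbox(points, K, R, t):
    """Project 3D world points with a pinhole camera x = K (R p + t);
    return (left, right, top, bottom) of the projections in pixels."""
    us, vs = [], []
    for p in points:
        # Transform to camera space.
        c = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
        # Perspective projection with focal lengths and principal point from K.
        us.append(K[0][0] * c[0] / c[2] + K[0][2])
        vs.append(K[1][1] * c[1] / c[2] + K[1][2])
    return min(us), max(us), min(vs), max(vs)

# Hypothetical intrinsics and an identity world-to-camera pose.
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
box = project_bbox([(-0.125, -0.125, 1.0), (0.125, 0.125, 1.0)], K, R, t)
print(box)  # (257.5, 382.5, 177.5, 302.5)
```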
Also note that the MVSNet view selection scores in `pair.txt`, and consequently the neighboring source views for multi-view stereo matching, can be calculated in various ways.
We calculated the scores following the algorithm described in paragraph
"View Selection" in the MVSNet paper,
with the original parameters and using the vertices of the SL scan instead of the SfM reconstruction.
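For reference, our reading of that rule: the score of a view pair sums, over the points visible in both views, a piecewise Gaussian of the baseline angle θ in degrees, with θ0 = 5, σ1 = 1 for θ ≤ θ0, and σ2 = 10 otherwise. A sketch of this computation (not the code used to produce `pair.txt`):

```python
import math

def baseline_angle_deg(p, c1, c2):
    """Angle at point p between the rays to camera centers c1 and c2, in degrees."""
    v1 = [c - q for c, q in zip(c1, p)]
    v2 = [c - q for c, q in zip(c2, p)]
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(a * a for a in v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (n1 * n2)))))

def view_score(points, c1, c2, theta0=5.0, sigma1=1.0, sigma2=10.0):
    """Sum of piecewise-Gaussian weights of the baseline angle over the common points."""
    score = 0.0
    for p in points:
        theta = baseline_angle_deg(p, c1, c2)
        sigma = sigma1 if theta <= theta0 else sigma2
        score += math.exp(-((theta - theta0) ** 2) / (2 * sigma * sigma))
    return score
```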
In addition to the SL scans as triangle meshes, we provide the depth maps rendered from the scans at the image spaces of the undistorted RGB images from both TIS cameras, and the undistorted depth maps from each camera.
The paths to the files are
Data | Path
------------------------- |-----
Rendered w/o antialiasing | addons/{scene}/proj_depth/stl.clean_rec@{sensor}.undist/{pos_i:04}.png
Rendered with antialiasing | addons/{scene}/proj_depth/stl.clean_rec.aa@{sensor}.undist/{pos_i:04}.png
where
- `scene` is the name of the scene from `scenes.py:1`,
- `sensor` is the name of the sensor from `tis_right`, `tis_left`, `kinect_v2_ir`, `real_sense_ir`, `phone_right_ir`, and `phone_left_ir`,
- and `pos_i` is the id of the viewpoint from `0..99`.
For each depth map we provide a version rendered without antialiasing in `stl.clean_rec@...`, and a version rendered with antialiasing in `stl.clean_rec.aa@...`, which may better correspond to the low-quality sensor depth maps.
The images are stored as PNG files with each RGBA pixel representing a little-endian 32-bit floating point value.
Please refer to `imgio.py:75` for more details on the format, and to `load_sl_depth.ipynb` for a code example of loading the depth maps.
You can also find code examples for rendering an SL scan to an arbitrary image space, and for reprojecting a low-quality sensor depth map to an arbitrary image space, in `render_sl_depth.ipynb` and `reproject_depth.ipynb`, respectively.
## ScenePaths

For convenience, we provide a Python helper class for retrieving the paths to the dataset files, defined in `scene_paths.py:1`.
Please refer to `load_imgs.ipynb` or `load_sl_depth.ipynb` for examples of its usage, and to the column "ScenePaths code" in the "Skoltech3D data" spreadsheet for the correspondence between the pieces of the dataset and the methods of this class.
## Download

The dataset is split into ZIP archives of up to 25 GB each, based on the sensor, data modality, scene, etc. Please refer to the column "Chunk file" in the "Skoltech3D data" spreadsheet (or the corresponding column for addons) for the correspondence between the pieces of the dataset and the names of the ZIP files.
To download specific parts of the dataset, you can generate the list of download links using `make_links.py` with a config file, and then download the files, for example using `wget`:
```shell
export PYTHONPATH="../sk3d_data/src:$PYTHONPATH"  # where ../sk3d_data is the path to the root of the repository

# Prints the list of links to stdout
src/sk3d/data/make_links.py --conf conf.tsv

# Downloads the files using wget
src/sk3d/data/make_links.py --conf conf.tsv | xargs -n 1 sh -c 'wget --no-check-certificate $0'
```
See an example of the config file and the description of its format in `conf.tsv`.
You can select a server to download the data from with the `--server` parameter of `make_links.py`; see `make_links.py -h` for the available options.
We recommend using the default server for the best download speed.
Alternatively, you can download the files manually using a slower server from the skoltech3d.appliedai.tech/data page.
## Evaluation

To evaluate a 3D reconstruction method on Skoltech3D, please follow the instructions on the evaluation page.
## Changelog

- Added the script for generation of download links.
- Initial release of the documentation.