Thanks to OpenPCDet; we implement our method based on this toolbox.
3D object detection from the LiDAR point cloud is fundamental to autonomous driving. Large-scale outdoor scenes usually feature significant variance in instance scales, thus requiring features rich in both long-range and fine-grained information to support accurate detection. Recent detectors leverage the power of window-based transformers to model long-range dependencies but tend to blur out fine-grained details. To bridge this gap, we present a novel Mixed-scale Sparse Voxel Transformer, named MsSVT, which captures both types of information simultaneously following a divide-and-conquer philosophy. Specifically, MsSVT explicitly divides attention heads into multiple groups, each in charge of attending to information within a particular range. The outputs of all groups are merged to obtain the final mixed-scale features. Moreover, we introduce a novel chessboard sampling strategy to reduce the computational complexity of applying a window-based transformer in 3D voxel space. To further improve efficiency, we implement the voxel sampling and gathering operations sparsely with a hash map.
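To make the head-grouping idea concrete, here is a minimal, self-contained PyTorch sketch of mixed-scale attention: heads are split into groups, each group attends to keys gathered from a window of a different size, and the group outputs are concatenated into a mixed-scale feature. This is not the actual MsSVT implementation; the class name, `head_groups` argument, shared key/value projection, and the dense per-query key tensors are illustrative assumptions (the real model operates on sparse voxels with chessboard sampling and hash-based gathering).

```python
# Illustrative sketch only (not the official MsSVT code).
import torch
import torch.nn as nn


class MixedScaleHeadAttention(nn.Module):
    def __init__(self, dim=128, num_heads=8, head_groups=(2, 2, 2, 2)):
        super().__init__()
        assert sum(head_groups) == num_heads
        self.num_heads = num_heads
        self.head_groups = head_groups          # number of heads per scale group
        self.head_dim = dim // num_heads
        self.q_proj = nn.Linear(dim, dim)
        self.kv_proj = nn.Linear(dim, 2 * dim)  # shared projection, an assumption
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, query_feats, key_feats_per_scale):
        # query_feats: (N, dim) features of sampled query voxels
        # key_feats_per_scale: one (N, K_s, dim) tensor per scale, i.e. keys
        #   gathered from windows of increasing size around each query voxel
        N, dim = query_feats.shape
        q = self.q_proj(query_feats).view(N, self.num_heads, self.head_dim)

        outputs, head_start = [], 0
        for group_heads, key_feats in zip(self.head_groups, key_feats_per_scale):
            kv = self.kv_proj(key_feats)                     # (N, K, 2*dim)
            k, v = kv.chunk(2, dim=-1)
            k = k.view(N, -1, self.num_heads, self.head_dim)
            v = v.view(N, -1, self.num_heads, self.head_dim)

            # Each scale group only uses its own slice of heads.
            hs, he = head_start, head_start + group_heads
            q_g = q[:, hs:he]                                # (N, g, d)
            k_g = k[:, :, hs:he]                             # (N, K, g, d)
            v_g = v[:, :, hs:he]

            attn = torch.einsum('ngd,nkgd->ngk', q_g, k_g) / self.head_dim ** 0.5
            attn = attn.softmax(dim=-1)
            out = torch.einsum('ngk,nkgd->ngd', attn, v_g)   # (N, g, d)
            outputs.append(out)
            head_start = he

        # Concatenate all head groups -> mixed-scale feature per query voxel.
        mixed = torch.cat(outputs, dim=1).reshape(N, dim)
        return self.out_proj(mixed)


if __name__ == '__main__':
    attn = MixedScaleHeadAttention()
    queries = torch.randn(32, 128)
    # Keys gathered from small / large windows (more voxels for larger windows).
    keys = [torch.randn(32, k, 128) for k in (16, 16, 48, 48)]
    print(attn(queries, keys).shape)   # torch.Size([32, 128])
```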
- Python 3.7+
- PyTorch 1.6+
- CUDA 10.1 or higher
- pcdet (the installation of MsSVT is the same as OpenPCDet)
- Prepare Waymo dataset
# Download Waymo and organize it into the following form:
├── data
│   ├── waymo
│   │   │── ImageSets
│   │   │── raw_data
│   │   │   │── segment-xxxxxxxx.tfrecord
│   │   │   │── ...
│   │   │── waymo_processed_data
│   │   │   │── segment-xxxxxxxx/
│   │   │   │── ...
│   │   │── pcdet_gt_database_train_sampled_xx/
│   │   │── pcdet_waymo_dbinfos_train_sampled_xx.pkl
# Install TensorFlow 2.1.0
# Install the official waymo-open-dataset package by running the following commands:
pip3 install --upgrade pip
pip3 install waymo-open-dataset-tf-2-1-0 --user
# Obtain point cloud data and generate data infos:
python -m pcdet.datasets.waymo.waymo_dataset --func create_waymo_infos --cfg_file tools/cfgs/dataset_configs/waymo_dataset.yaml
- Train with a single GPU
python train.py --cfg_file ${CONFIG_FILE}
# For MsSVT:
python train.py --cfg_file cfgs/waymo_models/mssvt.yaml --extra_tag ${TAG} --workers 8
- Train with multiple GPUs or multiple machines
CUDA_VISIBLE_DEVICES=${GPUS} bash scripts/dist_train.sh ${NUM_GPUS} ${PORT} --cfg_file ${CONFIG_FILE}
# or
CUDA_VISIBLE_DEVICES=${GPUS} bash scripts/slurm_train.sh ${PARTITION} ${JOB_NAME} ${NUM_GPUS} --cfg_file ${CONFIG_FILE}
# For MsSVT:
CUDA_VISIBLE_DEVICES=${GPUS} bash ./scripts/dist_train.sh ${NUM_GPUS} ${PORT} --cfg_file cfgs/waymo_models/mssvt.yaml --extra_tag ${TAG} --sync_bn --workers 8
- Test with a pretrained model:
CUDA_VISIBLE_DEVICES=${GPUS} bash ./scripts/dist_test.sh ${NUM_GPUS} ${PORT} --cfg_file ${CONFIG_FILE}
# For MsSVT:
CUDA_VISIBLE_DEVICES=${GPUS} bash ./scripts/dist_test.sh ${NUM_GPUS} ${PORT} --cfg_file cfgs/waymo_models/mssvt.yaml --extra_tag ${TAG} --ckpt ${CKPT_PATH} --workers 8
@inproceedings{dong2022mssvt,
title={MsSVT: Mixed-scale Sparse Voxel Transformer for 3D Object Detection on Point Clouds},
author={Dong, Shaocong and Ding, Lihe and Wang, Haiyang and Xu, Tingfa and Xu, Xinli and Wang, Jie and Bian, Ziyang and Wang, Ying and Li, Jianan},
booktitle={NeurIPS},
year={2022}
}