Semantic Scene Completion (SSC) is a pivotal component of autonomous driving perception systems, tasked with inferring the 3D semantic occupancy of a scene from sensory data. To improve accuracy, prior research has relied on computationally demanding and memory-intensive 3D operations, which impose significant computational requirements on the platform during training and testing. This paper proposes L2COcc, a lightweight camera-centric SSC framework that also accommodates LiDAR inputs. With our proposed efficient voxel transformer (EVT) and cross-modal knowledge modules, including feature similarity distillation (FSD), TPV distillation (TPVD), and prediction alignment distillation (PAD), our method substantially reduces the computational burden while maintaining high accuracy. Experimental evaluations demonstrate that our method surpasses the current state-of-the-art vision-based SSC methods in accuracy on both the SemanticKITTI and SSCBench-KITTI-360 benchmarks. Additionally, our method is more lightweight, reducing both memory consumption and inference time by over 23% compared to the current state-of-the-art method.
More video demonstrations can be found at the project page.
The overall framework of our proposed L2COcc comprises three stages: voxel feature generation (indicated by the gray background), TPV-based occupancy prediction network (indicated by the blue background), and cross-modal knowledge distillation (indicated by the green background).
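To make the idea behind the distillation stage more concrete, here is a minimal, generic sketch of response-based distillation, where a camera student is pushed toward the per-voxel class distribution of a LiDAR teacher. This is not the L2COcc implementation: the exact FSD, TPVD, and PAD losses are defined in the paper, and the function names, the KL-divergence form, and the `temperature` parameter below are assumptions made purely for illustration.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax over one voxel's class logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def kl_distillation_loss(
    student_logits: Sequence[Sequence[float]],
    teacher_logits: Sequence[Sequence[float]],
    temperature: float = 1.0,
) -> float:
    """Mean KL(teacher || student) over voxels (hypothetical helper, illustration only)."""
    if not student_logits or len(student_logits) != len(teacher_logits):
        raise ValueError("student and teacher need the same non-zero number of voxels")

    total = 0.0
    for s, t in zip(student_logits, teacher_logits):
        log_p_student = log_softmax(s, temperature)
        log_p_teacher = log_softmax(t, temperature)
        # KL divergence: sum over classes of p_teacher * (log p_teacher - log p_student).
        total += sum(
            math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_teacher, log_p_student)
        )
    return total / len(student_logits)


if __name__ == "__main__":
    # Two voxels, three classes each; values are made up for the example.
    teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
    student = [[1.0, 1.0, -0.5], [0.0, 0.5, 2.0]]
    print(kl_distillation_loss(student, teacher))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is the property any prediction-level distillation term relies on.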
step 1. Refer to install.md to install the environment.
step 2. Refer to dataset.md to prepare the SemanticKITTI and KITTI360 datasets.
step 3. Refer to train_and_eval.md for training and evaluation.
step 4. Refer to visualization.md for visualizations.
- SemanticKITTI

| Model | Sensor | Split | IoU | mIoU | Download |
| --- | --- | --- | --- | --- | --- |
| L2COcc-C | Camera | val | 45.56 | 16.72 | ckpt/log |
| L2COcc-C | Camera | test | 44.31 | 17.03 | output |
| L2COcc-D | Camera | val | 45.30 | 18.22 | ckpt/log |
| L2COcc-D | Camera | test | 45.37 | 18.18 | output |
| L2COcc-L | LiDAR | val | 60.66 | 24.21 | ckpt/log |
| L2COcc-L | LiDAR | test | 60.32 | 23.37 | output |

- KITTI360

| Model | Sensor | Split | IoU | mIoU | Download |
| --- | --- | --- | --- | --- | --- |
| L2COcc-C | Camera | test | 48.07 | 20.11 | ckpt/log |
| L2COcc-D | Camera | test | 48.83 | 20.99 | ckpt/log |
| L2COcc-L | LiDAR | test | 57.60 | 25.22 | ckpt/log |
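For readers unfamiliar with the two metrics reported above, the sketch below shows how they are commonly computed in SSC benchmarks: IoU measures class-agnostic occupancy (occupied vs. empty), while mIoU averages the per-class IoU over the semantic classes. This is an illustrative, dependency-free version and not the evaluation code of this repository; the label conventions (`EMPTY = 0`, `IGNORE = 255`) and the function name `ssc_metrics` are assumptions, and classes that never occur are skipped here for simplicity.

```python
from typing import Dict, Sequence

EMPTY = 0     # assumed label for free space
IGNORE = 255  # assumed label for voxels excluded from evaluation


def ssc_metrics(pred: Sequence[int], gt: Sequence[int], num_classes: int) -> Dict[str, float]:
    """Return completion IoU and semantic mIoU (in percent) for flat voxel label lists."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of voxels")

    # Per-class true positives, false positives, and false negatives.
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    occ_tp = occ_fp = occ_fn = 0

    for p, g in zip(pred, gt):
        if g == IGNORE:
            continue
        # Scene completion: binary occupied vs. empty.
        p_occ, g_occ = p != EMPTY, g != EMPTY
        if p_occ and g_occ:
            occ_tp += 1
        elif p_occ:
            occ_fp += 1
        elif g_occ:
            occ_fn += 1
        # Semantic statistics per class.
        if p == g:
            tp[p] += 1
        else:
            fp[p] += 1
            fn[g] += 1

    occ_union = occ_tp + occ_fp + occ_fn
    iou = 100.0 * occ_tp / occ_union if occ_union else 0.0

    # Average over semantic classes only (the empty class is excluded).
    per_class = []
    for c in range(1, num_classes):
        union = tp[c] + fp[c] + fn[c]
        if union:
            per_class.append(tp[c] / union)
    miou = 100.0 * sum(per_class) / len(per_class) if per_class else 0.0

    return {"IoU": iou, "mIoU": miou}


if __name__ == "__main__":
    # Toy example with two semantic classes plus empty space.
    prediction = [0, 1, 1, 2, 0, 2]
    ground_truth = [0, 1, 2, 2, 1, 255]
    print(ssc_metrics(prediction, ground_truth, num_classes=3))
```

In practice the statistics are accumulated over all frames of a split before the ratios are taken, rather than averaged per frame.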
Many thanks to these exceptional open source projects:
- CGFormer
- LiCROcc
- PointOcc
- BEVFormer
- mmdet3d
- MonoScene
- semantic-kitti-api
- MobileStereoNet
- Symphonize
- DFA3D
- VoxFormer
If you find our work beneficial for your research, please consider citing our paper and giving us a star:
@misc{wang2025l2cocclightweightcameracentricsemantic,
title={L2COcc: Lightweight Camera-Centric Semantic Scene Completion via Distillation of LiDAR Model},
author={Ruoyu Wang and Yukai Ma and Yi Yao and Sheng Tao and Haoang Li and Zongzhi Zhu and Yong Liu and Xingxing Zuo},
year={2025},
eprint={2503.12369},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.12369},
}
If you encounter any issues, please contact samuraiwry@gmail.com.