Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation
Ke Fan*, Jingshi Lei*, Xuelin Qian†, Miaopeng Yu, Tianjun Xiao†, Tong He, Zheng Zhang, Yanwei Fu
EoRaS (Efficient object-centric Representation amodal Segmentation) is a framework for supervised video amodal segmentation. Each video is first encoded by a convolutional neural network to obtain front-view features. A translation module then projects the front-view features into the Bird's-Eye View (BEV), introducing 3D information that improves feature quality. The front-view and BEV features are integrated across frames by a temporal module built on multi-view fusion layers; the module is equipped with a set of object slots that interact with the features from both views via an attention mechanism. Finally, the integrated front-view features are decoded into visible and amodal masks.
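For readers who prefer code, below is a minimal PyTorch-style sketch of the pipeline described above. All module names, attention layouts, and tensor shapes (e.g. `EoRaSSketch`, `to_bev`, `slot_attn`) are illustrative assumptions for exposition, not the actual API or architecture details of this repository.

```python
# Minimal sketch of the EoRaS pipeline described above.
# Module names and shapes are illustrative assumptions, not the repo's API.
import torch
import torch.nn as nn


class EoRaSSketch(nn.Module):
    def __init__(self, feat_dim=256, num_slots=8, num_heads=8):
        super().__init__()
        # 1) CNN backbone producing per-frame front-view features.
        self.encoder = nn.Conv2d(3, feat_dim, kernel_size=7, stride=4, padding=3)
        # 2) Translation module projecting front-view features into BEV
        #    (a 1x1 conv stands in for the learned view projection).
        self.to_bev = nn.Conv2d(feat_dim, feat_dim, kernel_size=1)
        # 3) Object slots that attend to front-view and BEV features across
        #    frames (multi-view fusion + temporal module).
        self.slots = nn.Parameter(torch.randn(num_slots, feat_dim))
        self.slot_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.feat_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        # 4) Decoder heads for the visible and amodal masks.
        self.visible_head = nn.Conv2d(feat_dim, 1, kernel_size=1)
        self.amodal_head = nn.Conv2d(feat_dim, 1, kernel_size=1)

    def forward(self, video):                      # video: (B, T, 3, H, W)
        B, T, _, _, _ = video.shape
        front = self.encoder(video.flatten(0, 1))  # front-view features
        bev = self.to_bev(front)                   # BEV features (3D cue)
        h, w = front.shape[-2:]

        def tokens(x):
            # (B*T, C, h, w) -> (B, T*h*w, C): one token sequence per video.
            return x.view(B, T, -1, h * w).permute(0, 1, 3, 2).reshape(B, T * h * w, -1)

        multi_view = torch.cat([tokens(front), tokens(bev)], dim=1)
        slots = self.slots.unsqueeze(0).expand(B, -1, -1)
        # Slots gather object-centric evidence from both views and all frames...
        slots, _ = self.slot_attn(slots, multi_view, multi_view)
        # ...then the front-view tokens are refined by attending to the slots.
        front_tok, _ = self.feat_attn(tokens(front), slots, slots)

        fused = front_tok.view(B, T, h * w, -1).permute(0, 1, 3, 2).reshape(B * T, -1, h, w)
        visible = self.visible_head(fused).view(B, T, 1, h, w)  # visible-mask logits
        amodal = self.amodal_head(fused).view(B, T, 1, h, w)    # amodal-mask logits
        return visible, amodal
```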
We release the code, datasets, and checkpoints of EoRaS here.
If you find our paper useful for your research and applications, please cite using this BibTeX:
@InProceedings{Fan_2023_ICCV,
    author    = {Fan, Ke and Lei, Jingshi and Qian, Xuelin and Yu, Miaopeng and Xiao, Tianjun and He, Tong and Zhang, Zheng and Fu, Yanwei},
    title     = {Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {1272-1281}
}