Bin Wang, Wentong Li, Wenqian Wang, Mingliang Gao, Runmin Cong, Wei Zhang
- PyTorch >= 1.9
- RandAugment
- pprint
- dotmap
- yaml
- einops
(Recommended) To train all of our models, we extract videos into frames for fast reading. Please refer to the mmaction2 repo for a detailed guide on data processing.
The annotation file is a plain text file with one line per video; each line gives the directory of the video's extracted frames, the total number of frames, and the video's label, separated by whitespace. Here is the format:
<video_1> <total frames> <label_1>
<video_2> <total frames> <label_2>
...
<video_N> <total frames> <label_N>
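A minimal sketch of generating such an annotation file from extracted frames (the directory layout, image suffix, and label mapping below are assumptions for illustration, not part of this repo):

```python
import os

# Assumed layout: <frame_root>/<video_name>/*.jpg, plus a video-name -> label mapping.
frame_root = "frames"
video_labels = {"video_1": 0, "video_2": 27}  # placeholder labels

with open("train_frames.txt", "w") as f:
    for video_name, label in video_labels.items():
        video_dir = os.path.join(frame_root, video_name)
        total_frames = sum(x.endswith(".jpg") for x in os.listdir(video_dir))
        # <frame directory> <total frames> <label>, separated by whitespace
        f.write(f"{video_dir} {total_frames} {label}\n")
```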
(Optional) Videos can also be decoded on the fly using decord. This approach is recommended if your dataset is not stored on a high-speed drive. Example annotation:
<video_1> <label_1>
<video_2> <label_2>
...
<video_N> <label_N>
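A rough sketch of on-the-fly decoding with decord (illustrative only; the uniform sampling below is an assumption, not necessarily the sampling scheme used in this repo):

```python
import numpy as np
from decord import VideoReader, cpu

def sample_frames(video_path, num_frames=8):
    """Uniformly sample num_frames RGB frames from a video via decord."""
    vr = VideoReader(video_path, ctx=cpu(0))
    indices = np.linspace(0, len(vr) - 1, num=num_frames).astype(int)
    return vr.get_batch(indices).asnumpy()  # (num_frames, H, W, 3), uint8
```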
Here we provide some off-the-shelf pre-trained checkpoints of our models in the following tables. More checkpoints will be provided soon.
Backbone | #Frame x crops x clips | Top-1 Acc. (%) | Download |
---|---|---|---|
ViT-B/16 | 8x3x4 | 83.9 | Log / Checkpoint |

Backbone | #Frame x crops x clips | Top-1 Acc. (%) | Download |
---|---|---|---|
ViT-B/16 | 8x3x2 | 60.1 | Log / Checkpoint |

Backbone | #Frame x crops x clips | Top-1 Acc. (%) | Download |
---|---|---|---|
ViT-B/16 | 8x3x2 | 71.8 | Log / Checkpoint |
ViT-L/14 | 8x3x2 | 73.4 | Log / Checkpoint |
- After Data Preparation, you will need to download the CLIP weights from OpenAI and place them in the `clip_pretrain` folder.
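One convenient way to fetch the weights is via the official openai/CLIP package, which can cache checkpoints to a chosen directory (a sketch, assuming the clip package is installed):

```python
import clip

# Cache the CLIP checkpoints used by our configs into clip_pretrain/.
# download_root controls where the .pt files are stored.
for name in ["ViT-B/16", "ViT-L/14"]:
    clip.load(name, device="cpu", download_root="clip_pretrain")
```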
# For example, fine-tune on Something-Something V1 with the following command:
sh scripts/run_train_vision.sh configs/sthv1/sthv1_train_rgb_vitb-16-side4video.yaml
- Run the following command to test the model.
sh scripts/run_test_vision.sh configs/sthv1/sthv1_train_rgb_vitb-16-side4video.yaml exp_onehot/ssv1/model_best.pt --test_crops 3 --test_clips 2
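With --test_crops 3 --test_clips 2, each video is evaluated over 3 spatial crops × 2 temporal clips; the per-view scores are then typically averaged. A minimal sketch of that aggregation (assuming per-view logits of shape [views, num_classes]; not the repo's exact evaluation code):

```python
import torch

def aggregate_views(view_logits: torch.Tensor) -> int:
    """Average softmax scores over all crops x clips views and pick the top class."""
    scores = view_logits.softmax(dim=-1).mean(dim=0)  # (num_classes,)
    return scores.argmax().item()
```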
@article{wangtds-clip,
  title   = {TDS-CLIP: Temporal Difference Side Network for Efficient Video Action Recognition},
  author  = {Bin Wang and Wentong Li and Wenqian Wang and Mingliang Gao and Runmin Cong and Wei Zhang},
  journal = {arXiv preprint arXiv:2408.10688},
  year    = {2024},
  url     = {http://arxiv.org/abs/2408.10688v2}
}
This repository benefits from the codebase of Side4Video.