TDS-CLIP: Temporal Difference Side Network for Efficient Video Action Recognition


Bin Wang, Wentong Li, Wenqian Wang, Mingliang Gao, Runmin Cong, Wei Zhang

Requirements

  • PyTorch >= 1.9
  • RandAugment
  • pprint
  • dotmap
  • yaml
  • einops

Data Preparation

(Recommended) To train all of our models, we extract videos into frames for fast reading. Please refer to the mmaction2 repo for a detailed guide to data processing.
The annotation file is a text file with one line per video; each line gives the directory containing the video's frames, the total number of frames, and the label, separated by whitespace. Here is the format:

<video_1> <total frames> <label_1>
<video_2> <total frames> <label_2>
...
<video_N> <total frames> <label_N>
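As a minimal sketch (not from the repo), the frame-list annotation format above can be parsed like this; the function name and return shape are illustrative assumptions:

```python
def parse_frame_annotations(text):
    """Parse lines of '<frame_dir> <total_frames> <label>' into tuples."""
    records = []
    for line in text.strip().splitlines():
        # Each line has exactly three whitespace-separated fields.
        path, total_frames, label = line.split()
        records.append((path, int(total_frames), int(label)))
    return records

example = """videos/class_a/vid_001 300 0
videos/class_b/vid_002 250 1"""
print(parse_frame_annotations(example))
# → [('videos/class_a/vid_001', 300, 0), ('videos/class_b/vid_002', 250, 1)]
```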

(Optional) We can also decode the videos in an online fashion using decord. This method is recommended if your dataset is not stored on a high-speed drive. Example annotation:

<video_1> <label_1>
<video_2> <label_2>
...
<video_N> <label_N>
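Whether frames come from disk or from an online decoder such as decord, a fixed number of frame indices must be chosen per video. A common scheme (shown here as a hypothetical helper, not the repo's code) samples the center of each of `num_frames` equal temporal segments:

```python
def uniform_sample_indices(total_frames, num_frames=8):
    """Return num_frames indices spread evenly over [0, total_frames)."""
    step = total_frames / num_frames
    # Take the center of each of the num_frames equal segments,
    # clamped so the last index stays in range for short videos.
    return [min(int(step * i + step / 2), total_frames - 1)
            for i in range(num_frames)]

print(uniform_sample_indices(64))
# → [4, 12, 20, 28, 36, 44, 52, 60]
```

With decord, the returned indices would be passed to something like `VideoReader.get_batch` to fetch the sampled frames in one call.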

Performance

Model Zoo

Here we provide off-the-shelf pre-trained checkpoints of our models in the tables below. More checkpoints will be provided soon.

Kinetics-400

| Backbone | #Frame x crops x clips | Top-1 Acc. (%) | Download         |
| -------- | ---------------------- | -------------- | ---------------- |
| ViT-B/16 | 8x3x4                  | 83.9           | Log / Checkpoint |

Something-Something V1

| Backbone | #Frame x crops x clips | Top-1 Acc. (%) | Download         |
| -------- | ---------------------- | -------------- | ---------------- |
| ViT-B/16 | 8x3x2                  | 60.1           | Log / Checkpoint |

Something-Something V2

| Backbone | #Frame x crops x clips | Top-1 Acc. (%) | Download         |
| -------- | ---------------------- | -------------- | ---------------- |
| ViT-B/16 | 8x3x2                  | 71.8           | Log / Checkpoint |
| ViT-L/14 | 8x3x2                  | 73.4           | Log / Checkpoint |
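The "#Frame x crops x clips" column describes multi-view evaluation: 8x3x2, for example, means each video yields 3 spatial crops x 2 temporal clips = 6 views of 8 frames, whose predictions are fused. A pure-Python sketch of the usual fusion (averaging per-view logits before the arg-max; real code would batch tensors instead):

```python
def average_views(view_logits):
    """Average a list of per-view logit lists; return the predicted class."""
    num_views = len(view_logits)
    num_classes = len(view_logits[0])
    # Mean logit per class across all crop/clip views.
    mean = [sum(view[c] for view in view_logits) / num_views
            for c in range(num_classes)]
    return mean.index(max(mean))

views = [[0.2, 0.8], [0.6, 0.4], [0.1, 0.9]]  # 3 views, 2 classes
print(average_views(views))
# → 1  (mean logits: class 0 = 0.3, class 1 = 0.7)
```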

Train

  • After Data Preparation, download the CLIP weights from OpenAI and place them in the clip_pretrain folder.
# For example, fine-tune on Something-Something V1 with the following command:
sh scripts/run_train_vision.sh configs/sthv1/sthv1_train_rgb_vitb-16-side4video.yaml

Test

  • Run the following command to test the model.
sh scripts/run_test_vision.sh configs/sthv1/sthv1_train_rgb_vitb-16-side4video.yaml exp_onehot/ssv1/model_best.pt --test_crops 3 --test_clips 2

Citation

@article{wangtds-clip,
  title   = {TDS-CLIP: Temporal Difference Side Network for Efficient Video Action Recognition},
  author  = {Wang, Bin and Li, Wentong and Wang, Wenqian and Gao, Mingliang and Cong, Runmin and Zhang, Wei},
  journal = {arXiv preprint arXiv:2408.10688},
  year    = {2024},
  url     = {http://arxiv.org/abs/2408.10688v2}
}

Acknowledgment

We benefit from the codebase of Side4Video.
