Bin Wang, Wentong Li, Wenqian Wang, Mingliang Gao, Runmin Cong, Wei Zhang
- PyTorch >= 1.9
- RandAugment
- pprint
- dotmap
- yaml
- einops
(Recommended) To train all of our models, we extract videos into frames for fast reading. Please refer to the mmaction2 repo for a detailed guide on data processing.
The annotation file is a plain text file with one line per video; each line gives the directory of the video's extracted frames, the total number of frames, and the video's label, separated by whitespace. Here is the format:
<video_1> <total frames> <label_1>
<video_2> <total frames> <label_2>
...
<video_N> <total frames> <label_N>
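A minimal sketch of generating such an annotation file from extracted frames (the directory layout, image suffix, and label mapping below are assumptions for illustration, not part of this repo):

```python
import os

# Assumed layout: <frame_root>/<video_name>/*.jpg, plus a video-name -> label mapping.
frame_root = "frames"
video_labels = {"video_1": 0, "video_2": 27}  # placeholder labels

with open("train_frames.txt", "w") as f:
    for video_name, label in video_labels.items():
        video_dir = os.path.join(frame_root, video_name)
        total_frames = sum(x.endswith(".jpg") for x in os.listdir(video_dir))
        # <frame directory> <total frames> <label>, separated by whitespace
        f.write(f"{video_dir} {total_frames} {label}\n")
```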
(Optional) Videos can also be decoded on the fly using decord. This approach is recommended if your dataset is not stored on a high-speed drive. Example annotation:
<video_1> <label_1>
<video_2> <label_2>
...
<video_N> <label_N>
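A rough sketch of on-the-fly decoding with decord (illustrative only; the uniform sampling below is an assumption, not necessarily the sampling scheme used in this repo):

```python
import numpy as np
from decord import VideoReader, cpu

def sample_frames(video_path, num_frames=8):
    """Uniformly sample num_frames RGB frames from a video via decord."""
    vr = VideoReader(video_path, ctx=cpu(0))
    indices = np.linspace(0, len(vr) - 1, num=num_frames).astype(int)
    return vr.get_batch(indices).asnumpy()  # (num_frames, H, W, 3), uint8
```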
Here we provide some off-the-shelf pre-trained checkpoints of our models in the following tables. More checkpoints will be provided soon.
Backbone | #Frame x crops x clips | Top-1 Acc. (%) | Download |
---|---|---|---|
ViT-B/16 | 8x3x4 | 83.9 | Log / Checkpoint |

Backbone | #Frame x crops x clips | Top-1 Acc. (%) | Download |
---|---|---|---|
ViT-B/16 | 8x3x2 | 60.1 | Log / Checkpoint |

Backbone | #Frame x crops x clips | Top-1 Acc. (%) | Download |
---|---|---|---|
ViT-B/16 | 8x3x2 | 71.8 | Log / Checkpoint |
ViT-L/14 | 8x3x2 | 73.4 | Log / Checkpoint |
- After Data Preparation, you will need to download the CLIP weights from OpenAI and place them in the `clip_pretrain` folder.
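One convenient way to fetch the weights is via the official openai/CLIP package, which can cache checkpoints to a chosen directory (a sketch, assuming the clip package is installed):

```python
import clip

# Cache the CLIP checkpoints used by our configs into clip_pretrain/.
# download_root controls where the .pt files are stored.
for name in ["ViT-B/16", "ViT-L/14"]:
    clip.load(name, device="cpu", download_root="clip_pretrain")
```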
# For example, fine-tune on Something-Something V1 with the following command:
sh scripts/run_train_vision.sh configs/sthv1/sthv1_train_rgb_vitb-16-side4video.yaml
- Run the following command to test the model.
sh scripts/run_test_vision.sh configs/sthv1/sthv1_train_rgb_vitb-16-side4video.yaml exp_onehot/ssv1/model_best.pt --test_crops 3 --test_clips 2
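With --test_crops 3 --test_clips 2, each video is evaluated over 3 spatial crops × 2 temporal clips; the per-view scores are then typically averaged. A minimal sketch of that aggregation (assuming per-view logits of shape [views, num_classes]; not the repo's exact evaluation code):

```python
import torch

def aggregate_views(view_logits: torch.Tensor) -> int:
    """Average softmax scores over all crops x clips views and pick the top class."""
    scores = view_logits.softmax(dim=-1).mean(dim=0)  # (num_classes,)
    return scores.argmax().item()
```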
@article{wangtds-clip,
  title   = {TDS-CLIP: Temporal Difference Side Network for Efficient Video Action Recognition},
  author  = {Bin Wang and Wentong Li and Wenqian Wang and Mingliang Gao and Runmin Cong and Wei Zhang},
  journal = {arXiv preprint arXiv:2408.10688},
  year    = {2024},
  url     = {http://arxiv.org/abs/2408.10688v2}
}
This repository benefits from the codebase of Side4Video.