damo-cv/RealisDance

RealisDance-DiT: Simple yet Strong Baseline towards Controllable Character Animation in the Wild

This repository is the official implementation of RealisDance-DiT. RealisDance-DiT is a structurally simple, empirically robust, and experimentally strong baseline model for controllable character animation in the wild.

News

  • 2025-05-19: Released the inference code and weights of RealisDance-DiT.
  • 2025-05-19: You may also be interested in our Uni3C.
  • 2024-10-15: Released RealisDance and the code of pose preparation for RealisDance.
  • 2024-09-10: Now you can try more interesting AI video editing in XunGuang.
  • 2024-09-09: You may also be interested in our human part repair method RealisHuman.

Gallery

Here are several character animations generated by RealisDance-DiT. Note that the GIFs shown here exhibit some visual quality degradation. Please visit our project page for the original videos.

TODO List

  • Inference code
  • Model checkpoints
  • RealisDance-Val dataset
  • TeaCache speedup
  • FSDP + Sequential parallel
  • Pose preparation code
  • SMPL retargeting
  • New checkpoints with shifted RoPE on frame

Note: This released project has two slight differences from the paper.

  1. We only use SMPL-CS and HaMer in this version in order to support Uni3C, because it is difficult to align and render DWPose in a new camera view.
  2. (Coming soon) In the Shifted RoPE part, we also shift the RoPE along the frame dimension, because we find that sharing the first-frame RoPE sometimes introduces slight artifacts in the first frame.
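To illustrate the idea behind the second point (this is a minimal sketch, not the project's actual implementation; the helper name, dimension, and shift value are all hypothetical), the reference image can be given a frame index shifted beyond the video frames instead of reusing frame 0, so its rotary phase no longer collides with the first video frame:

```python
import numpy as np

def rope_angles(positions, dim=8, base=10000.0):
    """Rotary-embedding angles for the given positions (hypothetical helper)."""
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(positions, inv_freq)  # shape: (len(positions), dim // 2)

num_frames = 16
shift = 4  # hypothetical offset along the frame dimension

video = rope_angles(np.arange(num_frames))

# Shared variant: the reference image reuses frame index 0,
# so its angles are identical to those of the first video frame.
shared_ref = rope_angles(np.array([0]))

# Shifted variant: the reference image gets an index past the video,
# giving it a rotary phase distinct from every video frame.
shifted_ref = rope_angles(np.array([num_frames + shift]))
```

Under this toy setup, `shared_ref` matches `video[0]` exactly, while `shifted_ref` does not, which is the collision the shifted variant avoids.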

Quick Start

1. Setup Repository and Environment

git clone https://github.com/theFoxofSky/RealisDance.git
cd RealisDance

conda create -n realisdance python=3.10
conda activate realisdance

pip install -r requirements.txt

# FA3 (Optional)
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout ea3ecea97a1393c092863330aff9a162bb5ce443  # very important; other FA3 versions will yield bad results
cd hopper
python setup.py install

2. Download Checkpoints

Please download the checkpoints from the huggingface repo to './pretrained_models', and make sure the structure of './pretrained_models' is consistent with the one in the huggingface repo.

./pretrained_models
|---image_encoder/
    |---...
|---scheduler/
    |---...
|---text_encoder/
    |---...
|---tokenizer/
    |---...
|---transformer/
    |---...
|---vae/
    |---...
|---model_index.json

3. Quick Inference

  • Inference with Demo sequences
python inference.py \
    --ref __assets__/demo/ref.png \
    --smpl __assets__/demo/smpl.mp4 \
    --hamer __assets__/demo/hamer.mp4 \
    --prompt "A blonde girl is doing somersaults on the grass. Behind the grass is a river, \
    and behind the river are trees and mountains. The girl is wearing black yoga pants and a black sports vest." \
    --save-dir ./output
  • Inference with TeaCache for acceleration (Optional, may cause quality degradation)
python inference.py \
    --ref __assets__/demo/ref.png \
    --smpl __assets__/demo/smpl.mp4 \
    --hamer __assets__/demo/hamer.mp4 \
    --prompt "A blonde girl is doing somersaults on the grass. Behind the grass is a river, \
    and behind the river are trees and mountains. The girl is wearing black yoga pants and a black sports vest." \
    --save-dir ./output \
    --enable-teacache
  • Inference with small GPU memory (Optional, will be super slow. Can be used with TeaCache)
python inference.py \
    --ref __assets__/demo/ref.png \
    --smpl __assets__/demo/smpl.mp4 \
    --hamer __assets__/demo/hamer.mp4 \
    --prompt "A blonde girl is doing somersaults on the grass. Behind the grass is a river, \
    and behind the river are trees and mountains. The girl is wearing black yoga pants and a black sports vest." \
    --save-dir ./output \
    --save-gpu-memory
  • Inference with multi GPUs (Optional. Can be used with TeaCache)
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 inference.py \
    --ref __assets__/demo/ref.png \
    --smpl __assets__/demo/smpl.mp4 \
    --hamer __assets__/demo/hamer.mp4 \
    --prompt "A blonde girl is doing somersaults on the grass. Behind the grass is a river, \
    and behind the river are trees and mountains. The girl is wearing black yoga pants and a black sports vest." \
    --save-dir ./output \
    --multi-gpu

4. Custom Batch Inference

  • Prepare your reference images and conditions (coming soon)
TODO
  • The root dir should be structured as:
root/
|---ref/
    |---1.png
    |---2.png
    |---...
|---smpl/
    |---1.mp4
    |---2.mp4
    |---...
|---hamer/
    |---1.mp4
    |---2.mp4
    |---...
|---prompt/
    |---1.txt
    |---2.txt
    |---...
  • Batch inference
python inference.py --save-dir ./output --root $PATH-TO-ROOT-DIR
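Before launching a batch run, it can help to see which sample IDs are complete across the four subdirectories shown in the tree above. The following helper is a hypothetical sketch (not part of the repo) that pairs files by stem:

```python
from pathlib import Path

def complete_samples(root):
    """IDs that have a ref image, SMPL video, HaMer video, and prompt."""
    root = Path(root)
    # Subdirectory names and extensions follow the root layout above.
    wanted = {"ref": ".png", "smpl": ".mp4", "hamer": ".mp4", "prompt": ".txt"}
    ids = None
    for sub, ext in wanted.items():
        found = {p.stem for p in (root / sub).glob(f"*{ext}")}
        ids = found if ids is None else ids & found
    return sorted(ids)
```

Only IDs present in all four subdirectories are returned; a sample missing, say, its prompt file is silently excluded.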

Disclaimer

This project is released for academic use. We disclaim responsibility for user-generated content.

Contact Us

Jingkai Zhou: fs.jingkaizhou@gmail.com

BibTeX

@article{zhou2025realisdance-dit,
  title={RealisDance-DiT: Simple yet Strong Baseline towards Controllable Character Animation in the Wild},
  author={Zhou, Jingkai and Wu, Yifan and Li, Shikai and Wei, Min and Fan, Chao and Chen, Weihua and Jiang, Wei and Wang, Fan},
  journal={arXiv preprint arXiv:2504.14977},
  year={2025}
}

@article{zhou2024realisdance,
  title={RealisDance: Equip controllable character animation with realistic hands},
  author={Zhou, Jingkai and Wang, Benzhi and Chen, Weihua and Bai, Jingqi and Li, Dongyang and Zhang, Aixi and Xu, Hao and Yang, Mingyang and Wang, Fan},
  journal={arXiv preprint arXiv:2409.06202},
  year={2024}
}

Acknowledgements

Thanks to Shikai Li for condition preparation and to Chenjie Cao for pose alignment.
