This repository is the official implementation of RealisDance-DiT. RealisDance-DiT is a structurally simple, empirically robust, and experimentally strong baseline model for controllable character animation in the wild.
- 2025-05-19: Released the inference code and weights of RealisDance-DiT.
- 2025-05-19: You may also be interested in our Uni3C.
- 2024-10-15: Released RealisDance and the code of pose preparation for RealisDance.
- 2024-09-10: Now you can try more interesting AI video editing in XunGuang.
- 2024-09-09: You may also be interested in our human part repair method RealisHuman.
Here are several character animations generated by RealisDance-DiT. Note that the GIFs shown here suffer some visual quality degradation; please visit our project page for the original videos.
- Inference code
- Model checkpoints
- RealisDance-Val dataset
- TeaCache speedup
- FSDP + Sequential parallel
- Pose preparation code
- SMPL retargeting
- New checkpoints with shifted RoPE on frame
Note: This released project has two slight differences from the paper.
- This version uses only SMPL-CS and HaMer conditions in order to support Uni3C, because DWPose is difficult to align and render from a new camera view.
- (Coming soon) In the shifted RoPE, we also shift the RoPE along the frame dimension, because we find that sharing the first-frame RoPE sometimes introduces slight artifacts at the first frame.
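As a rough illustration of the idea behind the frame-dimension shift (the actual offset and implementation in RealisDance-DiT may differ; `ref_offset` is a hypothetical value chosen here for demonstration only):

```python
def frame_position_ids(num_frames: int, shared: bool, ref_offset: int = 4):
    """Toy sketch of RoPE frame indices for one reference token plus video frames.

    shared=True: the reference token reuses the first video frame's index (0),
    the behavior the note above says can cause first-frame artifacts.
    shared=False: the reference index is shifted outside the video range.
    """
    video_ids = list(range(num_frames))
    ref_id = 0 if shared else num_frames + ref_offset  # hypothetical shift
    return ref_id, video_ids

# Shared RoPE: the reference token collides with frame 0.
ref_id, video_ids = frame_position_ids(8, shared=True)
assert ref_id in video_ids

# Shifted RoPE: the reference token gets a unique index outside the video range.
ref_id, video_ids = frame_position_ids(8, shared=False)
assert ref_id not in video_ids
```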
git clone https://github.com/theFoxofSky/RealisDance.git
cd RealisDance
conda create -n realisdance python=3.10
conda activate realisdance
pip install -r requirements.txt
# FA3 (Optional)
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout ea3ecea97a1393c092863330aff9a162bb5ce443 # very important: other FA3 commits may yield bad results
cd hopper
python setup.py install
Please download the checkpoints from the Hugging Face repo to `./pretrained_models`, and make sure the structure of `./pretrained_models` matches the one in the Hugging Face repo.
./pretrained_models
|---image_encoder/
|---...
|---scheduler/
|---...
|---text_encoder/
|---...
|---tokenizer/
|---...
|---transformer/
|---...
|---vae/
|---...
|---model_index.json
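Before running inference, you can sanity-check that the downloaded checkpoints follow the expected layout. This is a minimal helper sketch (not part of the repo), assuming the directory names listed above:

```python
from pathlib import Path

# Subdirectories expected inside ./pretrained_models, per the layout above.
EXPECTED = [
    "image_encoder", "scheduler", "text_encoder",
    "tokenizer", "transformer", "vae",
]

def check_pretrained(root="./pretrained_models"):
    """Return a list of expected entries missing from the checkpoint dir."""
    root = Path(root)
    missing = [name for name in EXPECTED if not (root / name).is_dir()]
    if not (root / "model_index.json").is_file():
        missing.append("model_index.json")
    return missing

if __name__ == "__main__":
    missing = check_pretrained()
    if missing:
        print("Missing from ./pretrained_models:", ", ".join(missing))
```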
- Inference with Demo sequences
python inference.py \
--ref __assets__/demo/ref.png \
--smpl __assets__/demo/smpl.mp4 \
--hamer __assets__/demo/hamer.mp4 \
--prompt "A blonde girl is doing somersaults on the grass. Behind the grass is a river, \
and behind the river are trees and mountains. The girl is wearing black yoga pants and a black sports vest." \
--save-dir ./output
- Inference with TeaCache acceleration (Optional; may cause slight quality degradation)
python inference.py \
--ref __assets__/demo/ref.png \
--smpl __assets__/demo/smpl.mp4 \
--hamer __assets__/demo/hamer.mp4 \
--prompt "A blonde girl is doing somersaults on the grass. Behind the grass is a river, \
and behind the river are trees and mountains. The girl is wearing black yoga pants and a black sports vest." \
--save-dir ./output \
--enable-teacache
- Inference with small GPU memory (Optional; will be much slower. Can be combined with TeaCache)
python inference.py \
--ref __assets__/demo/ref.png \
--smpl __assets__/demo/smpl.mp4 \
--hamer __assets__/demo/hamer.mp4 \
--prompt "A blonde girl is doing somersaults on the grass. Behind the grass is a river, \
and behind the river are trees and mountains. The girl is wearing black yoga pants and a black sports vest." \
--save-dir ./output \
--save-gpu-memory
- Inference with multiple GPUs (Optional. Can be combined with TeaCache)
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 inference.py \
--ref __assets__/demo/ref.png \
--smpl __assets__/demo/smpl.mp4 \
--hamer __assets__/demo/hamer.mp4 \
--prompt "A blonde girl is doing somersaults on the grass. Behind the grass is a river, \
and behind the river are trees and mountains. The girl is wearing black yoga pants and a black sports vest." \
--save-dir ./output \
--multi-gpu
- Prepare your reference images and conditions (coming soon)
TODO
- The root dir should be structured as:
root/
|---ref/
|---1.png
|---2.png
|---...
|---smpl/
|---1.mp4
|---2.mp4
|---...
|---hamer/
|---1.mp4
|---2.mp4
|---...
|---prompt/
|---1.txt
|---2.txt
|---...
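To catch mismatched files before a long batch run, the layout above can be validated with a small helper. This is an illustrative sketch (not part of the repo), assuming samples are matched across subdirectories by file stem as shown above:

```python
from pathlib import Path

def check_batch_root(root):
    """Verify each ref/*.png has matching smpl/hamer videos and a prompt file."""
    root = Path(root)
    problems = []
    for ref in sorted((root / "ref").glob("*.png")):
        stem = ref.stem
        for sub, ext in (("smpl", ".mp4"), ("hamer", ".mp4"), ("prompt", ".txt")):
            if not (root / sub / f"{stem}{ext}").is_file():
                problems.append(f"{sub}/{stem}{ext} missing for ref/{ref.name}")
    return problems
```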
- Batch inference
python inference.py --save-dir ./output --root $PATH_TO_ROOT_DIR
This project is released for academic use. We disclaim responsibility for user-generated content.
Jingkai Zhou: fs.jingkaizhou@gmail.com
@article{zhou2025realisdance-dit,
title={RealisDance-DiT: Simple yet Strong Baseline towards Controllable Character Animation in the Wild},
author={Zhou, Jingkai and Wu, Yifan and Li, Shikai and Wei, Min and Fan, Chao and Chen, Weihua and Jiang, Wei and Wang, Fan},
journal={arXiv preprint arXiv:2504.14977},
year={2025}
}
@article{zhou2024realisdance,
title={RealisDance: Equip controllable character animation with realistic hands},
author={Zhou, Jingkai and Wang, Benzhi and Chen, Weihua and Bai, Jingqi and Li, Dongyang and Zhang, Aixi and Xu, Hao and Yang, Mingyang and Wang, Fan},
journal={arXiv preprint arXiv:2409.06202},
year={2024}
}
Thanks to Shikai Li for condition preparation and Chenjie Cao for pose alignment.