TAEHV is a Tiny AutoEncoder for Hunyuan Video (and other similar video models). TAEHV can encode and decode latents into videos more cheaply (in time & memory) than the full-size video VAEs, at the cost of slightly lower quality.
Here's a comparison of the output & memory usage of the Full Hunyuan VAE vs. TAEHV during decoding:
| Decoder | Full Hunyuan VAE | TAEHV |
| --- | --- | --- |
| Decoded video (converted to GIF) | | |
| Runtime (in fp16, on GH200) | ~2-3s to decode 61 frames of (512, 320) video | ~0.5s to decode 61 frames of (512, 320) video (can be even faster with the right settings) |
| Peak memory usage (in fp16, on GH200) | ~6-9GB | <0.5GB |
See the profiling notebook for details on this comparison or the example notebook for a simpler demo.
To use TAEHV with different video models, load the corresponding `.pth` (model weight) file from this repo (a minimal loading sketch follows this list):

- For Wan 2.1, load the `taew2_1.pth` weights (see the Wan 2.1 example notebook).
- For Wan 2.2, load different `.pth` files depending on model scale:
  - For Wan 2.2 5B, load the `taew2_2.pth` weights (example notebook).
  - For Wan 2.2 14B, load the `taew2_1.pth` weights, since Wan 2.2 14B still uses the older Wan 2.1 VAE.
- For CogVideoX, load the `taecvx.pth` weights (example notebook).
- For Hunyuan Video, load the `taehv.pth` weights (example notebook).
- For Open-Sora 1.3, load the `taeos1_3.pth` weights.
- For Mochi 1 and SVD (which use different architectures), see the other repos TAEM1 and TAESDV.
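Here's a minimal loading sketch (the `checkpoint_path` keyword and device/dtype choices are assumptions on my part; see the example notebooks for the exact loading code used with each model):

```python
import torch
from taehv import TAEHV

# Pick the weights that match your diffusion model's VAE (see the list above).
taehv = TAEHV(checkpoint_path="taew2_1.pth").to("cuda", torch.float16)   # Wan 2.1 / Wan 2.2 14B
# taehv = TAEHV(checkpoint_path="taew2_2.pth")                           # Wan 2.2 5B
# taehv = TAEHV(checkpoint_path="taehv.pth")                             # Hunyuan Video
```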
If there's another open video model that would benefit from a TAEHV version, please file an issue (or, worst-case, try training your own).
TAEHV is available:

- In ComfyUI
  - Via the ComfyUI-WanVideoWrapper + VideoHelperSuite nodes, thanks to Kijai and AustinMroz
  - Via the ComfyUI-Bleh nodes, thanks to blepping
- In SDNext, thanks to vladmandic
- In the Wan2.1 Self-Forcing demo, thanks to Guande He and Xun Huang
If you've added TAEHV support elsewhere, LMK and I can add a link here.
You can use TAEHV with Diffusers by applying a small bit of wrapper code (example notebook). If you're writing new code involving both TAEHV and Diffusers, keep the following conventions in mind (a rough decode-wrapper sketch follows this list):

- TAEHV stores image values in the range [0, 1], whereas Diffusers uses [-1, 1].
- TAEHV stores videos in NTCHW dimension order (time, then channels), while Diffusers stores videos in NCTHW dimension order.
- TAEHV does not use any latent scales / shifts (TAEHV encodes / decodes exactly what diffusion models use), whereas Diffusers requires explicitly applying a `latents_mean` and `latents_std` each time you encode or decode something.
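To make those conventions concrete, here's a rough decode-wrapper sketch (not the notebook's actual wrapper; the function name and the `checkpoint_path` keyword are assumptions):

```python
import torch
from taehv import TAEHV

taehv = TAEHV(checkpoint_path="taew2_1.pth").to("cuda", torch.float16)

def decode_for_diffusers(latents_ncthw):
    # TAEHV decodes the diffusion model's latents directly, so no
    # latents_mean / latents_std handling is needed on this path.
    latents_ntchw = latents_ncthw.permute(0, 2, 1, 3, 4)  # Diffusers NCTHW -> TAEHV NTCHW
    frames_ntchw = taehv.decode_video(latents_ntchw)       # TAEHV output is NTCHW, values in [0, 1]
    frames_ncthw = frames_ntchw.permute(0, 2, 1, 3, 4)     # back to Diffusers NCTHW
    return frames_ncthw * 2 - 1                            # [0, 1] -> Diffusers [-1, 1]
```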
You can disable TAEHV's temporal or spatial upscaling to get even-cheaper decoding:

- `TAEHV(decoder_time_upscale=(False, False), decoder_space_upscale=(True, True, True))` disables temporal upscaling only.
- `TAEHV(decoder_time_upscale=(False, False), decoder_space_upscale=(False, False, False))` disables both temporal and spatial upscaling.
If you have a powerful GPU or are decoding at a reduced resolution, you can also set `parallel=True` in `TAEHV.decode_video` to decode all frames at once (faster, but requires more memory).
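For example (a minimal sketch; the latent tensor here is a dummy with an illustrative NTCHW shape, not output from a real model):

```python
import torch
from taehv import TAEHV

taehv = TAEHV(checkpoint_path="taehv.pth").to("cuda", torch.float16)
latents = torch.randn(1, 16, 16, 40, 64, dtype=torch.float16, device="cuda")  # dummy NTCHW latents

frames = taehv.decode_video(latents, parallel=False)  # sequential decoding (lower peak memory)
frames = taehv.decode_video(latents, parallel=True)   # all frames at once (faster, more memory)
```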
TAEHV is fully causal (with a finite receptive field), so it's structurally possible to display TAEHV output in "realtime" (the instant each frame is decoded) rather than waiting for the sequence to complete.
TAEHV is generally very tiny compared to the full VAEs it's mimicking, so TAEHV output is usually blurrier/more-artifacty than the full VAEs. I also don't have great quality benchmarks yet (should I be checking FVD? JEDi? idk) so I'm mostly relying on visual spot-checks. Please report any quality issues as you discover them (ideally with test latents or videos so I can reproduce & debug).
If you find TAEHV useful in your research, you can cite the TAEHV repo as a web link:
@misc {BoerBohan2025TAEHV,
author = {Boer Bohan, Ollin},
title = {TAEHV: Tiny AutoEncoder for Hunyuan Video},
year = {2025},
howpublished = {\url{https://github.com/madebyollin/taehv}},
}
The TAEHV repo contents change over time, so I recommend also noting the latest commit hash and access date in a note field, e.g.
note = {Commit: \texttt{5ce7381}, Accessed: 2025-09-05}