TAEHV is a Tiny AutoEncoder for Hunyuan Video (and other similar video models). TAEHV can encode and decode latents into videos more cheaply (in time & memory) than the full-size video VAEs, at the cost of slightly lower quality.
Here's a comparison of the output & memory usage of the Full Hunyuan VAE vs. TAEHV during decoding:
| Decoder | Full Hunyuan VAE | TAEHV |
| --- | --- | --- |
| Decoded video (converted to GIF) | | |
| Runtime (in fp16, on GH200) | ~2-3s to decode 61 frames of (512, 320) video | ~0.5s to decode 61 frames of (512, 320) video (can be even faster with the right settings) |
| Peak memory usage (in fp16, on GH200) | ~6-9GB | <0.5GB |
See the profiling notebook for details on this comparison or the example notebook for a simpler demo.
To use TAEHV with different video models, load the corresponding `.pth` (model weight) file from this repo (a minimal loading sketch follows this list):

- For Wan 2.1, load the `taew2_1.pth` weights (see the Wan 2.1 example notebook).
- For Wan 2.2, load different `.pth` files depending on model scale:
  - For Wan 2.2 5B, load the `taew2_2.pth` weights (example notebook).
  - For Wan 2.2 14B, load the `taew2_1.pth` weights, since Wan 2.2 14B still uses the older Wan 2.1 VAE.
- For CogVideoX, load the `taecvx.pth` weights (example notebook).
- For Hunyuan Video, load the `taehv.pth` weights (example notebook).
- For Open-Sora 1.3, load the `taeos1_3.pth` weights.
- For Mochi 1 and SVD (which use different architectures), see the other repos TAEM1 and TAESDV.
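Here's a minimal loading sketch (the `checkpoint_path` keyword and device/dtype choices are assumptions on my part; see the example notebooks for the exact loading code used with each model):

```python
import torch
from taehv import TAEHV

# Pick the weights that match your diffusion model's VAE (see the list above).
taehv = TAEHV(checkpoint_path="taew2_1.pth").to("cuda", torch.float16)   # Wan 2.1 / Wan 2.2 14B
# taehv = TAEHV(checkpoint_path="taew2_2.pth")                           # Wan 2.2 5B
# taehv = TAEHV(checkpoint_path="taehv.pth")                             # Hunyuan Video
```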
If there's another open video model that would benefit from a TAEHV version, please file an issue (or, worst-case, try training your own).
TAEHV is available:

- In ComfyUI
  - Via the ComfyUI-WanVideoWrapper + VideoHelperSuite nodes, thanks to Kijai and AustinMroz
  - Via the ComfyUI-Bleh nodes, thanks to blepping
- In SDNext, thanks to vladmandic
- In the Wan2.1 Self-Forcing demo, thanks to Guande He and Xun Huang
If you've added TAEHV support elsewhere, LMK and I can add a link here.
You can use TAEHV with Diffusers by applying a small bit of wrapper code (example notebook). If you're writing new code involving both TAEHV and Diffusers, keep the following conventions in mind (a rough decode-wrapper sketch follows this list):

- TAEHV stores image values in the range [0, 1], whereas Diffusers uses [-1, 1].
- TAEHV stores videos in NTCHW dimension order (time, then channels), while Diffusers stores videos in NCTHW dimension order.
- TAEHV does not use any latent scales / shifts (TAEHV encodes / decodes exactly what diffusion models use), whereas Diffusers requires explicitly applying a `latents_mean` and `latents_std` each time you encode or decode something.
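To make those conventions concrete, here's a rough decode-wrapper sketch (not the notebook's actual wrapper; the function name and the `checkpoint_path` keyword are assumptions):

```python
import torch
from taehv import TAEHV

taehv = TAEHV(checkpoint_path="taew2_1.pth").to("cuda", torch.float16)

def decode_for_diffusers(latents_ncthw):
    # TAEHV decodes the diffusion model's latents directly, so no
    # latents_mean / latents_std handling is needed on this path.
    latents_ntchw = latents_ncthw.permute(0, 2, 1, 3, 4)  # Diffusers NCTHW -> TAEHV NTCHW
    frames_ntchw = taehv.decode_video(latents_ntchw)       # TAEHV output is NTCHW, values in [0, 1]
    frames_ncthw = frames_ntchw.permute(0, 2, 1, 3, 4)     # back to Diffusers NCTHW
    return frames_ncthw * 2 - 1                            # [0, 1] -> Diffusers [-1, 1]
```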
You can disable TAEHV's temporal or spatial upscaling to get even-cheaper decoding:

- `TAEHV(decoder_time_upscale=(False, False), decoder_space_upscale=(True, True, True))` disables temporal upscaling only.
- `TAEHV(decoder_time_upscale=(False, False), decoder_space_upscale=(False, False, False))` disables both temporal and spatial upscaling.
If you have a powerful GPU or are decoding at a reduced resolution, you can also set `parallel=True` in `TAEHV.decode_video` to decode all frames at once (faster, but requires more memory).
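For example (a minimal sketch; the latent tensor here is a dummy with an illustrative NTCHW shape, not output from a real model):

```python
import torch
from taehv import TAEHV

taehv = TAEHV(checkpoint_path="taehv.pth").to("cuda", torch.float16)
latents = torch.randn(1, 16, 16, 40, 64, dtype=torch.float16, device="cuda")  # dummy NTCHW latents

frames = taehv.decode_video(latents, parallel=False)  # sequential decoding (lower peak memory)
frames = taehv.decode_video(latents, parallel=True)   # all frames at once (faster, more memory)
```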
TAEHV is fully causal (with a finite receptive field), so it's structurally possible to display TAEHV output in "realtime" (the instant each frame is decoded) rather than waiting for the sequence to complete.
TAEHV is generally very tiny compared to the full VAEs it's mimicking, so TAEHV output is usually blurrier/more-artifacty than the full VAEs. I also don't have great quality benchmarks yet (should I be checking FVD? JEDi? idk) so I'm mostly relying on visual spot-checks. Please report any quality issues as you discover them (ideally with test latents or videos so I can reproduce & debug).
If you find TAEHV useful in your research, you can cite the TAEHV repo as a web link:
@misc {BoerBohan2025TAEHV,
author = {Boer Bohan, Ollin},
title = {TAEHV: Tiny AutoEncoder for Hunyuan Video},
year = {2025},
howpublished = {\url{https://github.com/madebyollin/taehv}},
}
The TAEHV repo contents change over time, so I recommend also noting the latest commit hash and access date in a note field, e.g.
note = {Commit: \texttt{5ce7381}, Accessed: 2025-09-05}