
madebyollin/taehv


🥮 Tiny AutoEncoder for Hunyuan Video

What is TAEHV?

TAEHV is a Tiny AutoEncoder for Hunyuan Video (and other similar video models). TAEHV can encode and decode latents into videos more cheaply (in time & memory) than the full-size video VAEs, at the cost of slightly lower quality.

Here's a comparison of the output & memory usage of the Full Hunyuan VAE vs. TAEHV during decoding:

| Decoder AE | Full Hunyuan VAE | TAEHV |
| --- | --- | --- |
| Decoded video (converted to GIF) | (image) | (image) |
| Runtime (fp16, on GH200) | ~2-3s to decode 61 frames of (512, 320) video | ~0.5s to decode 61 frames of (512, 320) video; can be even faster with the right settings |
| Memory (fp16, on GH200) | ~6-9GB peak memory usage | <0.5GB peak memory usage |

See the profiling notebook for details on this comparison or the example notebook for a simpler demo.

What video models does TAEHV support?

To use TAEHV with different video models, you can load the different .pth (model weight) files from this repo:

  • For Wan 2.1, load the taew2_1.pth weights (see the Wan 2.1 example notebook).
  • For Wan 2.2, load a different .pth file depending on the model scale.
  • For CogVideoX, load the taecvx.pth weights (example notebook).
  • For Hunyuan Video, load the taehv.pth weights (example notebook).
  • For Open-Sora 1.3, load the taeos1_3.pth weights.
  • For Mochi 1 and SVD (which use different architectures), see the other repos TAEM1 and TAESDV.

If there's another open video model that would benefit from a TAEHV version, please file an issue (or, worst-case, try training your own).
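For quick reference, the weight files listed above can be collected into a mapping (a convenience snippet for this README, not part of the repo; the Wan 2.2 filenames are omitted since they vary by model scale):

```python
# Map from video model to the TAEHV weight file listed above.
# This dict is illustrative only — it is not part of the taehv repo itself.
TAEHV_WEIGHTS = {
    "Wan 2.1": "taew2_1.pth",
    "CogVideoX": "taecvx.pth",
    "Hunyuan Video": "taehv.pth",
    "Open-Sora 1.3": "taeos1_3.pth",
}
```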

Where can I get TAEHV?

TAEHV is available directly in this repo (the .pth weight files listed above, alongside the model code).

If you've added TAEHV support elsewhere, LMK and I can add a link here.

How do I use TAEHV with 🧨 Diffusers?

You can use TAEHV with Diffusers by applying a small bit of wrapper code (example notebook). If you're writing new code involving both TAEHV and Diffusers, keep the following conventions in mind:

  1. TAEHV stores image values in the range [0, 1], whereas Diffusers uses [-1, 1].
  2. TAEHV stores videos in NTCHW (batch, time, channels, height, width) dimension order, while Diffusers stores videos in NCTHW (batch, channels, time, height, width) dimension order.
  3. TAEHV applies no latent scales / shifts (it encodes / decodes exactly the latents that diffusion models use), whereas Diffusers requires explicitly applying a latents_mean and latents_std each time you encode or decode something.
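The convention differences in points 1 and 2 can be bridged with a small pair of helpers (a hedged sketch in plain torch; the function names are mine, not from the repo):

```python
import torch

def diffusers_to_taehv(video: torch.Tensor) -> torch.Tensor:
    """NCTHW in [-1, 1] (Diffusers) -> NTCHW in [0, 1] (TAEHV)."""
    # Swap the channel and time axes, then remap [-1, 1] -> [0, 1].
    return video.permute(0, 2, 1, 3, 4).add(1).div(2).clamp(0, 1)

def taehv_to_diffusers(video: torch.Tensor) -> torch.Tensor:
    """NTCHW in [0, 1] (TAEHV) -> NCTHW in [-1, 1] (Diffusers)."""
    # Swap the time and channel axes, then remap [0, 1] -> [-1, 1].
    return video.permute(0, 2, 1, 3, 4).mul(2).sub(1)
```

Point 3 means you should skip Diffusers' latents_mean / latents_std step entirely when passing latents to TAEHV.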

How can I make TAEHV even faster?

You can disable TAEHV's temporal or spatial upscaling to get even-cheaper decoding.

TAEHV(decoder_time_upscale=(False, False), decoder_space_upscale=(True, True, True))


TAEHV(decoder_time_upscale=(False, False), decoder_space_upscale=(False, False, False))


If you have a powerful GPU or are decoding at a reduced resolution, you can also set parallel=True in TAEHV.decode_video to decode all frames at once (which is faster but requires more memory).
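The parallel-vs-sequential tradeoff is easy to see with a dummy decoder (a generic sketch, not TAEHV's actual implementation; decode_fn stands in for the per-frame decode work):

```python
import torch

def decode_sequential(latents: torch.Tensor, decode_fn) -> torch.Tensor:
    """Decode one frame at a time (NTCHW): lower peak memory, more launch overhead."""
    frames = [decode_fn(latents[:, t : t + 1]) for t in range(latents.shape[1])]
    return torch.cat(frames, dim=1)

def decode_parallel(latents: torch.Tensor, decode_fn) -> torch.Tensor:
    """Decode every frame in one call: faster on a big GPU, higher peak memory."""
    return decode_fn(latents)
```

With a frame-wise decode_fn both paths produce the same result; parallel=True simply trades peak memory for throughput.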

TAEHV is fully causal (with a finite receptive field), so it's structurally possible to display TAEHV output in realtime (the instant each frame is decoded) rather than waiting for the full sequence to complete.

Limitations

TAEHV is generally very tiny compared to the full VAEs it's mimicking, so TAEHV output is usually blurrier/more-artifacty than the full VAEs. I also don't have great quality benchmarks yet (should I be checking FVD? JEDi? idk) so I'm mostly relying on visual spot-checks. Please report any quality issues as you discover them (ideally with test latents or videos so I can reproduce & debug).

How can I cite TAEHV in a publication?

If you find TAEHV useful in your research, you can cite the TAEHV repo as a web link:

@misc{BoerBohan2025TAEHV,
  author = {Boer Bohan, Ollin},
  title = {TAEHV: Tiny AutoEncoder for Hunyuan Video},
  year = {2025},
  howpublished = {\url{https://github.com/madebyollin/taehv}},
}

The TAEHV repo contents change over time, so I recommend also noting the latest commit hash and access date in a note field, e.g.

note = {Commit: \texttt{5ce7381}, Accessed: 2025-09-05}
