Implement EasyCache and Invent LazyCache #9496
Conversation
Is it like TeaCache?

@yamatazen Yep, I just updated the PR's info with a lot more details! I published it in the middle of the night so I didn't have the energy to fill it out at that point haha

Thank you so much, looking forward to testing this with Wan 2.2 5B. EasyCache is the best option for it right now but it requires a non-native generation workflow. This will let me implement it AND refactor :)

Would this work with TeaCache? But I guess the quality drop would be huge. Also, does it also work for image diffusion models? (Looking at your examples) Thank you!
* Attempting a universal implementation of EasyCache, starting with flux as test; I screwed up the math a bit, but when I set it just right it works.
* Fixed math to make threshold work as expected, refactored code to use EasyCacheHolder instead of a dict wrapped by object
* Use sigmas from transformer_options instead of timesteps to be compatible with a greater amount of models, make end_percent work
* Make log statement when not skipping useful, preparing for per-cond caching
* Added DIFFUSION_MODEL wrapper around forward function for wan model
* Add subsampling for heuristic inputs
* Add subsampling to output_prev (output_prev_subsampled now)
* Properly consider conds in EasyCache logic
* Created SuperEasyCache to test what happens if caching and reuse is moved outside the scope of conds, added PREDICT_NOISE wrapper to facilitate this test
* Change max reuse_threshold to 3.0
* Mark EasyCache/SuperEasyCache as experimental (beta)
* Make Lumina2 compatible with EasyCache
* Add EasyCache support for Qwen Image
* Fix missing comma, curse you Cursor
* Add EasyCache support to AceStep
* Add EasyCache support to Chroma
* Added EasyCache support to Cosmos Predict t2i
* Make EasyCache not crash with Cosmos Predict ImagToVideo latents, but does not work well at all
* Add EasyCache support to hidream
* Added EasyCache support to hunyuan video
* Added EasyCache support to hunyuan3d
* Added EasyCache support to LTXV (not very good, but does not crash)
* Implemented EasyCache for aura_flow
* Renamed SuperEasyCache to LazyCache, hardcoded subsample_factor to 8 on nodes
* Extra logging when verbose is true for EasyCache
Implemented EasyCache natively for (almost) all supported models in ComfyUI, and created LazyCache for a more generic and universally compatible implementation of the core concepts of EasyCache.
What is it?
EasyCache is in the family of step-skipping techniques like TeaCache that reuse cached values from a previous step whenever a step is determined to be unnecessary to sample. Unlike TeaCache, EasyCache has only one hyperparameter, reuse_threshold, and can quickly be adjusted based on model needs. It is also 'easy' to implement in that it only cares about the inputs and outputs of the models' forward functions, not anything inside them. I implemented it by making sure (almost) every supported ComfyUI model's forward function has DIFFUSION_MODEL wrapper compatibility - no internal model code was modified. Each modified model type was tested with an example or template workflow with and without EasyCache enabled to make sure nothing that worked before broke.
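To make the core idea concrete, here is a minimal sketch of an EasyCache-style skip decision. All names (`EasyCacheState`, `step`) are hypothetical and the math is illustrative, not ComfyUI's actual implementation: input change relative to the last fully computed step is accumulated, and while it stays under `reuse_threshold`, the cached input-to-output delta is reused instead of running the model's forward function.

```python
class EasyCacheState:
    """Hypothetical sketch of an EasyCache-style skip decision.

    The real PR wraps each model's forward pass via a DIFFUSION_MODEL
    wrapper; latents are plain lists of floats here for simplicity.
    """

    def __init__(self, reuse_threshold=0.2):
        self.reuse_threshold = reuse_threshold
        self.prev_input = None      # input from the last fully computed step
        self.cached_delta = None    # (output - input) from that step
        self.accum_change = 0.0     # accumulated relative input change

    def step(self, x, forward):
        if self.prev_input is not None:
            # Relative change of the input since the last full forward pass.
            num = sum(abs(a - b) for a, b in zip(x, self.prev_input))
            den = sum(abs(b) for b in self.prev_input) + 1e-8
            self.accum_change += num / den
            if self.accum_change < self.reuse_threshold:
                # Cheap path: reuse the cached delta instead of the model.
                return [a + d for a, d in zip(x, self.cached_delta)]
        # Expensive path: run the real forward pass and refresh the cache.
        out = forward(x)
        self.cached_delta = [o - a for o, a in zip(out, x)]
        self.prev_input = list(x)
        self.accum_change = 0.0
        return out
```

A higher `reuse_threshold` lets more accumulated change pass before forcing a real forward pass, which matches the speed/quality trade-off described below.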
I also invented a more universal but generally worse implementation of EasyCache that I dub LazyCache - instead of caring about caching individual conds around the models' forward function, it just deals with the full combined latents before and after each step. Sometimes when one works poorly for a model, the other does better - always favor EasyCache first, though. To implement it, I added a PREDICT_NOISE wrapper handled by the outer_predict_noise function in CFGGuider. There was previously no place to cleanly wrap around the noise prediction of each sampling step before/after CFG, so this wrapper 'endpoint' can potentially be used for other things.
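As a sketch of what a PREDICT_NOISE-level cache can look like, here is a hypothetical wrapper operating on the full combined latent after CFG, at whole-step granularity. The `make_lazycache_wrapper` name and the `(executor, x, timestep)` signature are assumptions for illustration; the actual ComfyUI wrapper API may differ, and latents are scalars here for brevity.

```python
def make_lazycache_wrapper(reuse_threshold=0.3):
    """Hypothetical sketch of a LazyCache-style PREDICT_NOISE wrapper."""
    state = {"prev_x": None, "delta": None, "accum": 0.0}

    def wrapper(executor, x, timestep):
        # executor is the wrapped predict-noise call (CFG already combined
        # inside it); x stands in for the full latent at this step.
        if state["prev_x"] is not None:
            change = abs(x - state["prev_x"]) / (abs(state["prev_x"]) + 1e-8)
            state["accum"] += change
            if state["accum"] < reuse_threshold:
                # Skip the whole step: reuse the cached per-step delta.
                return x + state["delta"]
        out = executor(x, timestep)
        state["delta"] = out - x
        state["prev_x"] = x
        state["accum"] = 0.0
        return out

    return wrapper
```

Because the wrapper never sees individual conds, it is model-agnostic, which is exactly why it is "more universal but generally worse" than per-cond EasyCache.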
Usage:
Ideally, EasyCache keeps the composition very similar to the non-EasyCache generation, at the cost of some quality loss.
Here is what each parameter does on the nodes:
reuse_threshold: value used to decide how liberally steps will be skipped. The higher the value, the more steps will be skipped (higher speedup, lower quality); the lower the value, the fewer steps will be skipped (lower speedup, higher quality). If you need to finetune the value and want to see what numbers the code sees, toggle verbose to True. Your console will be spammed with a bunch of log statements, one for each sampling step.

start_percent: determines the relative point in the sampling steps at which to begin considering skipping steps. The early steps determine overall composition and are important in maintaining image quality. If you find that the resulting image strays too far from the non-cached version for a particular model, consider increasing start_percent.

end_percent: determines the relative point in the sampling steps at which to stop considering skipping steps. The late steps are responsible for some of the finer details of the image. If you find the finer details lacking, consider decreasing end_percent.
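One plausible way to picture how the two percent parameters bound the skippable region is the following helper. The `caching_window` name and the step-index mapping are illustrative assumptions, not the PR's actual code (which works in terms of sigmas from transformer_options):

```python
def caching_window(num_steps, start_percent, end_percent):
    """Hypothetical helper: step indices eligible for skipping.

    start_percent/end_percent are relative positions in [0, 1] over the
    sampling schedule, as on the EasyCache nodes; steps outside the
    window always run the full model.
    """
    first = int(round(start_percent * (num_steps - 1)))
    last = int(round(end_percent * (num_steps - 1)))
    return list(range(first, last + 1))
```

For example, raising start_percent shrinks the window from the front, protecting the composition-setting early steps, while lowering end_percent protects the detail-refining late steps.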
Results:
Flux Kontext w/ 40 steps:
Qwen Image Edit w/ 20 steps:
WAN2.2 14B First Frame w/ 20 steps (10 per sampler):
easycache_00008_.mp4