Image degradation/artifacts when scaling SDXL latents #1116

@city96

I figured this is the easiest place to open this issue, but I can probably reproduce it on the reference implementation (or diffusers) and post my issue there, if required.

I've been working on an interposer that converts the latents generated by v1.X and v2.X models into the latent space that SDXL models use. While training, I noticed that XL->v1.5 conversion worked almost perfectly, while v1.5->XL conversion produced nasty digital artifacts. This also meant the v1.5->XL network never actually converged properly[1].
After some digging, I found that these same artifacts appear any time the SDXL latent is changed in some way between the encode and decode stages. The simplest example is up- or downscaling the latent by any amount. Downscaling a v1.5 latent produces a blurry image (as expected of bilinear scaling)[2]. Downscaling an XL latent produces weird corruptions that look almost like digital artifacts[2]. The effect is even worse when using bislerp. Side-by-side (SBS) output comparison for a 768->512 downscale.
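
For anyone who wants to reproduce the scaling step outside of a node graph, here is a minimal sketch in plain PyTorch. The helper name `scale_latent` and the random tensor are placeholders of mine, not anything from ComfyUI; the tensor only stands in for a real VAE-encoded 768x768 image (4 channels at 1/8 resolution). The VAE encode before and the decode after are left out, since they depend on which implementation you load the model with.

```python
import torch
import torch.nn.functional as F


def scale_latent(latent: torch.Tensor, height: int, width: int) -> torch.Tensor:
    """Bilinearly resize a [batch, channels, H, W] latent to (height, width).

    Sizes are in latent units, i.e. pixel size divided by 8.
    """
    return F.interpolate(
        latent, size=(height, width), mode="bilinear", align_corners=False
    )


# Placeholder for the encoder output of a 768x768 image (768 / 8 = 96).
latent_768 = torch.randn(1, 4, 96, 96)

# 768 -> 512 pixels corresponds to 96 -> 64 in latent space.
latent_512 = scale_latent(latent_768, 64, 64)
print(latent_512.shape)
```

Decoding `latent_512` with the v1.5 VAE versus the SDXL VAE (each fed a latent from its own encoder) is the comparison described above.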

My current hypothesis is that the SDXL VAE is over-trained in some way. It seems a lot less capable of compensating for the worsened signal-to-noise ratio caused by scaling - or in my case converting - the latents. This might also explain why the v1.0 VAE had odd "scanline" issues.
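
One way this hypothesis could be checked is to perturb a latent slightly and measure how far the decoded image moves. The sketch below is only an illustration under my own assumptions: `decode` is a hypothetical callable wrapping whichever VAE decoder is being tested, decoded images are assumed to lie in [0, 1], and Gaussian noise is just one convenient stand-in for "the latent got changed a bit".

```python
import torch


def psnr(reference: torch.Tensor, test: torch.Tensor, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; assumes values in [0, max_val]."""
    mse = torch.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * torch.log10(max_val ** 2 / mse))


def decode_sensitivity(decode, latent: torch.Tensor,
                       noise_std: float = 0.05, seed: int = 0) -> float:
    """PSNR between decode(latent) and decode(latent + small Gaussian noise).

    `decode` is a hypothetical callable mapping a latent tensor to an image
    tensor. A lower result means the decoder is more sensitive to the change.
    """
    gen = torch.Generator().manual_seed(seed)
    noise = torch.randn(latent.shape, generator=gen) * noise_std
    noise = noise.to(device=latent.device, dtype=latent.dtype)
    with torch.no_grad():
        clean = decode(latent)
        perturbed = decode(latent + noise)
    return psnr(clean, perturbed)
```

Running `decode_sensitivity` with the same `noise_std` against the v1.5 decoder and the SDXL decoder (each on a latent from its own encoder) would give two numbers to compare. The latent spaces are scaled differently, so the noise level may need normalising per model before the comparison is fair.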

As for fixing it, I have no clue - unless I overlooked something. Maybe @comfyanonymous can forward this to someone at SAI.
I guess in the meantime I'll see if I can train an XL VAE from scratch by pinning the encoder to the current one.
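
By "pinning the encoder" I mean something along these lines, assuming it simply amounts to freezing the encoder weights and training only a fresh decoder. `TinyAutoencoder` is a toy stand-in for illustration, not the SDXL VAE architecture, and the plain MSE loss is a placeholder for whatever loss the real training would use.

```python
import torch
from torch import nn


class TinyAutoencoder(nn.Module):
    """Toy stand-in with the same 8x spatial factor and 4 latent channels."""

    def __init__(self) -> None:
        super().__init__()
        self.encoder = nn.Conv2d(3, 4, kernel_size=8, stride=8)
        self.decoder = nn.ConvTranspose2d(4, 3, kernel_size=8, stride=8)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))


def freeze_encoder(model: nn.Module) -> list:
    """Freeze model.encoder and return the decoder parameters left to train."""
    model.encoder.requires_grad_(False)  # no gradients for encoder weights
    model.encoder.eval()                 # keep any norm/dropout layers fixed
    return [p for p in model.decoder.parameters() if p.requires_grad]


torch.manual_seed(0)
model = TinyAutoencoder()
encoder_before = model.encoder.weight.clone()

# Only the decoder parameters are handed to the optimizer.
optimizer = torch.optim.Adam(freeze_encoder(model), lr=1e-3)

# One training step on random images, just to show the mechanics.
images = torch.rand(2, 3, 64, 64)
loss = nn.functional.mse_loss(model(images), images)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because the encoder never changes, latents produced by the existing models would stay valid for the retrained decoder.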


[1] - Interposer training outputs. v1->xl performed worse on the evaluation, despite all training runs sharing the same preprocessed latents as the inputs/targets.

Left graph is eval. loss; the two graphs on the right are training loss.
INTERPOSER_RES_LOSS

[2] "Digital noise" from scaling the latent. Present on both v0.9 and v1.0 SDXL VAE but absent from the v1.5 VAE.

LATENT_NOISE_FROM_SCALING2
