semantic encoder

Thank you very much for your excellent open-source work!
In my testing, I've experimented with using SigLIP to encode the low-level features of a reference image. I found that this indeed improved editing capabilities, but I also observed that some details of the original image were altered, such as facial identity information (ID).
I've noticed that a number of existing research works combine SigLIP and a VAE to encode image features. This has led me to a couple of questions for the author(s):
1. In your view, is this combined (SigLIP + VAE) approach a feasible direction for your project?
2. Alternatively, if your team has already explored this approach, were there other issues or challenges encountered that ultimately led to not adopting it?
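To make the question concrete, here is a minimal sketch of the kind of combination I mean. It is only an illustration under my own assumptions and not this project's implementation: random arrays stand in for the SigLIP patch embeddings and the VAE latent, and the function name `fuse_reference_tokens`, the projection matrices, and all shapes are hypothetical. Both feature sets are projected to a shared width and concatenated along the sequence axis, so the model could attend to semantic tokens and detail-preserving latent tokens together.

```python
import numpy as np


def fuse_reference_tokens(semantic_tokens, vae_latent, w_sem, w_vae):
    """Project both feature sets to a shared width and concatenate them.

    semantic_tokens: (n_sem, d_sem) array, stand-in for SigLIP patch embeddings.
    vae_latent:      (c, h, w) array, stand-in for a VAE latent of the same image.
    w_sem, w_vae:    hypothetical linear projections to a shared width d_model.
    """
    c, h, w = vae_latent.shape
    # Flatten the spatial grid so each latent position becomes one token.
    vae_tokens = vae_latent.reshape(c, h * w).T      # (h*w, c)
    sem = semantic_tokens @ w_sem                    # (n_sem, d_model)
    vae = vae_tokens @ w_vae                         # (h*w, d_model)
    # Sequence-axis concatenation: semantic tokens first, then VAE tokens.
    return np.concatenate([sem, vae], axis=0)


rng = np.random.default_rng(0)
d_model = 64                                         # assumed shared width
semantic_tokens = rng.standard_normal((16, 32))      # assumed: 16 tokens, width 32
vae_latent = rng.standard_normal((4, 8, 8))          # assumed: 4 channels, 8x8 grid
w_sem = rng.standard_normal((32, d_model)) * 0.02
w_vae = rng.standard_normal((4, d_model)) * 0.02

fused = fuse_reference_tokens(semantic_tokens, vae_latent, w_sem, w_vae)
print(fused.shape)  # (80, 64): 16 semantic tokens + 64 VAE tokens
```

Concatenating along the sequence axis is just one option; channel-wise fusion or separate cross-attention branches would be alternatives, and I don't know which (if any) fits your architecture.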


semantic encoder #5
