semantic encoder #5

@milely

Thank you very much for your excellent open-source work!
In my testing, I've experimented with using SigLIP to encode the low-level features of a reference image. I found that this indeed improved editing capabilities, but I also observed that some details of the original image were altered, such as facial identity information (ID).
I've noticed that a number of existing research works combine SigLIP and a VAE to encode image features. This has led me to a couple of questions for the author(s):

  1. In your view, is this combined (SigLIP + VAE) approach a feasible direction for your project?
  2. Alternatively, if your team has already explored this approach, were there other issues or challenges encountered that ultimately led to not adopting it?
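
To make question 1 concrete, here is a minimal sketch of one way the two feature streams could be combined: project semantic tokens and patchified VAE latents to a shared width, then concatenate them along the token axis. This is only an illustration under stated assumptions, not the project's method. The function names (`patchify`, `fuse_reference_tokens`), the shapes, and the random arrays standing in for SigLIP embeddings, VAE latents, and learned projection weights are all hypothetical.

```python
import numpy as np


def patchify(latent, patch):
    """Split a (C, H, W) latent into (H/patch * W/patch) tokens of size C*patch*patch."""
    c, h, w = latent.shape
    assert h % patch == 0 and w % patch == 0, "latent size must be divisible by patch"
    x = latent.reshape(c, h // patch, patch, w // patch, patch)
    x = x.transpose(1, 3, 0, 2, 4)  # (H/p, W/p, C, p, p)
    return x.reshape((h // patch) * (w // patch), c * patch * patch)


def fuse_reference_tokens(semantic_tokens, vae_latent, w_sem, w_vae, patch=2):
    """Project both streams to a shared width and concatenate along the token axis."""
    sem = semantic_tokens @ w_sem               # (N_sem, D): high-level content
    vae = patchify(vae_latent, patch) @ w_vae   # (N_vae, D): pixel-level detail
    return np.concatenate([sem, vae], axis=0)   # (N_sem + N_vae, D)


rng = np.random.default_rng(0)
# Hypothetical stand-ins: random arrays instead of real encoder outputs.
semantic_tokens = rng.standard_normal((16, 32))    # assumed SigLIP-like patch embeddings
vae_latent = rng.standard_normal((4, 8, 8))        # assumed VAE latent (C, H, W)
w_sem = rng.standard_normal((32, 64)) * 0.02       # assumed learned projection
w_vae = rng.standard_normal((4 * 2 * 2, 64)) * 0.02  # assumed learned projection

fused = fuse_reference_tokens(semantic_tokens, vae_latent, w_sem, w_vae)
print(fused.shape)  # (32, 64)
```

In a design like this, the VAE tokens would be the part expected to carry the fine detail (such as facial identity) that the semantic tokens alone seemed to lose in my tests.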
