Image2image #73

littleowl · 2022-12-18T17:31:44Z

Adds image2image functionality.

In Python, a new CoreML model can be generated to encode the latent space for image2image. The model bakes in some of the operations typically performed in the pipeline so that a separate model would not need to be created for those operations, now would the CPU be needed to perform the tensor multiplications. Some of the simpler math involving the scheduler's time steps are performed on the cpu and passed into the encoder. The encoder works around torch.randn missing operation by passing in nose tensors to apply to the image latent space.

In Swift, an Encoder class is created. Various changes to the scheduler, pipeline, and CLI to support input image and strength. CGImage creation from MLShapedArray is moved into it's own file along with the new function to create a MLShapedArray from a CGImage. Image loading and preparation is currently handled / optimized with vImage.

Understandable a desire to maybe use the Image Input type for CoreML / CoreMLTools, however, I chose not to optimize in this way at this at this point because of trouble that I have had getting enumerated input shapes to work with the models and current python script. Please see: #69 and #70.

The new DPMSolverMultistepScheduler does not work with image2image, and looking at the Diffusers library documentation, it does not look like it is supported there either, so, it is currently disabled and should throw an error. Though I also made it safe so it will not crash..

Thank you for providing this repo.

Do not erase the below when submitting your pull request:
#########

I agree to the terms outlined in CONTRIBUTING.md

jsdomingue · 2022-12-19T07:13:16Z

When using --image on swift CLI (macOS 13.1) I'm getting "Error: startingImageProvidedWithoutEncoder". Running the patched code without --image runs fine and produces similar output results as the original/non-patched code. Any suggestions where to start to troubleshoot?

littleowl · 2022-12-19T07:28:05Z

You would need to run the python script to generate the Encoder model.
for instance: python -m python_coreml_stable_diffusion.torch2coreml --model-version ../stable-diffusion-2-base --convert-vae-encoder --bundle-resources-for-swift-cli --check-output-correctness --attention-implementation ORIGINAL -o ../Generated/CoreML/StableDiffusion2-base/ORIGINAL
I haven't published any generated coreml models on hugging face, maybe I may do that at some point.

TheMurusTeam · 2022-12-19T16:58:32Z

in my swift test app I'm getting an "array out of range" error in file AlphasCumprodCalculation.swift line 25.
It tries to subscribe item 1002 of an array containing 1000 items. Changing this line to
let initTimestep = timesteps - timesteps / steps * (steps - tEnc) - 1
seems to fix the issue. Am I missing something?

littleowl · 2022-12-19T21:37:28Z

@TheMurusTeam Thanks for catching that! I have not tested with full strength 1.0 since altering that part of the code. strength of 1.0 kinda defeats the purpose, maybe it should not be the default.
Though, your change will fail if strength is 0.0. I pushed a change that will clamp this result so it's safe.

atiorh · 2022-12-27T03:49:06Z

Hey @littleowl, thank you for the PR! This is a relatively extensive one so could you please rebase on main and separate this into two PRs: one for the Python component and the other one for the Swift component?

atiorh · 2022-12-27T03:53:40Z

I did a quick pass and noticed that you are passing noise tensors as inputs due to the randn op missing from coremltools torch frontend. I think it would help simplify the interface and the overall code if the noise tensor was produced inside the model. Would you be willing to try something like the following?

from coremltools.converters.mil.frontend.torch.torch_op_registry import register_torch_op
from coremltools.converters.mil.frontend.torch.ops import _get_inputs
from coremltools.converters.mil import Builder as mb

@register_torch_op
def randn(context, node):
    inputs = _get_inputs(context, node, expected=5)
    shape = inputs[0]
    
    x = mb.random_normal(shape=shape, mean=0., stddev=1.)
    context.add(x, node.name)

This (or a version of this) should enable automatic conversion of torch.randn op.

atiorh · 2022-12-27T04:17:14Z

@pcuenca Could you please chime in on whether DPMSolverMultistepScheduler should or should not be supported by image2image? 🙏

littleowl · 2022-12-27T12:49:00Z

@atiorh I will be happy to split up the PR into as many pieces as desired. The only question I have for the randn override function is providing the seed. I'll try to figure that out, but if that is top of mind to you, it would be beneficial. I will also amend the readme for instructions on generating the encoder model.

I do plan to follow up with a pipeline supporting in-painting after these code changes go in.

@pcuenca I do have the same question. I'm not sure whether the DPMSolverMultistepScheduler should/will support imageToImage or not. Initially, I did notice that this scheduler was not among the supported schedulers in the diffusion library for this, but then when looking again, I saw a reference where maybe it might be. Nonetheless, I did try to make it support - with some interesting image artifacts - but did not find a solution. If it should be supported, I could use your help or insight to make it so.

@atiorh - As an aside regarding different issue - along the lines of providing alternatives to PyTorch operations, for the goal of dynamic aspect ratio inputs, I did get past an initial error when trying to support flexible input shapes by doing a similar routine as you described to register an op with group_norm only be confronted with another operation that was missing and I could not find a work around for it. (I forget exactly what it was, but seems like it was member-wsie addition or similar ). [#70] Also, I'm not sure if there are more complications after that one, and whether the MLProgram (vs neural network) will work with the dynamic shapes with this model or not. What is the best place to seek help in regards to this? coremltools repo or the developer forums?

pcuenca · 2022-12-28T11:35:34Z

@atiorh @littleowl I think DPMSolverMultistepScheduler should also work. I just double-checked in diffusers and it did.

I'll try to debug what might be going on.

pcuenca · 2022-12-28T19:30:44Z

The problem is that timesteps reversal had been moved to the new function calculateTimesteps, but only for the PNDM scheduler. See this PR for a proposed fix.

With this change, image2image generation works using DPMSolverMultistepScheduler.

(Side note: while debugging this I noticed minor differences in the timesteps for DPMSolverMultistepScheduler with respect to the reference Python implementation. I'll open a separate PR about that).

atiorh · 2022-12-29T05:54:35Z

@pcuenca Amazing, thank you!

@littleowl I recommend that we merge @pcuenca's PR into your fork and verify that the DPMSolverMultistepScheduler is working as expected including the minor timestep differences Pedro mentioned above. In the meantime, feel free to submit the Python-only PR so we can start iterating on that. Regarding your question on setting the seed for the Core ML based randn op, I will look into that and get back to you. Regarding your other question about dynamic aspect ratios and flexible shapes, please create an issue on the coremltools repo with a minimal repro (and a reference to the on this repo) and cc me so I can also help move things along. 🙏

EthanSK · 2023-01-02T17:25:33Z

any updates on the swift image to image? 👀

such awesome work btw

littleowl · 2023-01-02T17:44:32Z

Sorry for any delay. I’ve been spending time with family and am just getting back from vacation. I should have the new PRs ready to go in the next day or so.

and later for in-paining

Encoder

pj4533 · 2023-01-08T02:26:10Z

woohoo. got it working!

I am an iOS dev for a long time, but very much an ML noob. Couple things I stumbled through:

I used the StableDiffusionSample and there was an error with the type of the seed variable (UInt32 vs Int), I just forced it to an Int and it worked fine
I also ran into the encoder model not being compiled. This stumped me for a bit, cause I had ran the python command to compile the resources, but didn't see the specifics of @littleowl 's note above. The important bit is the --convert-vae-encoder (not just the text encoder). Once I added that, it worked fine.
Images have to be 512x512 or it errors out
Pretty quickly ran into 'failed to generate prediction for sample 0' on my MacMini M1. Added the --reduce-memory option and it seems happy again

But I have it working now! This is amazing...can't wait to see what 2023 brings if Apple is open sourcing this now!

EthanSK · 2023-01-10T15:49:31Z

It crashes on my real device (iphone xs max) because the encoder runs out of memory at around 2gb, even though I gave extended adressing and increased memory capabilities. Has anyone got it to run on an actual iOS device as opposed to the simulator?

martinlexow · 2023-01-10T15:52:15Z

@EthanSK I don’t think you will get the framework to work on your iPhone XS since minimum system requirements say you’ll need at least a iPhone 12 Pro with 6GB+ RAM.

pj4533 · 2023-01-10T22:52:46Z

Just wanted to pop back in here and say how much fun this PR has enabled for me. I stumble through python, but feel very comfortable with swift, so this is great! First thing I did was modify the sample so that you can use the output as the input... 🤯

Hope it gets merged soon.

EthanSK · 2023-01-11T23:56:24Z

Ah. so the issue is using v2.1 with neural engine enabled (on actual device)

would be able to get a compiled split_ensum with the VAE encoder in it for version 2 or earlier pls? @pcuenca I'm having difficulty generating them locally ("Error computing NN outputs",)

also, i'm confused how to go from .mlpackage to the separated .mlmodelc files?

3DTOPO · 2023-01-13T23:25:57Z

@EthanSK I don’t think you will get the framework to work on your iPhone XS since minimum system requirements say you’ll need at least a iPhone 12 Pro with 6GB+ RAM.

Speaking of which; does anyone know of required device settings that will allow it to only run on supported devices?

For example, iphone-ipad-minimum-performance-a12 is the closest I can find, but it will allow it to run on A12 devices which includes the XS model.

jiangdi0924 · 2023-01-16T01:54:42Z

@EthanSK I don’t think you will get the framework to work on your iPhone XS since minimum system requirements say you’ll need at least a iPhone 12 Pro with 6GB+ RAM.

Speaking of which; does anyone know of required device settings that will allow it to only run on supported devices?

For example, iphone-ipad-minimum-performance-a12 is the closest I can find, but it will allow it to run on A12 devices which includes the XS model.

I have the same problem， By setting up required device capabilities I found that Only iphone-ipad-minimum-performance-a12 option comes closest to. Apple reviewers may take some non-functioning devices for review, and my app was rejected for this reason when I released a new version

3DTOPO · 2023-01-16T01:57:07Z

I have the same problem， By setting up required device capabilities I found that Only iphone-ipad-minimum-performance-a12 option comes closest to. Apple reviewers may take some non-functioning devices for review, and my app was rejected for this reason when I released a new version

The issue is much larger than getting rejected. It means anyone can purchase and install it on unsupported devices which will almost certainly get you 1 star reviews and amounts to a form of theft.

Personally I've lost all interest in Stable Diffusion for now. Now that they are being sued, seems like any products based on it are a liability and I can't afford the risk, not to mention on moral concerns.

It makes earning a living on the App Store even harder because I have to compete against apps that I can't compete against...

In any event, I do wish we could restrict devices to M1 or better.

jiangdi0924 · 2023-01-16T02:06:32Z

I have the same problem， By setting up required device capabilities I found that Only iphone-ipad-minimum-performance-a12 option comes closest to. Apple reviewers may take some non-functioning devices for review, and my app was rejected for this reason when I released a new version

The issue is much larger than getting rejected. It means anyone can purchase and install it on unsupported devices which will almost certainly get you 1 star reviews and amounts to a form of theft.

Personally I've lost all interest in Stable Diffusion for now. Now that they are being sued, seems like any products based on it are a liability and I can't afford the risk, not to mention on moral concerns.

It makes earning a living on the App Store even harder because I have to compete against apps that I can't compete against...

In any event, I do wish we could restrict devices with Apple Silicon.

Definitely, bad reviews are not conducive to the development of the product.

atiorh · 2023-01-24T17:35:03Z

@littleowl Just checking in after the holidays, please let us know if you are blocked on anything 🙏

pcuenca · 2023-01-26T14:22:32Z

would be able to get a compiled split_ensum with the VAE encoder in it for version 2 or earlier pls? @pcuenca I'm having difficulty generating them locally ("Error computing NN outputs",)

I'll take a look :)

… into image2image

littleowl · 2023-01-28T03:50:57Z

Sorry @atiorh and everyone for the delay. Thanks for keeping the conversation though.
I will close this PR and have opened two more:
Python
Swift

Busy lately. Ultimately, while trying to override the randn function I had realized that I had tried that already when first encountering the issue. I detail the errors in the description of the other PR. I tried to explain in the PR why I don't believe it to be the right thing to do anyways. As, I think it is better to generate the noise in the CPU / Swift Space for flexibility of techniques.

The good news, that it is easy to get in-painting to work after these changes.

This was referenced Dec 18, 2022

Feature request: support image inputs #27

Open

Image to Image? #44

Open

TheMurusTeam mentioned this pull request Dec 30, 2022

Requests and Bugs TheMurusTeam/PromptToImage#5

Open

littleowl and others added 11 commits January 5, 2023 14:17

Add Encoder model to torch2coreml for image2image

f9542b7

and later for in-paining

Image2Image Encoder

41d4637

Encoder

Scheduler and pipeline

effc166

fix scheduler

0a1b2b8

cli

cacf99d

remove CLI comment

ac158dc

disable dpm multistep solver with image2image

685da66

clamp initial timestamp

3c30af5

Store timesteps in reverse order for consistency.

d9563fb

Report actual number of steps.

f816ed8

diagonal test with randn

ccd2832

littleowl force-pushed the image2image branch from 3c66e50 to ccd2832 Compare January 5, 2023 22:21

uint32

9315861

EthanSK mentioned this pull request Jan 25, 2023

Downloading doesn't work in production? huggingface/swift-coreml-diffusers#13

Closed

Merge branch 'main' of https://github.com/littleowl/ml-stable-diffusion…

eb7e472

… into image2image

This was referenced Jan 28, 2023

Image2image - python #115

Merged

Image2image - swift #116

Merged

littleowl closed this Jan 28, 2023

Image2image #73

Image2image #73

Uh oh!

Conversation

littleowl commented Dec 18, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jsdomingue commented Dec 19, 2022

Uh oh!

littleowl commented Dec 19, 2022

Uh oh!

TheMurusTeam commented Dec 19, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

littleowl commented Dec 19, 2022

Uh oh!

atiorh commented Dec 27, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

atiorh commented Dec 27, 2022

Uh oh!

atiorh commented Dec 27, 2022

Uh oh!

littleowl commented Dec 27, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pcuenca commented Dec 28, 2022

Uh oh!

pcuenca commented Dec 28, 2022

Uh oh!

atiorh commented Dec 29, 2022

Uh oh!

EthanSK commented Jan 2, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

littleowl commented Jan 2, 2023

Uh oh!

pj4533 commented Jan 8, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

EthanSK commented Jan 10, 2023

Uh oh!

martinlexow commented Jan 10, 2023

Uh oh!

pj4533 commented Jan 10, 2023

Uh oh!

EthanSK commented Jan 11, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

3DTOPO commented Jan 13, 2023

Uh oh!

jiangdi0924 commented Jan 16, 2023

Uh oh!

3DTOPO commented Jan 16, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jiangdi0924 commented Jan 16, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

atiorh commented Jan 24, 2023

Uh oh!

pcuenca commented Jan 26, 2023

Uh oh!

littleowl commented Jan 28, 2023

Uh oh!

Uh oh!

littleowl commented Dec 18, 2022 •

edited

Loading

TheMurusTeam commented Dec 19, 2022 •

edited

Loading

atiorh commented Dec 27, 2022 •

edited

Loading

littleowl commented Dec 27, 2022 •

edited

Loading

EthanSK commented Jan 2, 2023 •

edited

Loading

pj4533 commented Jan 8, 2023 •

edited

Loading

EthanSK commented Jan 11, 2023 •

edited

Loading

3DTOPO commented Jan 16, 2023 •

edited

Loading

jiangdi0924 commented Jan 16, 2023 •

edited

Loading