Image2image #73
Conversation
When using --image on the Swift CLI (macOS 13.1) I'm getting "Error: startingImageProvidedWithoutEncoder". Running the patched code without --image runs fine and produces output similar to the original, non-patched code. Any suggestions on where to start troubleshooting?
You would need to run the Python script to generate the Encoder model.
In my Swift test app I'm getting an "array out of range" error in AlphasCumprodCalculation.swift, line 25.

@TheMurusTeam Thanks for catching that! I have not tested with full strength.
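For context, this is the classic full-strength edge case: the derived noising timestep can land one past the end of the alphas-cumprod table. A hedged Python sketch of the arithmetic (illustrative names, not the repo's actual Swift code):

```python
def noising_timestep(num_train_timesteps: int,
                     num_inference_steps: int,
                     strength: float) -> int:
    """Timestep at which the starting image is noised (illustrative)."""
    step_ratio = num_train_timesteps // num_inference_steps
    init_timestep = int(num_inference_steps * strength) * step_ratio
    # With strength == 1.0 this yields num_train_timesteps, one past the
    # last valid index of the alphas_cumprod array, so it must be clamped.
    return min(init_timestep, num_train_timesteps - 1)
```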
Hey @littleowl, thank you for the PR! This is a relatively extensive one, so could you please rebase on main and separate it into two PRs: one for the Python component and one for the Swift component?
I did a quick pass and noticed that you are passing noise tensors as inputs because the randn op is missing from the coremltools torch frontend. I think it would help simplify the interface and the overall code if the noise tensor were produced inside the model. Would you be willing to try something like the following?
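(The suggested snippet did not survive extraction; the following is a plausible reconstruction of the idea using coremltools' torch op registry, mapping torch.randn to MIL's random_normal. Treat it as a sketch.)

```python
from coremltools.converters.mil import Builder as mb
from coremltools.converters.mil.frontend.torch.torch_op_registry import register_torch_op
from coremltools.converters.mil.frontend.torch.ops import _get_inputs

@register_torch_op
def randn(context, node):
    # torch.randn(size, dtype=..., layout=..., device=..., pin_memory=...):
    # only the shape matters for conversion; generate the noise inside the
    # model via MIL's random_normal instead of feeding it in as an input.
    inputs = _get_inputs(context, node)
    shape = inputs[0]
    noise = mb.random_normal(shape=shape, mean=0.0, stddev=1.0, name=node.name)
    context.add(noise)
```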
This (or a version of this) should enable automatic conversion of the torch.randn op.
@pcuenca Could you please chime in on whether …
@atiorh I will be happy to split up the PR into as many pieces as desired. The only question I have is about the … I do plan to follow up with a pipeline supporting in-painting after these code changes go in.

@pcuenca I have the same question; I'm not sure whether the …

@atiorh As an aside on a different issue: along the lines of providing alternatives to PyTorch operations, for the goal of dynamic aspect-ratio inputs, I did get past an initial error when trying to support flexible input shapes by doing a routine similar to the one you described to register an op with …
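On the flexible-shapes aside: with coremltools, enumerated input shapes are typically declared at conversion time along these lines (a sketch; the variable names and shape list are illustrative):

```python
import coremltools as ct

# A few fixed aspect ratios the converted model should accept (illustrative).
shapes = ct.EnumeratedShapes(shapes=[(1, 3, 512, 512),
                                     (1, 3, 512, 768),
                                     (1, 3, 768, 512)])

# `traced_model` is assumed to be a torch.jit.trace'd encoder.
mlmodel = ct.convert(
    traced_model,
    inputs=[ct.TensorType(name="sample", shape=shapes)],
)
```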
@atiorh @littleowl I think I'll try to debug what might be going on.
The problem is that the timesteps reversal had been moved to the new function … With this change, image2image generation works using … (Side note: while debugging this, I noticed minor differences in the timesteps for …)
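Illustratively, the fix amounts to performing the reversal inside the timestep-setup function itself, so every caller sees descending timesteps. A hedged Python sketch (the repo's scheduler is Swift, and the exact function name was lost above):

```python
import numpy as np

def set_timesteps(num_train_timesteps: int, num_inference_steps: int) -> np.ndarray:
    step_ratio = num_train_timesteps // num_inference_steps
    timesteps = np.arange(0, num_inference_steps) * step_ratio
    # Reverse here, in the setup function, so consumers such as the
    # image2image strength offset always receive t_max ... 0 order.
    return timesteps[::-1].astype(np.int64)
```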
@pcuenca Amazing, thank you! @littleowl I recommend that we merge @pcuenca's PR into your fork and verify that the …
Any updates on the Swift image-to-image? 👀 Such awesome work, btw.
Sorry for any delay. I’ve been spending time with family and am just getting back from vacation. I should have the new PRs ready to go in the next day or so.
and later for in-painting
Encoder
Woohoo, got it working! I've been an iOS dev for a long time, but I'm very much an ML noob. A couple of things I stumbled through: …

But I have it working now! This is amazing... can't wait to see what 2023 brings if Apple is open-sourcing this now!
It crashes on my real device (iPhone XS Max) because the encoder runs out of memory at around 2 GB, even though I enabled the Extended Virtual Addressing and Increased Memory Limit capabilities. Has anyone got it to run on an actual iOS device as opposed to the simulator?
@EthanSK I don’t think you will get the framework to work on your iPhone XS, since the minimum system requirements say you’ll need at least an iPhone 12 Pro with 6 GB+ RAM.
Just wanted to pop back in here and say how much fun this PR has enabled for me. I stumble through Python, but feel very comfortable with Swift, so this is great! The first thing I did was modify the sample so that you can use the output as the input... 🤯 Hope it gets merged soon.
Ah, so the issue is using v2.1 with the Neural Engine enabled (on an actual device). Would anyone be able to get me a compiled split_einsum with the VAE encoder in it for version 2 or earlier, please? @pcuenca I'm having difficulty generating them locally ("Error computing NN outputs"). Also, I'm confused about how to go from .mlpackage to the separated .mlmodelc files.
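On the .mlpackage → .mlmodelc question: one option is Xcode's coremlcompiler; another, assuming a recent coremltools, is to compile from Python (the filename here is illustrative):

```python
from shutil import copytree
import coremltools as ct

# Load the package and compile it; get_compiled_model_path() returns a
# temporary directory containing the compiled .mlmodelc bundle.
model = ct.models.MLModel("VAEEncoder.mlpackage")
compiled_dir = model.get_compiled_model_path()

# Copy the bundle out of the temporary location before `model` goes away.
copytree(compiled_dir, "VAEEncoder.mlmodelc", dirs_exist_ok=True)
```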
Speaking of which: does anyone know of required device settings that will restrict it to supported devices? For example, iphone-ipad-minimum-performance-a12 is the closest I can find, but that will allow it to run on A12 devices, which includes the XS model.
I have the same problem. By setting up …
The issue is much larger than getting rejected. It means anyone can purchase and install it on unsupported devices, which will almost certainly get you 1-star reviews and amounts to a form of theft. Personally, I've lost all interest in Stable Diffusion for now. Now that they are being sued, it seems like any products based on it are a liability, and I can't afford the risk, not to mention the moral concerns. It makes earning a living on the App Store even harder because I have to compete against apps that I can't compete against... In any event, I do wish we could restrict devices to M1 or better.
Definitely, bad reviews are not conducive to the development of the product.
@littleowl Just checking in after the holidays; please let us know if you are blocked on anything 🙏
I'll take a look :)
… into image2image
Sorry, @atiorh and everyone, for the delay. Thanks for keeping the conversation going, though. Busy lately. Ultimately, while trying to override the … The good news is that it is easy to get in-painting to work after these changes.
Adds image2image functionality.
In Python, a new Core ML model can be generated to encode the latent space for image2image. The model bakes in some of the operations typically performed in the pipeline, so that a separate model does not need to be created for those operations, nor would the CPU be needed to perform the tensor multiplications. Some of the simpler math involving the scheduler's timesteps is performed on the CPU and passed into the encoder. The encoder works around the missing torch.randn operation by passing in noise tensors to apply to the image latent space.

In Swift, an Encoder class is created, with various changes to the scheduler, pipeline, and CLI to support an input image and strength. CGImage creation from MLShapedArray is moved into its own file, along with a new function to create an MLShapedArray from a CGImage. Image loading and preparation is currently handled and optimized with vImage.
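A hedged sketch of what the Python-side encoder model described above might look like before conversion (the names and the 0.18215 latent scale are Stable Diffusion conventions assumed here, not necessarily this PR's exact code):

```python
import torch

class EncoderWrapper(torch.nn.Module):
    """Bakes latent sampling, scaling, and scheduler noising into the encoder."""

    def __init__(self, vae):
        super().__init__()
        self.vae = vae  # assumed to be a diffusers AutoencoderKL-style module

    def forward(self, image, diagonal_noise, noise,
                sqrt_alphas_cumprod, sqrt_one_minus_alphas_cumprod):
        moments = self.vae.quant_conv(self.vae.encoder(image))
        mean, logvar = torch.chunk(moments, 2, dim=1)
        # Sample the latent with externally supplied noise, since torch.randn
        # is unavailable in the coremltools torch frontend.
        latents = (mean + torch.exp(0.5 * logvar) * diagonal_noise) * 0.18215
        # Noise to the starting timestep using CPU-computed scalars:
        # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
        return sqrt_alphas_cumprod * latents + sqrt_one_minus_alphas_cumprod * noise
```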
Understandably, one might want to use the Image input type for Core ML / coremltools; however, I chose not to optimize in this way at this point because of trouble I have had getting enumerated input shapes to work with the models and the current Python script. Please see #69 and #70.
The new DPMSolverMultistepScheduler does not work with image2image, and looking at the Diffusers library documentation, it does not look like it is supported there either; so it is currently disabled and should throw an error. I also made it safe so it will not crash.

Thank you for providing this repo.