Unified multimodal models for image understanding, generation, and editing
Welcome to the Skywork-UniPic repository!
This repository hosts the model weights and official implementations of our unified multimodal models, which follow two distinct modeling paradigms:
- UniPic-1.0 — 1.5B parameters, Unified Autoregressive Modeling for joint visual understanding and generation, enabling a single transformer to handle both perception and synthesis tasks.
- UniPic-2.0 Series — SD3.5M-Kontext and MetaQuery variants based on Efficient Architectures with Diffusion Post-Training, delivering state-of-the-art performance in text-to-image generation, fine-grained image editing, and multimodal reasoning.
| Date | Update |
| --- | --- |
| 2025-08-13 | Released UniPic-2 — Unified Model Weights with Diffusion-based Post-Training |
| 2025-07-30 | Released UniPic-1 — Autoregressive unified modeling from scratch |
- 🎨 Text-to-Image Generation — High-fidelity synthesis from natural language prompts.
- 🛠 Image Editing — Seamless inpainting, outpainting, and object manipulation.
- 🖼 Image Understanding — Robust perception capabilities for various visual tasks.
- ⚡ Efficient Architecture — Optimized for both accuracy and deployability.
This project is licensed under the MIT License — see the LICENSE file for details.