Yunlong Lin1*, Zixu Lin1*, Kunjie Lin1*, Jinbin Bai5, Panwang Pan4, Chenxin Li3, Haoyu Chen2, Zhongdao Wang6, Xinghao Ding1†, Wenbo Li3♣, Shuicheng Yan5†
1Xiamen University, 2The Hong Kong University of Science and Technology (Guangzhou), 3The Chinese University of Hong Kong, 4ByteDance, 5National University of Singapore, 6Tsinghua University
- [2025.7.14] 🙏 Thanks to @pydemo for writing a helpful tutorial: Automate Your Lightroom Preset Creation with AI.
- [2025.7.12] 🚀 Inference code is now available! Check out our Inference documentation.
- [2025.7.9] 🙏 We're grateful to @AK for featuring JarvisArt on Twitter!
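- [2025.7.4] 📖 See our Chinese blog for more details about JarvisArt: 中文解读|修图界ChatGPT诞生!JarvisArt:解放人类艺术创造力——用自然语言指挥200+专业工具 (in English: "Chinese explainer | The ChatGPT of photo retouching is born! JarvisArt: liberating human artistic creativity by commanding 200+ professional tools with natural language").
- [2025.7.3] 🤗 Hugging Face online demo is now available. Try it here: JarvisArt-Preview.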
- [2025.6.28] 🚀 Gradio demo and model weights are now available! Check out our Gradio Demo and Model Weights.
- [2025.6.20] 📄 Paper is now available on arXiv.
- [2025.6.16] 🌐 Project page is live.
JarvisArt is a multi-modal large language model (MLLM)-driven agent for intelligent photo retouching. It is designed to liberate human creativity by understanding user intent, mimicking the reasoning of professional artists, and coordinating over 200 tools in Adobe Lightroom. JarvisArt utilizes a novel two-stage training framework, starting with Chain-of-Thought supervised fine-tuning for foundational reasoning, followed by Group Relative Policy Optimization for Retouching (GRPO-R) to enhance its decision-making and tool proficiency. Supported by the newly created MMArt dataset (55K samples) and MMArt-Bench, JarvisArt demonstrates superior performance, outperforming GPT-4o with a 60% improvement in pixel-level metrics for content fidelity while maintaining comparable instruction-following capabilities.
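The group-relative idea behind the second training stage can be sketched as follows. This is a minimal illustration of the standard GRPO advantage normalization, in which each sampled response's reward is standardized against the other responses in its group. It is not JarvisArt's GRPO-R implementation: the retouching-specific rewards are not described in this README, and the function name, the example reward values, and `eps` are assumptions made for illustration.

```python
import math


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per response sampled for the same
    prompt. `eps` (an assumed value) avoids division by zero when every
    response in the group receives the same reward.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(variance)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four retouching plans sampled for one instruction.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

Responses that score above the group mean receive a positive advantage and are reinforced, while those below the mean are discouraged, so no separate value network is needed to provide a baseline.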
Global Retouching Case
Local Retouching Case
JarvisArt supports multi-granularity retouching goals, ranging from scene-level adjustments to region-specific refinements. Users can perform intuitive, free-form edits through natural inputs such as text prompts and bounding boxes.
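One way to picture the two granularities is as a single request type with an optional region. The sketch below is purely illustrative: `RetouchRequest`, its fields, and the pixel-coordinate box convention are hypothetical and are not part of the JarvisArt codebase or its inference interface.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RetouchRequest:
    """Hypothetical container for one free-form edit instruction."""

    prompt: str
    # Assumed convention: (x_min, y_min, x_max, y_max) in pixels.
    # None means the instruction applies to the whole scene.
    box: Optional[Tuple[int, int, int, int]] = None

    def __post_init__(self):
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.box is not None:
            x_min, y_min, x_max, y_max = self.box
            if x_min >= x_max or y_min >= y_max:
                raise ValueError("box must have positive width and height")

    @property
    def scope(self) -> str:
        """Return 'local' for region-specific edits, 'global' otherwise."""
        return "local" if self.box is not None else "global"


global_edit = RetouchRequest("Give the scene a warm, cinematic look")
local_edit = RetouchRequest("Brighten the face", box=(120, 80, 360, 400))
```

Here `global_edit.scope` is `"global"` and `local_edit.scope` is `"local"`, mirroring the scene-level and region-specific cases described above.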
- Create repo and project page
- Release preview inference code and Gradio demo
- Release Hugging Face online demo
- Release preview model weights
- Release MMArt dataset with open license
- Release training code
To run the Gradio demo, please follow:
For batch inference, please follow the instructions below:
JarvisIR: An intelligent image restoration agent for diverse and complex degradations in real-world scenarios.
We are excited to expand the Jarvis family with more intelligent agents in the future. Stay tuned for upcoming releases!
We would like to express our gratitude to LLaMA-Factory and gradio_image_annotator for their valuable open-source contributions, which have provided important technical references for our work.
If you have any questions while trying, running, or deploying JarvisArt, or any ideas or suggestions for the project, feel free to join our WeChat group discussion!
For any questions or inquiries, please reach out to us:
- Yunlong Lin: linyl@stu.xmu.edu.cn
- Zixu Lin: a860620266@gmail.com
- Kunjie Lin: linkunjie@stu.xmu.edu.cn
- Panwang Pan: paulpanwang@gmail.com
- Chenxin Li: chenxinli@link.cuhk.edu.hk
If you find JarvisArt useful in your research, please consider citing:
@article{jarvisart2025,
title={JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent},
author={Yunlong Lin and Zixu Lin and Kunjie Lin and Jinbin Bai and Panwang Pan and Chenxin Li and Haoyu Chen and Zhongdao Wang and Xinghao Ding and Wenbo Li and Shuicheng Yan},
year={2025},
journal={arXiv preprint arXiv:2506.17612}
}