
Apple’s new open-source AI model, MGIE, uses multimodal large language models to interpret natural language commands for precise image editing, offering versatility from Photoshop-style tweaks to global photo enhancements.
Apple has released a new open-source AI model called "MGIE" (MLLM-Guided Image Editing), which can edit images based on natural language instructions. This model, developed in collaboration with researchers from the University of California, Santa Barbara, leverages multimodal large language models (MLLMs) to interpret user commands and perform pixel-level manipulations. MGIE is designed to handle a variety of editing tasks, including Photoshop-style modifications, global photo optimization, and local edits.
MGIE's core innovation lies in its use of MLLMs, AI models that can process both text and images. That dual capability lets MGIE bridge the gap between a natural language instruction and the precise image manipulation it implies. The model was introduced in a paper accepted at the International Conference on Learning Representations (ICLR) 2024, one of the top venues for AI research.
Instruction Derivation:
Visual Imagination:
End-to-End Training Scheme:
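To make the flow between those stages concrete, here is a deliberately tiny Python sketch. It is not Apple's code and uses no MGIE API: every name in it (`derive_instruction`, `imagine`, `apply_edit`, `Guidance`) is hypothetical, a lookup table stands in for the MLLM, and a brightness shift on a grayscale grid stands in for the image editor. It only illustrates the idea of expanding a terse command into an explicit instruction, turning that into an edit signal, and applying it to pixels. The end-to-end training step is not modelled.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[int]]  # toy grayscale image, pixel values 0-255


@dataclass
class Guidance:
    """Hypothetical stand-in for the 'visual imagination' signal."""
    brightness_delta: int


def derive_instruction(instruction: str) -> str:
    """Stage 1 (toy): expand a terse command into an explicit one.

    Assumption: a real system would query an MLLM with the image and
    the text; here a lookup table plays that role.
    """
    expansions = {
        "make it brighter": "increase the brightness of every pixel by 40",
        "make it darker": "decrease the brightness of every pixel by 40",
    }
    return expansions.get(instruction.strip().lower(), instruction)


def imagine(expressive: str) -> Guidance:
    """Stage 2 (toy): turn the explicit instruction into edit guidance."""
    words = expressive.split()
    # First integer in the instruction is the amount; 0 if there is none.
    amount = next((int(w) for w in words if w.isdigit()), 0)
    sign = -1 if "decrease" in words else 1
    return Guidance(brightness_delta=sign * amount)


def apply_edit(image: Image, guidance: Guidance) -> Image:
    """Stage 3 (toy): apply the guidance pixel by pixel, clamped to 0-255."""
    return [
        [max(0, min(255, px + guidance.brightness_delta)) for px in row]
        for row in image
    ]


def edit(image: Image, instruction: str) -> Image:
    """Chain the three toy stages into one call."""
    return apply_edit(image, imagine(derive_instruction(instruction)))


if __name__ == "__main__":
    print(edit([[0, 100], [230, 255]], "Make it brighter"))
    # prints [[40, 140], [255, 255]]
```

The point of the split is that the user-facing command ("make it brighter") never reaches the pixel-level step directly; an intermediate, more explicit representation does.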
MGIE is versatile and can handle a wide range of editing scenarios, from simple color adjustments to complex object manipulations. Some of its key features include:

Expressive Instruction-Based Editing:
Photoshop-Style Modification:
Global and Local Edits:
According to the paper, MGIE improves automatic metrics and also receives positive feedback in human evaluations. The model maintains competitive inference efficiency as well, so the added MLLM guidance should not impose a significant performance overhead in real-world applications.
For software engineers and AI researchers, MGIE represents a significant step forward in instruction-based image editing. The use of MLLMs for deriving instructions and generating visual imagination opens up new possibilities for creating more intuitive and user-friendly image editing tools. This model can be particularly useful in applications where precise control over edits is crucial, such as in graphic design, photography, and content creation.
Apple's release of MGIE is a notable advance in AI-powered image editing. By combining MLLMs with an end-to-end training scheme, it offers an instruction-based editor that handles a wide range of tasks with high precision and efficiency. Whether you're a professional designer or an amateur photographer, MGIE is a tool that could enhance your workflow.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
13 February 2024