February 15, 2024

Apple Releases Instruction-Based Image Editing Model Playground

Harnessing Multimodal Large Language Models for Instruction-Based Image Editing

Recent advances in artificial intelligence have opened the door to new applications in image editing. One such breakthrough is the use of Multimodal Large Language Models (MLLMs) to guide instruction-based image editing, offering a new level of interactivity and precision.

Multimodal Large Language Models, such as the one developed by Apple, are at the forefront of this shift. They combine textual and visual information to understand and execute complex editing instructions.

The MGIE (MLLM-Guided Image Editing) approach leverages these models to interpret user commands and apply the desired changes to images. It has shown marked improvements across a range of editing tasks, from Photoshop-style alterations to global photo optimization and local object adjustments.

The MGIE framework not only improves the quality of edits but also maintains competitive efficiency, which makes it a practical solution for real-world applications. Across benchmarks it outperforms existing methods in both automatic metrics and human evaluation.

Figure 1: Gradio app interface on a mobile device, showcasing the image upload feature and submission button ready for live user interaction.

Figure 2: A side-by-side evaluation of InsPix2Pix, LGIE, and MGIE. MGIE effectively renders “lightning” and its reflection, while only MGIE removes the targeted Christmas tree without affecting the surroundings. In optimizing photos, unlike its counterparts, MGIE accurately brightens and sharpens the image. Lastly, MGIE skillfully applies glaze to only the donuts, avoiding the overapplication seen in other models.

The Hugging Face Space

The proposed method, MLLM-Guided Image Editing (MGIE), leverages the expressive power of MLLMs to derive detailed, expressive instructions that guide image editing.

MGIE is designed to handle a wide range of editing tasks, including Photoshop-style modifications, global photo optimization, and local object alteration.
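To make these categories concrete, the snippet below pairs each task type with an illustrative instruction. These example strings are our own, loosely echoing the edits shown in Figure 2, and are not taken from the paper's benchmarks.

```python
# Illustrative instructions for each MGIE task category (our own examples,
# loosely based on the edits shown in Figure 2, not the paper's benchmarks).
example_instructions = {
    "photoshop_style_modification": "remove the Christmas tree in the background",
    "global_photo_optimization": "brighten the photo and make it sharper",
    "local_object_alteration": "add glaze to the donuts only",
}
```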

If you want to try the model yourself, visit the Hugging Face Space:

Figure 3: A screenshot of the Gradio web interface, a user-friendly platform for submitting image editing instructions and viewing the resulting transformations. Access the Gradio Playground here.
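If you would rather script the demo than use the web UI, a Gradio Space can usually be called with the gradio_client library. The snippet below is only a sketch: the Space id, the argument order, and the api_name are placeholders, so check the Space's “Use via API” panel for the actual values.

```python
# Hedged sketch of calling a Gradio Space programmatically; the Space id,
# argument order, and api_name below are placeholders, not the real values.
from gradio_client import Client

client = Client("OWNER/mgie-space")            # hypothetical Space id
result = client.predict(
    "my_photo.jpg",                            # local path or URL of the input image
    "make the sky look like a sunset",         # editing instruction
    api_name="/predict",                       # assumed endpoint name
)
print(result)                                  # typically a path to the edited image
```

Depending on your gradio_client version, image inputs may need to be wrapped (for example with handle_file in recent releases) rather than passed as a bare path.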

The Paper

Building on the foundation of MLLMs, MGIE introduces a highly adaptable framework for executing intricate image transformations.

By harnessing the power of MLLMs, MGIE can interpret nuanced editing instructions and translate user commands directly into the intended visual changes.

Innovative Editing Approach

This approach not only streamlines the editing process but also opens up new possibilities for creative expression in digital imagery.

With MGIE, users can expect a more intuitive and efficient editing experience, revolutionizing the way we interact with and manipulate images in the digital age.

Figure 4: Overview of MLLM-Guided Image Editing (MGIE), which leverages MLLMs to enhance instruction-based image editing. MGIE learns to derive concise expressive instructions and provides explicit visual-related guidance for the intended goal. The diffusion model jointly trains and achieves image editing with the latent imagination through the edit head in an end-to-end manner. Icons in the original figure mark whether each module is trainable or frozen.

Detailed Mechanism Overview

The model depicted in Figure 4 represents a sophisticated approach to instruction-based image editing using Multimodal Large Language Models (MLLMs).

At its core, the process begins with an input instruction and an image. The MLLM, equipped with summarization capabilities (Summ), distills the verbose instruction into a more focused expressive instruction.

The expressive instruction is then processed by the MLLM together with an adapter and an embedding module, which prepares it for the image transformation.

Subsequently, the visual tokens generated from this process are routed through an Edit Head that interfaces with a Diffusion model.

The Diffusion model, responsible for the heavy lifting of pixel manipulation, takes cues from the processed instructions to produce the final edited image that aligns with the user’s initial request.
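To make this flow easier to follow, here is a minimal PyTorch sketch of the data flow only. It is not Apple's released code: the class names, tensor shapes, and the stub MLLM and diffusion components are assumptions chosen purely for illustration.

```python
# Conceptual sketch of the MGIE-style pipeline described above -- NOT the
# official implementation. All names, shapes, and stubs are illustrative.
import torch
import torch.nn as nn


class EditHead(nn.Module):
    """Projects the MLLM's visual tokens into conditioning vectors for the diffusion editor."""
    def __init__(self, token_dim: int = 768, cond_dim: int = 1024):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(token_dim, cond_dim),
            nn.GELU(),
            nn.Linear(cond_dim, cond_dim),
        )

    def forward(self, visual_tokens: torch.Tensor) -> torch.Tensor:
        # (batch, num_tokens, token_dim) -> (batch, num_tokens, cond_dim)
        return self.proj(visual_tokens)


class StubMLLM:
    """Stand-in for the multimodal LLM: summarizes the instruction and emits visual tokens."""
    def summarize(self, instruction: str) -> str:
        return f"[expressive] {instruction}"            # concise expressive instruction

    def encode(self, image: torch.Tensor, expressive: str) -> torch.Tensor:
        return torch.randn(image.shape[0], 8, 768)      # dummy visual tokens


class StubDiffusion:
    """Stand-in for the diffusion model; a real one would denoise latents under the guidance."""
    def edit(self, image: torch.Tensor, condition: torch.Tensor) -> torch.Tensor:
        return image                                    # identity edit, for illustration only


def mgie_edit(image, instruction, mllm, edit_head, diffusion):
    expressive = mllm.summarize(instruction)        # 1. distill the verbose instruction
    tokens = mllm.encode(image, expressive)         # 2. visual tokens from the MLLM
    guidance = edit_head(tokens)                    # 3. explicit visual-related guidance
    return diffusion.edit(image, guidance)          # 4. diffusion produces the edited image


if __name__ == "__main__":
    img = torch.rand(1, 3, 512, 512)                # dummy RGB image batch
    out = mgie_edit(img, "brighten the photo and sharpen it",
                    StubMLLM(), EditHead(), StubDiffusion())
    print(out.shape)                                # torch.Size([1, 3, 512, 512])
```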

This innovative pipeline showcases the potential of leveraging language models to intuitively guide and execute complex image editing tasks, a testament to the evolving intersection of AI and creative digital media.

You can read the entire paper below or by visiting the arXiv publication.

ICLR Paper: The ICLR 2024 conference paper, featuring an abstract and introductory figures that present the guiding principles of instruction-based image editing via MLLMs. You can find the paper here.

Worth Exploring:

Open-Source Latte Released: Train Your Own SORA-like Text-to-Video

Latte, an open-source alternative to SORA for video generation, can be fine-tuned for custom text-to-video applications.