RegionDrag: Fast Region-Based Image Editing with Diffusion Models

ECCV 2024

Jingyi Lu1, Xinghui Li2, Kai Han1
1Visual AI Lab, The University of Hong Kong
2Active Vision Lab, University of Oxford

RegionDrag is a region-based image editing method that lets users express editing instructions through handle and target regions instead of individual points. This richer input makes editing both faster and more precise: RegionDrag significantly outperforms previous point-drag SOTA methods in speed while also achieving better editing quality.

RegionDrag supports a variety of inputs

Users can input regions or points to drag image contents from red (handle) to blue (target).

Input pairs of triangles or quadrilaterals.

Input regions and manipulate them using points.

Input pairs of regions.
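
Under the hood, a region pair can be reduced to dense point correspondences by sampling a grid of points in the handle region and warping it onto the target region. The sketch below illustrates this for a pair of quadrilaterals using OpenCV homographies; the function name and sampling density are illustrative assumptions, not the official RegionDrag interface.

```python
import cv2
import numpy as np

def region_pair_to_correspondences(handle_quad, target_quad, n=32):
    """Map points sampled inside the handle quad onto the target quad.

    handle_quad, target_quad: (4, 2) float32 corner coordinates,
    listed in the same (e.g. clockwise) order.
    Returns (handle_pts, target_pts), each of shape (n*n, 2).
    """
    # Homography taking the handle quad to the target quad.
    H = cv2.getPerspectiveTransform(handle_quad, target_quad)
    # Sample a regular grid inside the unit square ...
    s = np.linspace(0.05, 0.95, n, dtype=np.float32)
    grid = np.stack(np.meshgrid(s, s), axis=-1).reshape(1, -1, 2)
    # ... warp it into the handle quad, then through H onto the target.
    unit = np.float32([[0, 0], [1, 0], [1, 1], [0, 1]])
    handle_pts = cv2.perspectiveTransform(
        grid, cv2.getPerspectiveTransform(unit, handle_quad))[0]
    target_pts = cv2.perspectiveTransform(handle_pts[None], H)[0]
    return handle_pts, target_pts
```

For a pair of triangles, the same idea works with cv2.getAffineTransform and cv2.transform in place of the homography.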

Fast AI editing

The rich context provided by region inputs allows users to edit a 512×512 image in just 1.5 seconds, significantly faster than previous point-drag methods.

Find our code

Efficient & concise model design

RegionDrag performs the edit in two main steps. First, during the inversion process, the SD latent representations and self-attention features covered by the handle region are copied. Then, during the denoising process, the copied latent representations are pasted at the target positions and the corresponding self-attention features are replaced, as sketched below.
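
To make these two steps concrete, here is a minimal PyTorch sketch of the latent copy-and-paste operation. It assumes the inversion latents are already available and that the handle and target masks contain the same number of pixels (a one-to-one pixel mapping); the function names, mask format, and soft-blend weight are illustrative assumptions, not the repository's API.

```python
import torch

def copy_handle_latents(latent, handle_mask):
    """During inversion: collect latent vectors under the handle region.

    latent:      (1, C, H, W) SD latent at a given inversion step.
    handle_mask: (H, W) boolean mask marking the handle region.
    """
    return latent[:, :, handle_mask]  # (1, C, N) copied features

def paste_to_target(latent, copied, target_mask, alpha=0.8):
    """During denoising: paste the copied latents at the target positions.

    target_mask must contain the same number of True pixels as the
    handle mask, so `copied` lines up one-to-one with the target slots.
    A soft blend (alpha) is one plausible choice, not the exact scheme.
    """
    edited = latent.clone()
    edited[:, :, target_mask] = (
        alpha * copied + (1 - alpha) * latent[:, :, target_mask]
    )
    return edited
```

The same indexing idea applies to the self-attention features: keys and values computed under the handle region during inversion can be substituted for those at the target positions during denoising.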

Read the paper

New benchmarks for region-based editing evaluation

DragBench-S and DragBench-D are existing benchmarks for evaluating point-drag methods. We adapt them to use regions instead of points, so that the inputs better reflect user intent, creating DragBench-SR and DragBench-DR (where 'R' stands for Region).
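
As a rough illustration of the conversion, one plausible scheme is to dilate each annotated handle/target point into a small disk-shaped mask; this is a hypothetical sketch of the point-to-region idea, not the actual procedure used to build DragBench-SR and DragBench-DR.

```python
import numpy as np

def point_to_region(point, image_hw, radius=10):
    """Hypothetical conversion: dilate a drag point into a disk mask.

    point:    (x, y) pixel coordinates of a handle or target point.
    image_hw: (height, width) of the image.
    Returns an (H, W) boolean mask of pixels within `radius` of the point.
    """
    h, w = image_hw
    ys, xs = np.mgrid[:h, :w]
    return (xs - point[0]) ** 2 + (ys - point[1]) ** 2 <= radius ** 2
```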

Download the dataset