Preserve·Reveal·Expand

PREX: Faithful 4D Video Editing with Region-Aware Conditioning

arXiv: 2605.20961 · Code: coming soon

Zhangchi Hu¹,², Wenzhang Sun²,†, Xiangchen Yin¹, Jiahui Yuan¹

Chunfeng Wang², Hao Li², Kun Zhan², Xiaoyan Sun¹,*

¹University of Science and Technology of China · ²Li Auto Inc.

† Project leader · * Corresponding author

huzhangchi@mail.ustc.edu.cn

Abstract

Existing 4D-driven video diffusion models primarily target plausible generation, but faithful 4D editing requires preserving source-observed regions while synthesizing disoccluded or out-of-view content. We identify Evidence-Role Mismatch: reliable source-backed evidence, unreliable rendered cues, and unsupported regions are entangled in a single conditioning signal, causing preservation drift, ghosting, and unstable extrapolation. We propose PREX (Preserve, Reveal, Expand), a region-aware framework that decomposes the target spatiotemporal volume into Preserve, Reveal, and Expand roles according to observation support and scene extent. PREX builds observation-backed appearance cues with calibrated confidence and injects them into a frozen video diffusion backbone through a Region-Aware Adapter, trained with proxy tasks without requiring paired edited videos. We further introduce PREBench, a diagnostic benchmark with curated edits, region-role masks, and human-aligned metrics that complement global video quality scores with targeted diagnostics for preservation drift, ghost leakage, boundary copying, and temporal instability.

Preserve

Pixels backed by valid source observations. PREX retrieves appearance from nearby source frames with visibility and depth consistency, ensuring faithful preservation of observed content.

Reveal

Unsupported regions that lie within the scene (disocclusions). PREX exposes these regions to the diffusion model for plausible in-scene completion using spatiotemporal context.

Expand

Out-of-view regions requiring long-range extrapolation. PREX enables coherent, temporally stable scene expansion beyond the original field of view.
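The three roles amount to a per-pixel decision on two questions: does a valid source observation land here, and is the pixel inside the observed scene extent? The sketch below encodes that decision with NumPy. The two boolean masks (`supported`, `in_scene`), the function name `assign_roles`, and the integer labels are illustrative assumptions; how PREX actually derives observation support and scene extent is not shown here.

```python
import numpy as np

# Hypothetical integer labels for the three region roles.
PRESERVE, REVEAL, EXPAND = 0, 1, 2


def assign_roles(supported: np.ndarray, in_scene: np.ndarray) -> np.ndarray:
    """Label each target pixel as Preserve, Reveal, or Expand.

    supported: bool mask, True where a valid source observation lands on the pixel.
    in_scene:  bool mask, True where the pixel lies within the observed scene extent.
    Both masks are assumed inputs; computing them is outside this sketch.
    """
    if supported.shape != in_scene.shape:
        raise ValueError("masks must have the same shape")
    roles = np.full(supported.shape, EXPAND, dtype=np.uint8)  # default: out of view
    roles[~supported & in_scene] = REVEAL  # in-scene but unobserved: disocclusion
    roles[supported] = PRESERVE  # observed content takes priority
    return roles


supported = np.array([[True, False, False]])
in_scene = np.array([[True, True, False]])
print(assign_roles(supported, in_scene))  # [[0 1 2]]
```

Giving Preserve priority means a pixel with source support is never handed to the generator, which matches the page's emphasis on faithful preservation of observed content.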

Method

Step 01

Region-Aware 4D Control

Divide target-frame pixels into Preserve, Reveal, and Expand regions based on observation support. Compute geometric confidence maps from projection coverage, instance consistency, and depth variation.
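The page names three cues for geometric confidence (projection coverage, instance consistency, depth variation) but not how they are combined. The following is a minimal sketch, assuming that source depths and instance IDs have already been reprojected onto the target pixel grid, and that the three cues are multiplied together. The function name `geometric_confidence`, the input layout, and the scale `tau` are all hypothetical.

```python
import numpy as np


def geometric_confidence(depth_obs, inst_obs, inst_target, tau=0.05):
    """Per-pixel confidence in [0, 1] for one target frame.

    depth_obs:   (K, H, W) depth of each of K source frames reprojected onto the
                 target pixel grid, NaN where that frame has no projection.
    inst_obs:    (K, H, W) instance IDs carried by those projections.
    inst_target: (H, W) instance IDs of the edited target frame.
    tau:         scale for relative depth variation (hypothetical parameter).
    """
    depth_obs = np.asarray(depth_obs, dtype=float)
    valid = ~np.isnan(depth_obs)
    n_valid = valid.sum(axis=0)
    denom = np.maximum(n_valid, 1)
    K = depth_obs.shape[0]

    # 1) Projection coverage: fraction of source frames that see this pixel.
    coverage = n_valid / K

    # 2) Instance consistency: fraction of valid projections whose instance
    #    matches the target instance at this pixel.
    consistency = (valid & (inst_obs == inst_target)).sum(axis=0) / denom

    # 3) Depth variation: relative std of the valid projected depths.
    d = np.where(valid, depth_obs, 0.0)
    mean = d.sum(axis=0) / denom
    var = np.where(valid, (d - mean) ** 2, 0.0).sum(axis=0) / denom
    rel_std = np.sqrt(var) / np.maximum(mean, 1e-6)
    depth_term = np.exp(-rel_std / tau)

    # Combine multiplicatively (an assumption): any weak cue lowers confidence.
    return coverage * consistency * depth_term


# Two source frames, one target row of two pixels: the first pixel is seen
# consistently by both frames, the second is seen by neither.
depth_obs = np.array([[[1.0, np.nan]], [[1.0, np.nan]]])
inst_obs = np.array([[[3, -1]], [[3, -1]]])
inst_target = np.array([[3, 3]])
conf = geometric_confidence(depth_obs, inst_obs, inst_target)
```

A multiplicative combination is one simple choice; a learned or weighted combination would fit the same description on the page.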

Step 02

Observation-Backed Appearance Conditioning

Construct appearance cues from valid source observations using visibility, depth, instance, and view-time checks. Unsupported pixels receive only weak or low-confidence conditioning.
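One way to read this step is as a filtered, weighted blend of warped source colors: an observation contributes only if its depth and instance agree with the target, and nearby source times count more. The sketch below follows that reading under stated assumptions: source frames are already warped into the target view, the view-time check is softened into an exponential weight, and the names `appearance_cue`, `rel_tol`, and `tau` are hypothetical rather than taken from PREX.

```python
import numpy as np


def appearance_cue(src_rgb, src_depth, src_inst, src_time,
                   tgt_depth, tgt_inst, t_target, rel_tol=0.05, tau=4.0):
    """Blend source colors that pass depth, instance, and view-time checks.

    src_rgb:   (K, H, W, 3) source colors already warped to the target view
               (finite everywhere, e.g. zeros where not visible).
    src_depth: (K, H, W) warped source depth, NaN where not visible.
    src_inst:  (K, H, W) warped source instance IDs.
    src_time:  (K,) source frame timestamps.
    tgt_depth, tgt_inst: (H, W) depth and instance IDs of the edited target.
    rel_tol, tau: hypothetical tolerance and time-decay parameters.
    Returns (appearance (H, W, 3), confidence (H, W)).
    """
    visible = ~np.isnan(src_depth)
    # Depth check: reprojected depth must agree with the target depth.
    depth_ok = visible & (np.abs(np.where(visible, src_depth, 0.0) - tgt_depth)
                          <= rel_tol * tgt_depth)
    # Instance check: the observation must belong to the same instance.
    valid = depth_ok & (src_inst == tgt_inst)
    # View-time check, softened into a weight that favors nearby frames.
    w_time = np.exp(-np.abs(np.asarray(src_time, dtype=float) - t_target) / tau)
    w = valid * w_time[:, None, None]  # (K, H, W)
    w_sum = w.sum(axis=0)
    # Pixels with no valid observation get a zero numerator, hence zero appearance.
    rgb = (w[..., None] * src_rgb).sum(axis=0) / np.maximum(w_sum, 1e-8)[..., None]
    # Confidence is the share of the total time weight that passed the checks.
    confidence = w_sum / w_time.sum()
    return rgb, confidence


src_rgb = np.stack([np.ones((1, 2, 3)), np.full((1, 2, 3), 0.5)])
src_depth = np.array([[[2.0, np.nan]], [[2.0, np.nan]]])
src_inst = np.ones((2, 1, 2), dtype=int)
rgb, conf = appearance_cue(src_rgb, src_depth, src_inst, [0.0, 0.0],
                           np.array([[2.0, 2.0]]), np.ones((1, 2), dtype=int), 0.0)
```

The second pixel has no valid observation, so it receives zero appearance and zero confidence, mirroring the page's statement that unsupported pixels get only weak or low-confidence conditioning.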

Step 03

Region-Aware Adapter

A lightweight adapter maps appearance cues, confidence maps, and region masks into residual control tokens that are injected into a frozen video diffusion backbone. The adapter is trained with proxy tasks, so no paired editing data is required.
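To make the data flow concrete, here is a per-frame NumPy sketch of an adapter that stacks the three inputs channel-wise, patchifies them, and projects each patch to a residual token added onto backbone tokens. The class name `RegionAwareAdapterSketch`, the two-layer ReLU MLP, the patch size, and the zero-initialized output layer (a common ControlNet-style choice that leaves the frozen backbone unchanged at initialization) are assumptions; the real adapter's architecture, temporal handling, and token layout are not described on this page.

```python
import numpy as np


def patchify(x: np.ndarray, p: int) -> np.ndarray:
    """(H, W, C) -> (num_patches, p*p*C), row-major over patches."""
    H, W, C = x.shape
    if H % p or W % p:
        raise ValueError("H and W must be divisible by the patch size")
    x = x.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return x.reshape((H // p) * (W // p), p * p * C)


class RegionAwareAdapterSketch:
    """Hypothetical per-frame adapter: condition maps -> residual control tokens."""

    def __init__(self, patch=2, d_model=8, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = patch * patch * 7  # 3 RGB + 1 confidence + 3 one-hot role channels
        self.patch = patch
        self.w1 = rng.normal(0.0, 0.02, size=(in_dim, hidden))
        # Zero-initialized output layer: the residual starts at exactly zero,
        # so the frozen backbone's behavior is unchanged before training.
        self.w2 = np.zeros((hidden, d_model))

    def __call__(self, appearance, confidence, roles):
        one_hot = np.eye(3)[roles]  # (H, W, 3) from integer role labels 0/1/2
        cond = np.concatenate([appearance, confidence[..., None], one_hot], axis=-1)
        h = np.maximum(patchify(cond, self.patch) @ self.w1, 0.0)  # ReLU MLP
        return h @ self.w2  # (num_patches, d_model)


H = W = 4
adapter = RegionAwareAdapterSketch(patch=2, d_model=8)
backbone_tokens = np.ones((4, 8))  # stand-in for frozen backbone tokens
residual = adapter(np.zeros((H, W, 3)), np.ones((H, W)), np.zeros((H, W), dtype=int))
conditioned = backbone_tokens + residual  # residual injection
```

Only the adapter weights would be trained in this setup; the backbone tokens are consumed as-is, which is what keeps the backbone frozen.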

PREBench

PREBench is the first region-aware diagnostic benchmark for 4D video editing. For each editing case, it provides the source video, an edited 4D proxy, target cameras, and region masks (Preserve / Reveal / Expand), enabling targeted evaluation of preservation fidelity, ghost leakage, boundary artifacts, and temporal stability. It covers 350 real-world editing cases spanning camera-only and joint camera+object motion edits.
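The per-case contents listed above map naturally onto a small record type. This is a hypothetical container, not PREBench's actual file format: the class name `PREBenchCase`, the field names, the representation of the edited 4D proxy as rendered frames, and the 4x4 camera matrices are all assumptions made for illustration.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PREBenchCase:
    """Hypothetical container for one PREBench editing case (field names assumed)."""

    source_video: np.ndarray    # (T_src, H, W, 3) source frames
    proxy_render: np.ndarray    # (T, H, W, 3) edited 4D proxy rendered at target cameras
    target_cameras: np.ndarray  # (T, 4, 4) camera poses, one per target frame
    region_roles: np.ndarray    # (T, H, W) labels: 0 Preserve, 1 Reveal, 2 Expand
    edit_type: str              # "camera" or "camera+object"

    def validate(self) -> None:
        T, H, W = self.region_roles.shape
        if self.proxy_render.shape != (T, H, W, 3):
            raise ValueError("proxy_render must match region_roles in T, H, W")
        if self.target_cameras.shape != (T, 4, 4):
            raise ValueError("need one 4x4 camera per target frame")
        if self.edit_type not in ("camera", "camera+object"):
            raise ValueError(f"unknown edit_type: {self.edit_type}")
        if self.region_roles.min() < 0 or self.region_roles.max() > 2:
            raise ValueError("role labels must be 0, 1, or 2")

    def role_fractions(self) -> np.ndarray:
        """Fraction of target pixels assigned to Preserve, Reveal, Expand."""
        counts = np.bincount(self.region_roles.ravel(), minlength=3)
        return counts / counts.sum()


roles = np.zeros((2, 2, 2), dtype=np.int64)
roles[0, 0, 0] = 2  # one Expand pixel out of eight
case = PREBenchCase(np.zeros((2, 2, 2, 3)), np.zeros((2, 2, 2, 3)),
                    np.tile(np.eye(4), (2, 1, 1)), roles, "camera")
case.validate()
fractions = case.role_fractions()
```

Reporting role fractions per case is one way to see how much of each target clip falls into the Reveal and Expand regions that the diagnostics target.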

Training Videos: 10,000
Real-World Test Cases: 350
Camera-Only Edits: 150
Camera+Object Edits: 200

Metric Category	Metrics	What It Evaluates
Preserve	P-LPIPS, P-DISTS, P-TempDrift, P-Dyn-LPIPS	Preservation fidelity, appearance drift, temporal stability of observed content
Reveal	R-Ghost, R-Seam	Ghost leakage from source, seam visibility at reveal boundaries
Expand	E-Temp, E-Seam, E-Copy	Extrapolation coherence, boundary copying, degenerate texture repetition
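The page does not give metric formulas, but the region prefixes (P-, R-, E-) suggest scores computed over the corresponding region masks. The sketch below shows only that region-restricted averaging step. It uses mean absolute RGB error as a stand-in distance; a P-LPIPS-style score would instead plug in a spatial LPIPS or DISTS map, and metrics such as R-Ghost or E-Copy likely need different distances altogether. The function names and the 0/1/2 role encoding are assumptions.

```python
import math

import numpy as np

ROLE_NAMES = ("preserve", "reveal", "expand")  # labels 0, 1, 2 (assumed encoding)


def masked_mean(dist_map: np.ndarray, roles: np.ndarray, role: int) -> float:
    """Average a per-pixel distance map over pixels with the given role."""
    mask = roles == role
    if not mask.any():
        return math.nan  # role absent in this frame
    return float(dist_map[mask].mean())


def per_role_error(pred: np.ndarray, ref: np.ndarray, roles: np.ndarray) -> dict:
    """Per-role scores using mean absolute RGB error as a stand-in distance."""
    dist = np.abs(pred.astype(float) - ref.astype(float)).mean(axis=-1)  # (H, W)
    return {name: masked_mean(dist, roles, r) for r, name in enumerate(ROLE_NAMES)}


pred = np.zeros((1, 2, 3))
ref = np.array([[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]])
scores = per_role_error(pred, ref, np.array([[0, 1]]))
```

Restricting the average to one region keeps a large error in, say, the Expand region from masking drift in the Preserve region, which is the motivation the page gives for complementing global video quality scores.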

Results

Camera Control

Cases (each compared as Original vs. PREX): Li Garden, Lucia, Camel, Goldfish, Spring, Rhino

Comparison with Other Methods

Methods: DaS, GEN3C, VerseCrafter, NeoVerse, PREX (Ours)
Cases: Goldfish, Rhino

Camera & Object Joint Control

Cases (each compared as Original vs. PREX): Li Ocean, Li Park, Breakdance, Judo, Mountain Bike Trick, Boat

Comparison with Other Methods

Methods: DaS, GEN3C, VerseCrafter, PREX (Ours)
Cases: Horsejump-low, Judo, Mbike-trick, Boat

Interactive Scene Editor

We built an interactive scene editor for exploring and manipulating 4D scene representations. It provides real-time camera control, object selection, and region-aware editing, letting researchers and artists work with the Preserve / Reveal / Expand framework visually and hands-on.

Citation

@misc{hu2026preserverevealexpandfaithful,
      title={Preserve, Reveal, Expand: Faithful 4D Video Editing with Region-Aware Conditioning}, 
      author={Zhangchi Hu and Wenzhang Sun and Xiangchen Yin and Jiahui Yuan and Chunfeng Wang and Hao Li and Kun Zhan and Xiaoyan Sun},
      year={2026},
      eprint={2605.20961},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.20961}, 
}