3D-LaSR

Abstract

The latent space of Generative Adversarial Networks (GANs) forms a continuous manifold where each point corresponds to a realistic image. Traversing linear paths within this space yields smooth transitions in image appearance, preserving realism without abrupt changes. For example, interpolating between latent codes of a closed-mouth and an open-mouth subject produces a natural animation of mouth movement. Building on this property, we propose 3D-LaSR, a novel method for reenacting 3D human heads using latent space paths, derived from a single monocular video.
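The interpolation property described above can be illustrated with a minimal sketch. It assumes each latent code is a NumPy array (a flat 512-dimensional vector here, although any shape such as a StyleGAN-style W+ code works). The function name and the random stand-in codes are hypothetical, and no generator is involved: the sketch only computes the points along the straight path.

```python
import numpy as np

def interpolate_latents(w_start: np.ndarray, w_end: np.ndarray, num_steps: int) -> np.ndarray:
    """Return num_steps latent codes evenly spaced on the straight line from w_start to w_end."""
    if w_start.shape != w_end.shape:
        raise ValueError("latent codes must have the same shape")
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2 to include both endpoints")
    # t runs from 0 (w_start) to 1 (w_end); reshape so it broadcasts over the latent dimensions
    t = np.linspace(0.0, 1.0, num_steps).reshape(-1, *([1] * w_start.ndim))
    return (1.0 - t) * w_start + t * w_end

# Hypothetical latent codes for a closed-mouth and an open-mouth frame (random stand-ins here)
rng = np.random.default_rng(0)
w_closed = rng.standard_normal(512)
w_open = rng.standard_normal(512)
path = interpolate_latents(w_closed, w_open, num_steps=5)
print(path.shape)  # (5, 512)
```

Decoding each row of `path` with the GAN generator would give one frame of the mouth-opening animation.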

Our approach leverages GAN inversion to extract latent vectors from video frames, and supports expression transfer between identities by applying latent animation paths from one subject to another. Consequently, our method outputs dynamic sequences of latent vectors. Unlike decoder-based techniques that produce fixed geometry, 3D-LaSR can directly manipulate the scenes using established GAN editing techniques to modify attributes such as hairstyle, age, eyewear, or expression.
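One simple way to picture "applying latent animation paths from one subject to another" is offset transfer. Record how each driving frame's latent code deviates from a reference frame of the same subject, then add those deviations to the target subject's reference code. The sketch below is an illustrative assumption, not the paper's exact procedure. The function name, the choice of a reference frame, and the toy 8-dimensional codes are all hypothetical. Naive offset transfer only behaves well when expression and identity are disentangled in the latent space, which is what the joint tuning described in the Method section is designed to achieve.

```python
import numpy as np

def transfer_expression(source_path: np.ndarray, source_ref: np.ndarray, target_ref: np.ndarray) -> np.ndarray:
    """Re-apply the source subject's latent animation path to a target subject (hypothetical helper).

    source_path: (T, D) latent codes of the driving video, one per frame.
    source_ref:  (D,) source latent for a reference (e.g. neutral) frame.
    target_ref:  (D,) target latent for a matching reference frame.
    """
    # Per-frame displacement of the source relative to its reference frame
    offsets = source_path - source_ref
    # Adding the same displacements to the target's reference code yields the target's latent path
    return target_ref + offsets

rng = np.random.default_rng(1)
source_ref = rng.standard_normal(8)
source_path = source_ref + 0.1 * rng.standard_normal((4, 8))
source_path[0] = source_ref  # the first frame is the reference frame itself
target_ref = rng.standard_normal(8)
target_path = transfer_expression(source_path, source_ref, target_ref)
print(np.allclose(target_path[0], target_ref))  # True
```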

The method uses a key frame inversion approach: only selected key frames are inverted, and the intermediate frames are synthesized through linear interpolation between their latent codes. This significantly reduces computational cost compared to methods that require per-frame inversion. We validate our approach through a comprehensive set of experiments, demonstrating high-quality video synthesis, effective expression transfer, and flexible editability. Finally, because our method is built on state-of-the-art 3D Gaussian splatting GANs, it can render in explicit 3D environments such as video engines or VR settings.
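The key frame scheme can be sketched as follows. It assumes the key frames have already been inverted into latent codes and simply fills in every other frame by per-dimension linear interpolation. How key frames are actually selected in 3D-LaSR is not specified here. The fixed spacing, the helper name, and the toy 2-D latents are hypothetical.

```python
import numpy as np

def fill_from_keyframes(key_indices, key_latents, num_frames):
    """Linearly interpolate latent codes between inverted key frames (hypothetical helper).

    key_indices: frame indices that were inverted; must be strictly increasing and
                 include the first (0) and last (num_frames - 1) frame.
    key_latents: (K, D) latent codes obtained by inverting those key frames.
    Returns a (num_frames, D) array with one latent code per frame.
    """
    key_indices = np.asarray(key_indices)
    key_latents = np.asarray(key_latents, dtype=float)
    if key_indices[0] != 0 or key_indices[-1] != num_frames - 1:
        raise ValueError("key frames must cover the first and last frame")
    if np.any(np.diff(key_indices) <= 0):
        raise ValueError("key frame indices must be strictly increasing")
    frames = np.arange(num_frames)
    # np.interp works on 1-D data, so interpolate each latent dimension separately
    columns = [np.interp(frames, key_indices, key_latents[:, d]) for d in range(key_latents.shape[1])]
    return np.stack(columns, axis=1)

# Toy example: 9 frames, but only frames 0, 4 and 8 need an (expensive) inversion
keys = [0, 4, 8]
key_codes = [[0.0, 0.0], [4.0, 8.0], [8.0, 0.0]]
all_codes = fill_from_keyframes(keys, key_codes, num_frames=9)
print(all_codes.shape)  # (9, 2)
```

With a key frame every k frames, the number of inversions drops by roughly a factor of k, while interpolation costs only a few vector operations per frame.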

Example Results

With 3D-LaSR, you can reenact facial expressions and edit facial attributes. The examples below use videos from the FFHQ dataset and editing boundaries for age, sentiment, and glasses.

From left to right: Original video, reenacted video, reenacted and younger, reenacted and with glasses

From left to right: Original video, reenacted face, reenacted face edited to look younger, edited to look younger and happier, and finally younger and happier from novel viewpoints

From left to right: Original video, reenacted video, reenacted and happier

From left to right: Original video, reenacted video, reenacted and younger
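The attribute edits shown above rely on editing boundaries. A common way to use such a boundary (for example, in InterFaceGAN-style editing) is to treat its normal as a direction and shift every latent code along it. The sketch below assumes exactly that. The attribute names, direction vectors, and step sizes are hypothetical stand-ins, and the sign that means "younger" or "happier" depends on how a given boundary was fitted.

```python
import numpy as np

def apply_edits(latents, directions, strengths):
    """Shift every latent code along one or more semantic directions (hypothetical helper).

    latents:    (T, D) latent path, e.g. a reenacted sequence.
    directions: dict name -> (D,) direction vector, e.g. the normal of a linear attribute boundary.
    strengths:  dict name -> float step size; negative values move in the opposite direction.
    """
    edited = np.array(latents, dtype=float, copy=True)
    for name, alpha in strengths.items():
        n = np.asarray(directions[name], dtype=float)
        n = n / np.linalg.norm(n)  # unit length so alpha is a comparable step size across attributes
        edited += alpha * n  # the same offset is added to every frame, so the animation itself is unchanged
    return edited

# Toy 4-D latents for a 3-frame sequence and two made-up attribute directions
sequence = np.zeros((3, 4))
dirs = {"age": np.array([2.0, 0.0, 0.0, 0.0]), "smile": np.array([0.0, 1.0, 0.0, 0.0])}
edited = apply_edits(sequence, dirs, {"age": -1.5, "smile": 0.5})
print(np.allclose(edited, [[-1.5, 0.5, 0.0, 0.0]] * 3))  # True
```

Because each edit is a constant offset, edits can be stacked (younger and happier) and combined with the expression offsets from reenactment.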

Method

The key technical challenge is that reenactment and editing require disentangling expression from identity in the GAN latent space. Without such disentanglement, expression transfer across identities leads to loss of realism or identity drift, and edits may unintentionally alter expressions. To address this, we introduce a joint inversion and fine-tuning pipeline that maps multiple face videos into a shared latent space and restructures it so that identity and expression become independent, transferable, and continuously modifiable components. This design enables us to transfer expressions between identities while applying semantic edits, using only a short monocular video per person (which can be captured, for instance, with a mobile phone).

An illustration of the latent space after the joint tuning. The offsets are optimized so that the same offset produces a similar change of expression in both people.

BibTeX

@inproceedings{10.1145/3756863.3769708,
author = {Hinzer, Paul and Barthel, Florian and Hilsmann, Anna and Eisert, Peter},
title = {3D-Aware Latent-Space Reenactment: Combining Expression Transfer and Semantic Editing},
year = {2025},
isbn = {9798400721175},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3756863.3769708},
doi = {10.1145/3756863.3769708},
abstract = {The latent space of Generative Adversarial Networks (GANs) forms a continuous manifold where each point corresponds to a realistic image. Traversing linear paths within this space yields smooth transitions in image appearance, preserving realism without abrupt changes. For example, interpolating between latent codes of a closed-mouth and an open-mouth subject produces a natural animation of mouth movement. Building on this property, we propose 3D-LaSR, a novel method for reenacting 3D human heads using latent space paths, derived from a single monocular video. Our approach leverages GAN inversion to extract latent vectors from video frames, and supports expression transfer between identities by applying latent animation paths from one subject to another. Consequently, our method outputs dynamic sequences of latent vectors. Unlike decoder-based techniques that produce fixed geometry, 3D-LaSR can directly manipulate the scenes using established GAN editing techniques to modify attributes such as hairstyle, age, eyewear, or expression. The method uses a key frame inversion approach, where intermediate frames are synthesized through linear interpolation. This significantly reduces computational cost compared to methods that require per-frame inversion. We validate our approach through a comprehensive set of experiments, demonstrating high-quality video synthesis, effective expression transfer, and flexible editability. Finally, as we develop our method around state-of-the-art 3D Gaussian splatting GANs, our method is able to render in explicit 3D environments such as video engines or VR settings.},
booktitle = {Proceedings of the 22nd ACM SIGGRAPH European Conference on Visual Media Production},
articleno = {13},
numpages = {12},
series = {CVMP '25}
}

3D-Aware Latent-Space Reenactment: Combining Expression Transfer and Semantic Editing

CVMP Best Paper Award (2025)