Example Results
With 3D-LaSR, you can reenact facial expressions and edit arbitrary facial attributes. We demonstrate results on videos from the FFHQ dataset, using editing boundaries for age, sentiment, and glasses.
The latent space of Generative Adversarial Networks (GANs) forms a continuous manifold where each point corresponds to a realistic image. Traversing linear paths within this space yields smooth transitions in image appearance, preserving realism without abrupt changes. For example, interpolating between latent codes of a closed-mouth and an open-mouth subject produces a natural animation of mouth movement. Building on this property, we propose 3D-LaSR, a novel method for reenacting 3D human heads using latent space paths, derived from a single monocular video.
Our approach leverages GAN inversion to extract latent vectors from video frames, and supports expression transfer between identities by applying latent animation paths from one subject to another. Consequently, our method outputs dynamic sequences of latent vectors. Unlike decoder-based techniques that produce fixed geometry, 3D-LaSR can directly manipulate the scenes using established GAN editing techniques to modify attributes such as hairstyle, age, eyewear, or expression.
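The expression transfer and editing described above can be illustrated with plain vector arithmetic. The following is a minimal sketch, not the authors' implementation. It assumes that latent codes are flat vectors, that an animation path is stored as per-frame offsets from the source's first frame, and that an edit is a shift along a single direction in latent space. The names `transfer_path` and `apply_edit`, the `age_direction` vector, and the toy 3-dimensional codes are all hypothetical.

```python
from typing import List

Vector = List[float]


def transfer_path(source_path: List[Vector], target_anchor: Vector) -> List[Vector]:
    """Re-apply a source animation path to a target identity.

    Assumption: the path is expressed as offsets from the first source frame,
    and those offsets are added to the target's anchor latent code.
    """
    source_anchor = source_path[0]
    return [
        [t + (s - s0) for t, s, s0 in zip(target_anchor, frame, source_anchor)]
        for frame in source_path
    ]


def apply_edit(path: List[Vector], direction: Vector, strength: float) -> List[Vector]:
    """Shift every latent code in the path along one editing direction."""
    return [[w + strength * d for w, d in zip(frame, direction)] for frame in path]


# Toy 3-dimensional latent codes (real latent spaces are far larger).
source_path = [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [1.0, 0.0, 0.0]]
target_anchor = [2.0, 3.0, 4.0]
age_direction = [0.0, 1.0, 0.0]  # hypothetical editing direction

transferred = transfer_path(source_path, target_anchor)
edited = apply_edit(transferred, age_direction, strength=2.0)
print(edited[-1])  # [3.0, 5.0, 4.0]
```

Because the output is still a sequence of latent vectors, the edit is applied to the whole animation in one pass rather than to rendered frames.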
The method uses a key frame inversion approach, where intermediate frames are synthesized through linear interpolation. This significantly reduces computational cost compared to methods that require per-frame inversion. We validate our approach through a comprehensive set of experiments, demonstrating high-quality video synthesis, effective expression transfer, and flexible editability. Finally, as we develop our method around state-of-the-art 3D Gaussian splatting GANs, our method is able to render in explicit 3D environments such as video engines or VR settings.
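To make the cost argument concrete, here is a small sketch of key frame interpolation under simple assumptions: key frames are spaced at a fixed stride, and each latent code is a flat vector. How key frames are actually selected is not stated here, so the fixed stride, the helper names `lerp` and `expand_keyframes`, and the toy 2-dimensional codes are hypothetical.

```python
from typing import List

Vector = List[float]


def lerp(a: Vector, b: Vector, t: float) -> Vector:
    """Linear interpolation between two latent codes, with t in [0, 1]."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def expand_keyframes(keyframes: List[Vector], stride: int) -> List[Vector]:
    """Fill in `stride - 1` interpolated codes between consecutive key frames."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    if not keyframes:
        return []
    frames: List[Vector] = []
    for start, end in zip(keyframes, keyframes[1:]):
        for step in range(stride):
            frames.append(lerp(start, end, step / stride))
    frames.append(list(keyframes[-1]))  # close the sequence on the last key frame
    return frames


# Three inverted key frames, one every 4 video frames (toy 2-D codes).
keyframes = [[0.0, 0.0], [4.0, 8.0], [0.0, 0.0]]
frames = expand_keyframes(keyframes, stride=4)
print(len(frames))  # 9
print(frames[2])    # [2.0, 4.0]
```

In this toy setting, nine output frames require only three inversions; the remaining six codes come from cheap vector arithmetic.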
The key technical challenge is that reenactment and editing require disentangling expression from identity in the GAN latent space. Without such disentanglement, expression transfer across identities leads to loss of realism or identity drift, and edits may unintentionally alter expressions. To address this, we introduce a joint inversion and fine-tuning pipeline that maps multiple face videos into a shared latent space and restructures it so that identity and expression become independent, transferable, and continuously modifiable components. This design enables us to transfer expressions between identities while applying semantic edits, all from only a short monocular video per person, which can be captured, for instance, with a mobile phone.
We want to point out the following works, without which 3D-LaSR would not have been possible:
@inproceedings{10.1145/3756863.3769708,
author = {Hinzer, Paul and Barthel, Florian and Hilsmann, Anna and Eisert, Peter},
title = {3D-Aware Latent-Space Reenactment: Combining Expression Transfer and Semantic Editing},
year = {2025},
isbn = {9798400721175},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3756863.3769708},
doi = {10.1145/3756863.3769708},
abstract = {The latent space of Generative Adversarial Networks (GANs) forms a continuous manifold where each point corresponds to a realistic image. Traversing linear paths within this space yields smooth transitions in image appearance, preserving realism without abrupt changes. For example, interpolating between latent codes of a closed-mouth and an open-mouth subject produces a natural animation of mouth movement. Building on this property, we propose 3D-LaSR, a novel method for reenacting 3D human heads using latent space paths, derived from a single monocular video. Our approach leverages GAN inversion to extract latent vectors from video frames, and supports expression transfer between identities by applying latent animation paths from one subject to another. Consequently, our method outputs dynamic sequences of latent vectors. Unlike decoder-based techniques that produce fixed geometry, 3D-LaSR can directly manipulate the scenes using established GAN editing techniques to modify attributes such as hairstyle, age, eyewear, or expression. The method uses a key frame inversion approach, where intermediate frames are synthesized through linear interpolation. This significantly reduces computational cost compared to methods that require per-frame inversion. We validate our approach through a comprehensive set of experiments, demonstrating high-quality video synthesis, effective expression transfer, and flexible editability. Finally, as we develop our method around state-of-the-art 3D Gaussian splatting GANs, our method is able to render in explicit 3D environments such as video engines or VR settings.},
booktitle = {Proceedings of the 22nd ACM SIGGRAPH European Conference on Visual Media Production},
articleno = {13},
numpages = {12},
location = {},
series = {CVMP '25}
}