Recently, 3D GANs based on 3D Gaussian splatting have been proposed for high-quality synthesis of human heads. However, existing methods stabilize training and enhance rendering quality from steep viewpoints by conditioning the random latent vector on the current camera position. This compromises 3D consistency, as we observe significant identity changes when re-synthesizing the 3D head with each camera shift. Conversely, fixing the camera to a single viewpoint yields high-quality renderings for that perspective but results in poor performance for novel views. Removing view-conditioning typically destabilizes GAN training, often causing it to collapse. In response to these challenges, we introduce CGS-GAN, a novel 3D Gaussian Splatting GAN framework that enables stable training and high-quality 3D-consistent synthesis of human heads without relying on view-conditioning. To ensure training stability, we introduce a multi-view regularization technique that enhances generator convergence with minimal computational overhead. Additionally, we adapt the conditional loss used in existing 3D Gaussian splatting GANs and propose a generator architecture designed not only to stabilize training but also to facilitate efficient rendering and straightforward scaling, enabling output resolutions of up to 2048². To evaluate the capabilities of CGS-GAN, we curate a new dataset derived from FFHQ. This dataset enables very high resolutions, focuses on larger portions of the human head, reduces view-dependent artifacts for improved 3D consistency, and excludes images where subjects are obscured by hands or other objects. As a result, our approach achieves very high rendering quality, supported by competitive FID scores, while ensuring consistent 3D scene generation.
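To give an intuition for the multi-view regularization idea: each generated Gaussian scene is rendered from several randomly sampled cameras, and all renderings contribute to the adversarial loss, so the generator cannot specialize to a single viewpoint. Below is a minimal PyTorch-style sketch; `generator`, `render`, `sample_camera`, and the loss formulation are hypothetical stand-ins, not our actual implementation.

```python
import torch

def generator_step(generator, discriminator, render, sample_camera,
                   batch_size=4, views_per_scene=2, device="cuda"):
    """One generator update with multi-view regularization (sketch).

    Each latent produces ONE 3D Gaussian scene, which is then rendered
    from several random cameras; every rendering is scored by the
    discriminator, so the generator cannot overfit to a single view.
    """
    z = torch.randn(batch_size, 512, device=device)  # latent vectors, no camera input
    scenes = generator(z)                            # 3D Gaussian parameters

    loss = 0.0
    for _ in range(views_per_scene):
        cams = sample_camera(batch_size)             # random camera poses
        fake = render(scenes, cams)                  # differentiable 3DGS rendering
        logits = discriminator(fake)
        loss = loss + torch.nn.functional.softplus(-logits).mean()  # non-saturating GAN loss

    return loss / views_per_scene
```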
Try it yourself using splatviz.
Explore the latent space using the web viewer from PlayCanvas paired with SOG Compression.
Example output scenes generated by our proposed CGS-GAN, displayed using the aframe-gaussian-splatting web viewer. Drag and scroll with your mouse to explore. The 3D scenes differ slightly from the originals, as they were converted from .ply files into compressed .splat files.
A visual comparison between current 3DGS GANs (GSGAN and GGHead) and our proposed method. We condition GGHead and GSGAN on the frontal view, as this provides the best overall results when a 3D-consistent scene is required. Quantitative comparisons (FID and the 3D-consistent FID3D) can be found in our paper.
GSGAN (FFHQ) | GGHead (FFHQ) | Ours (FFHQC) |
In the following figure, we demonstrate the effect of view-conditioning in the prior 3DGS GAN methods GSGAN and GGHead. If the view conditioning aligns with the render camera, the quality is very good but not 3D-consistent; when rendering from a novel view, the quality decreases. As our model eliminates view-conditioning, we no longer observe such effects and instead render in high quality from any given view. To measure this effect quantitatively, we introduce an FID3D metric that computes the FID without telling the generator in advance from which viewpoint the head will be rendered.
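Conceptually, FID3D can be computed with a loop like the sketch below: the scene is generated without any camera input, the evaluation viewpoint is drawn only afterwards, and the resulting renderings enter the standard FID computation. `generator`, `render`, and `sample_camera` are hypothetical placeholders for the actual evaluation code.

```python
import torch

@torch.no_grad()
def collect_fid3d_samples(generator, render, sample_camera,
                          num_samples=50_000, batch_size=8, device="cuda"):
    """Render unconditioned scenes from random views for FID3D (sketch).

    The generator produces each 3D scene BEFORE the camera is drawn,
    so it cannot adapt the head to the viewpoint it will be judged from.
    """
    images = []
    for _ in range(num_samples // batch_size):
        z = torch.randn(batch_size, 512, device=device)
        scenes = generator(z)              # fixed 3D scene, no view conditioning
        cams = sample_camera(batch_size)   # viewpoint drawn only after generation
        images.append(render(scenes, cams).cpu())
    # The returned renderings are compared against real-image statistics
    # with the standard FID computation.
    return torch.cat(images)
```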
We curate a novel dataset from FFHQ that:
- enables very high output resolutions,
- focuses on larger portions of the human head,
- reduces view-dependent artifacts for improved 3D consistency, and
- excludes images where subjects are obscured by hands or other objects (one possible automated filter is sketched below).
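The last curation step could, for instance, be approximated with an off-the-shelf hand detector. The following MediaPipe-based filter is a hypothetical illustration, not necessarily the pipeline we used:

```python
import cv2
import mediapipe as mp

# Hypothetical obstruction filter: flag portraits in which a hand is detected.
# A detected hand inside a tight face crop is only a proxy for "face obscured".
hands = mp.solutions.hands.Hands(
    static_image_mode=True, max_num_hands=2, min_detection_confidence=0.5
)

def is_obstructed(image_path: str) -> bool:
    image = cv2.imread(image_path)
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB input
    result = hands.process(rgb)
    return result.multi_hand_landmarks is not None
```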
The following two videos demonstrate 3D GAN inversion with our model. Here, we use the respective left image as a target and optimize the random latent vector so that the rendered head resembles this image. In the left video, we show the inversion capabilities using a face shown from a frontal view. On the right, we use only the side view as the target. Even in this difficult scenario, where only half of the face is visible, we still obtain a realistic 3D head model.
At one point in the videos, the face abruptly changes its appearance. This is where we switch from optimizing the latent vector to fine-tuning the weights of the generator to achieve even better results.
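The two stages can be sketched as follows, in the spirit of pivotal tuning: first only the latent vector is optimized against the target image, then the latent is frozen and the generator weights are fine-tuned. All names (`generator`, `render`, `target_cam`) and loss weights are illustrative assumptions, not our exact settings.

```python
import torch
import lpips  # perceptual loss, pip install lpips

def invert(generator, render, target, target_cam,
           steps_latent=500, steps_tune=300):
    """Two-stage 3D GAN inversion (sketch): latent optimization, then fine-tuning.

    `target` is the target image as a tensor in [-1, 1], as expected by LPIPS.
    """
    percep = lpips.LPIPS(net="vgg").to(target.device)

    def loss_fn(pred):
        # Perceptual similarity plus a weighted pixel-wise term.
        return percep(pred, target).mean() + 0.1 * (pred - target).square().mean()

    # Stage 1: optimize the latent vector only; the generator stays frozen.
    z = torch.randn(1, 512, device=target.device, requires_grad=True)
    opt = torch.optim.Adam([z], lr=0.01)
    for _ in range(steps_latent):
        loss = loss_fn(render(generator(z), target_cam))
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Stage 2: freeze the latent and fine-tune the generator weights
    # (the point where the face "abruptly changes" in the videos).
    z = z.detach()
    opt = torch.optim.Adam(generator.parameters(), lr=3e-4)
    for _ in range(steps_tune):
        loss = loss_fn(render(generator(z), target_cam))
        opt.zero_grad()
        loss.backward()
        opt.step()

    return z, generator
```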
Using the Unity Gaussian Splatting Plugin, we are able to import our 3D heads into explicit 3D environments.
@misc{barthel2025cgsgan,
title={CGS-GAN: 3D Consistent Gaussian Splatting GANs for High Resolution Human Head Synthesis},
author={Florian Barthel and Wieland Morgenstern and Paul Hinzer and Anna Hilsmann and Peter Eisert},
year={2025},
eprint={2505.17590},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2505.17590},
}
Check out our other works here.