Compact 3D Scene Representation via Self-Organizing Gaussian Grids

ECCV 2024


1 Fraunhofer Heinrich Hertz Institute, HHI
2 Humboldt University of Berlin

TL;DR

Organize the parameters of 3D Gaussian Splatting (3DGS) scenes into a 2D grid and enforce local smoothness during training. Then leverage off-the-shelf image compression to store the attribute images at a high compression rate. This beats most concurrent compression methods in size, and all other compared methods in quality. The file format and decoding are dead simple.

Contributions

  1. We propose a novel compact scene representation and training concept for 3DGS, structuring the high-dimensional features in a smooth 2D grid, which can be efficiently encoded using state-of-the-art compression methods.
  2. We introduce an efficient 2D sorting algorithm called Parallel Linear Assignment Sorting (PLAS) that sorts millions of 3DGS parameters on the GPU in seconds.
  3. We provide a simple-to-use interface for compressing and decompressing the resulting 3D scenes. The decompressed reconstructions share the structure of 3DGS, allowing integration into established renderers.
  4. We efficiently reduce the storage size by a factor of 17x to 42x while maintaining high visual quality.

Key Insight

The goal of compressing 3DGS scenes is to provide a compact representation that can be rendered into the training and test views without losing visual detail. We are not bound to a specific configuration of Gaussian splats: we can mold the splats during training so that they compress well. We leverage this to make neighboring splats share attributes, e.g. neighbors in xyz sharing the same rotation values.

Abstract

3D Gaussian Splatting has recently emerged as a highly promising technique for modeling static 3D scenes. In contrast to Neural Radiance Fields, it utilizes efficient rasterization, allowing for very fast rendering at high quality. However, the storage size is significantly higher, which hinders practical deployment, e.g. on resource-constrained devices. In this paper, we introduce a compact scene representation organizing the parameters of 3D Gaussian Splatting (3DGS) into a 2D grid with local homogeneity, ensuring a drastic reduction in storage requirements without compromising visual quality during rendering. Central to our idea is the explicit exploitation of perceptual redundancies present in natural scenes. In essence, the inherent nature of a scene allows for numerous permutations of Gaussian parameters to equivalently represent it. To this end, we propose a novel highly parallel algorithm that regularly arranges the high-dimensional Gaussian parameters into a 2D grid while preserving their neighborhood structure. During training, we further enforce local smoothness between the sorted parameters in the grid. The uncompressed Gaussians use the same structure as 3DGS, ensuring a seamless integration with established renderers. Our method achieves a reduction factor of 17x to 42x in size for complex scenes with no increase in training time, marking a substantial leap forward in the domain of 3D scene distribution and consumption.

Method

An overview of our novel 3DGS training method. During training, we arrange all high-dimensional attributes into multiple 2D grids. Those grids are sorted and a smoothness regularization is applied. This creates redundancy, which helps to compress the 2D grids into small files using off-the-shelf compression methods.
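
To illustrate why a sorted grid creates redundancy that compresses well, the following toy sketch lays out one scalar attribute in a 2D grid twice: once in random order and once sorted. It is not the PLAS algorithm and not the training loss of the method. A plain 1D sort written row by row stands in for the 2D sorting, `zlib` stands in for an image codec, the data is random, and all function names are hypothetical.

```python
import random
import zlib


def to_grid(values, width):
    """Lay a flat list out row by row into a 2D grid (list of rows)."""
    return [values[i:i + width] for i in range(0, len(values), width)]


def neighbor_roughness(grid):
    """Mean absolute difference between horizontally and vertically adjacent cells."""
    total, count = 0, 0
    height, width = len(grid), len(grid[0])
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                total += abs(grid[y][x] - grid[y][x + 1])
                count += 1
            if y + 1 < height:
                total += abs(grid[y][x] - grid[y + 1][x])
                count += 1
    return total / count


def compressed_size(grid):
    """Bytes needed after zlib compression (stand-in for an image codec)."""
    raw = bytes(v for row in grid for v in row)
    return len(zlib.compress(raw, 9))


random.seed(0)
side = 64
# One 8-bit attribute per splat, e.g. a quantized opacity (toy data, an assumption).
values = [random.randrange(256) for _ in range(side * side)]

unsorted_grid = to_grid(values, side)
sorted_grid = to_grid(sorted(values), side)

print("roughness unsorted: ", neighbor_roughness(unsorted_grid))
print("roughness sorted:   ", neighbor_roughness(sorted_grid))
print("zlib bytes unsorted:", compressed_size(unsorted_grid))
print("zlib bytes sorted:  ", compressed_size(sorted_grid))
```

A quantity like `neighbor_roughness` is the kind of term a smoothness regularizer could penalize during training; the exact regularization used by the method is not reproduced here.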

Results

Here we render results of scenes in vanilla 3DGS vs. our compression method. You can move the split divider to compare 3DGS on the left and our method on the right of each video.

Parts of the scene covered by many cameras (like the front of the truck) look nearly identical. Other parts (the concrete a few steps away from the truck, the background) are seen by fewer cameras. In 3DGS, this produces highly view-dependent small details and floaters. Through the local neighbor smoothness, our method generalizes better for these sparser parts of the scene. The result is smoother overall and more pleasant to look at, and it removes many of the distracting floaters.

Truck (Tanks & Temples dataset)

Model          Size
3DGS           639.7 MB
Ours           28.8 MB
3DGS w/o SH    174.0 MB
Ours w/o SH    11.8 MB

Flowers (Mip-NeRF 360 dataset)

Model          Size
3DGS           848.1 MB
Ours           55.0 MB
3DGS w/o SH    232.8 MB
Ours w/o SH    22.2 MB
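
The per-scene reduction factors follow directly from the file sizes listed above. The short sketch below works through the arithmetic, assuming the sizes pair with the labels in the order shown (3DGS, Ours, 3DGS w/o SH, Ours w/o SH); the helper name `reduction_factor` is hypothetical.

```python
# File sizes in MB, copied from the two scene comparisons above.
sizes_mb = {
    "Truck": {"3DGS": 639.7, "Ours": 28.8, "3DGS w/o SH": 174.0, "Ours w/o SH": 11.8},
    "Flowers": {"3DGS": 848.1, "Ours": 55.0, "3DGS w/o SH": 232.8, "Ours w/o SH": 22.2},
}


def reduction_factor(baseline_mb, compressed_mb):
    """How many times smaller the compressed file is than the baseline."""
    return baseline_mb / compressed_mb


for scene, s in sizes_mb.items():
    with_sh = reduction_factor(s["3DGS"], s["Ours"])
    without_sh = reduction_factor(s["3DGS w/o SH"], s["Ours w/o SH"])
    print(f"{scene}: {with_sh:.1f}x with SH, {without_sh:.1f}x without SH")
```

These are single-scene figures; the 17x to 42x range quoted elsewhere on this page refers to dataset-level results.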

Comparison with other 3D scene representations

Quantitative results: Our method significantly outperforms the default 3D Gaussian Splatting (3DGS) and prior NeRF-based 3D reconstruction methods in terms of storage size and rendering efficiency. We achieve a reduction in storage size by a factor of 17x to 42x, depending on the dataset, with the most notable reduction being 41.6x on the Deep Blending dataset. Deactivating spherical harmonics during training further enhances this reduction to 127x over vanilla 3DGS, while also improving PSNR metrics. Our approach maintains competitive visual quality metrics (PSNR and LPIPS) compared to Mip-NeRF360 and VQ-TensoRF, but allows for real-time rendering and significantly faster training, approximately 10 to 30 minutes per scene. Additionally, our optimized method renders scenes more quickly, achieving 515 fps compared to 385 fps for the vanilla 3DGS model of the 'Truck' scene on the same GPU, while also displaying better visual quality than vanilla 3DGS.

Comparison with other 3DGS compression methods

Multiple concurrent methods have been developed for compressing 3D Gaussian splats. Commonly, these methods aim to reduce the number of Gaussians needed for the scene and apply quantized vector compression. Unlike other approaches that utilize codebooks or hash grids for vector coding, our method is unique in organizing the Gaussians into locally smooth 2D grids during training.

This grid organization during training enables the use of standard image compression techniques to code the attributes. This simplifies decoding: one simply decompresses the grid attribute images and applies a rescaling. Each pixel from the coded images then corresponds to a set of Gaussian splatting attributes, which can be rendered with any standard 3DGS render engine.
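
The decoding step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the released file format: it assumes each attribute grid is normalized with its minimum and maximum and stored as 8-bit integers, and it leaves out the image codec itself. The names `quantize` and `dequantize` are hypothetical.

```python
def quantize(grid, levels=255):
    """Map float attributes to integers in [0, levels]; also return the value range."""
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for constant grids
    q = [[round((v - lo) / span * levels) for v in row] for row in grid]
    return q, lo, hi


def dequantize(q, lo, hi, levels=255):
    """Decoder-side rescaling: integers back to the original float range."""
    span = hi - lo
    return [[lo + (p / levels) * span for p in row] for row in q]


# Toy 2x2 grid holding one attribute channel (made-up values).
grid = [[-1.5, 0.0], [0.25, 2.5]]
q, lo, hi = quantize(grid)          # encoder side; q would be handed to an image codec
restored = dequantize(q, lo, hi)    # decoder side, after decompressing the image

# Each grid cell corresponds to one splat, so flattening yields a per-splat list.
per_splat = [v for row in restored for v in row]

step = (hi - lo) / 255
max_err = max(abs(a - b) for ra, rb in zip(grid, restored) for a, b in zip(ra, rb))
print("max rescaling error:", max_err, "quantization step:", step)
```

With a lossless codec, the only error left in this sketch is the rounding to the nearest quantization level, which is bounded by half a step.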

As of July 2024, our method (Morgenstern et al.) achieves the highest quality for the compared datasets, second only in size to the HAC method, also presented at ECCV 2024.

We invite you to explore other methods for compressing Gaussian splats and compare their approaches in this survey:

3dgs.zip: A survey on 3D Gaussian Splatting Compression Methods: We're collecting results and info on methods that reduce the storage size of 3D Gaussian splatting scenes. Contributions welcome!

BibTeX

If you use our method in your research, please cite our paper. You can use the following BibTeX entry:


@article{morgenstern2023compact,
  title={Compact 3D Scene Representation via Self-Organizing Gaussian Grids},
  author={Morgenstern, Wieland and Barthel, Florian and Hilsmann, Anna and Eisert, Peter},
  journal={arXiv preprint arXiv:2312.13299},
  year={2023}
}