3D Gaussian Splatting has recently achieved notable success in novel view synthesis for dynamic scenes and geometry reconstruction in static scenes. Building on these advancements, early methods have been developed for dynamic surface reconstruction by globally optimizing entire sequences. However, reconstructing dynamic scenes with significant topology changes, emerging or disappearing objects, and rapid movements remains a substantial challenge, particularly for long sequences. To address these issues, we propose AT-GS, a novel method for reconstructing high-quality dynamic surfaces from multi-view videos through per-frame incremental optimization. To avoid local minima across frames, we introduce a unified and adaptive gradient-aware densification strategy that integrates the strengths of conventional cloning and splitting techniques. Additionally, we reduce temporal jittering in dynamic surfaces by ensuring consistency in curvature maps across consecutive frames. Our method achieves superior accuracy and temporal coherence in dynamic surface reconstruction, delivering high-fidelity space-time novel view synthesis, even in complex and challenging scenes. Extensive experiments on diverse multi-view video datasets demonstrate the effectiveness of our approach, showing clear advantages over baseline methods.
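For illustration, the following is a minimal PyTorch-style sketch of what a unified gradient-aware densification rule could look like. It is an assumption-laden reading of the idea, not the paper's exact formulation: the threshold names (grad_thresh, scale_thresh), the offset sampling, and the shrink factor are placeholders chosen to mimic how cloning and splitting are conventionally blended in 3D Gaussian Splatting.

```python
import torch

def densify_unified(xyz, scales, grads, grad_thresh=0.0002, scale_thresh=0.01):
    """Hypothetical unified densification step (a sketch, not the paper's rule).

    xyz:    (N, 3) Gaussian centers
    scales: (N, 3) per-axis standard deviations
    grads:  (N,)   accumulated view-space positional gradient magnitudes
    Returns the augmented (xyz, scales).
    """
    mask = grads > grad_thresh                        # densify only high-gradient surfels
    big = scales[mask].max(dim=-1).values > scale_thresh

    # Sample an offset inside each selected Gaussian: for small surfels the
    # offset is negligible (behaves like cloning), while for large surfels it
    # spreads the copies apart (behaves like splitting).
    offsets = torch.randn_like(xyz[mask]) * scales[mask]
    new_xyz = xyz[mask] + offsets

    # Shrink only the large surfels, as in conventional splitting.
    new_scales = scales[mask].clone()
    new_scales[big] = new_scales[big] / 1.6

    return torch.cat([xyz, new_xyz], dim=0), torch.cat([scales, new_scales], dim=0)
```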
Starting with the Gaussian surfels from the previous frame t-1, we first estimate their coarse translation and rotation to align with the current frame t. Subsequently, we optimize all Gaussian attributes, incorporating our gradient-aware densification strategy. For each training view, we render opacity, depth, normal, and color maps (from top to bottom in the dashed box) using differentiable tile-based rasterization. Additionally, we predict optical flow between consecutive frames and use it to warp the rendered normal map from frame t-1 to frame t. We then enforce temporal consistency of the underlying surface by comparing curvature maps derived from the warped and rendered normal maps. Furthermore, we apply photometric, depth-normal consistency, and mask losses for supervision. Finally, Poisson reconstruction is employed to generate a mesh from the unprojected depth and normal maps.
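To make the curvature-consistency step concrete, the sketch below computes a simple curvature proxy (the magnitude of spatial finite differences of the normal map) and penalizes the difference between the curvature of the flow-warped normal map from frame t-1 and that of frame t. The curvature definition, the backward-warping convention (flow maps current-frame pixels to previous-frame locations), and all function names are illustrative assumptions rather than the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def curvature_map(normals):
    """Curvature proxy: magnitude of spatial finite differences of the
    normal map. normals: (3, H, W), unit-length per pixel."""
    dx = normals[:, :, 1:] - normals[:, :, :-1]       # horizontal differences
    dy = normals[:, 1:, :] - normals[:, :-1, :]       # vertical differences
    return dx.norm(dim=0)[:-1, :] + dy.norm(dim=0)[:, :-1]   # (H-1, W-1)

def warp_with_flow(img, flow):
    """Backward-warp img (C, H, W) into the current frame using optical
    flow (2, H, W) that maps current-frame pixels to previous-frame ones."""
    _, H, W = img.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack([xs, ys], dim=0).float() + flow
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid[0] = 2.0 * grid[0] / (W - 1) - 1.0
    grid[1] = 2.0 * grid[1] / (H - 1) - 1.0
    grid = grid.permute(1, 2, 0).unsqueeze(0)         # (1, H, W, 2)
    return F.grid_sample(img.unsqueeze(0), grid, align_corners=True).squeeze(0)

def temporal_curvature_loss(normals_t, normals_prev, flow):
    warped_prev = warp_with_flow(normals_prev, flow)
    warped_prev = F.normalize(warped_prev, dim=0)     # re-normalize after interpolation
    return (curvature_map(normals_t) - curvature_map(warped_prev)).abs().mean()
```

Comparing curvature rather than raw normals is what makes the constraint tolerant to rigid motion: a surface patch that merely rotates between frames changes its normals but largely preserves its curvature pattern.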
We demonstrate results for both novel view synthesis and geometry reconstruction on the DNA-Rendering and NHR datasets.