Geometry-aware 3D Gaussian Model for View-Dependent Rendering of 3D Morphable Models

Creating photorealistic avatar head models from multi-view or monocular images or videos of a face is a long-standing research topic. Besides novel view synthesis, constructing a 3D face model opens the way for emerging applications such as facial reenactment. Earlier approaches to this task fitted the parameters of a 3D Morphable Model (3DMM) of the face to the input images, and were later extended to include non-Lambertian reflectance components. Recently, substantial gains in reconstruction quality were achieved using implicit neural radiance fields (NeRF) and 3D Gaussian Splatting (3DGS). 3DGS represents a 3D scene using a distribution of 3D Gaussian primitives whose geometries are defined by their 3D covariance matrices and whose viewpoint-dependent colors are parameterized by a density and a set of spherical harmonics coefficients. The parameters of the Gaussian primitives are fitted to the input images using an optimization scheme. To allow the avatar model to be animated from facial expression cues, 3DGS models for avatar heads are further constrained to align with a mesh of the avatar head, which is typically fitted to the input images in a separate process. This paper proposes a 3DGS approach that targets lightweight imaging configurations in which the avatar head is captured from a sparse set of distant viewpoints and under a small set of static facial expressions, unlike video captures, where a continuum of facial expressions is available for fitting an animatable 3DGS head model. The approach builds on a novel hybrid processing scheme that combines 3D Gaussian splats, 3D Morphable Models, and neural networks. It proceeds in two stages: first, an efficient geometry-aware density control of the Gaussian primitives, and second, an optimization scheme that leverages 3DMMs and neural networks to fit the 3DGS parameters to the input images. We evaluate our method on a new dataset of high-resolution multi-view face images captured in a controlled environment, and show that it outperforms state-of-the-art approaches both qualitatively and quantitatively in rendering novel views of the face.
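
To make the description of a Gaussian primitive concrete, the following is a minimal sketch of how a 3D covariance matrix and a view-dependent color are commonly computed in 3DGS. It assumes the widely used parameterization (a per-axis scale and a rotation quaternion for the covariance, and spherical harmonics truncated at degree 1 for color), which the abstract does not spell out. The function names are hypothetical, and this is not the paper's implementation.

```python
import math

# Real spherical harmonics constants for degree 0 and degree 1.
SH_C0 = 0.28209479177387814
SH_C1 = 0.4886025119029199


def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) into a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(scale, quat):
    """Build Sigma = R S S^T R^T, with S = diag(scale) and R from quat."""
    R = quat_to_rotmat(quat)
    return [
        [sum(R[i][k] * scale[k] ** 2 * R[j][k] for k in range(3))
         for j in range(3)]
        for i in range(3)
    ]


def sh_color(sh, view_dir):
    """Evaluate RGB color for a viewing direction.

    sh: four RGB triples (one degree-0 and three degree-1 coefficients).
    view_dir: direction from the camera to the Gaussian (any length).
    """
    n = math.sqrt(sum(c * c for c in view_dir))
    x, y, z = (c / n for c in view_dir)
    basis = (SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x)
    rgb = [
        sum(b * coeff[ch] for b, coeff in zip(basis, sh)) + 0.5
        for ch in range(3)
    ]
    # Clamping to [0, 1] is a choice made for this sketch.
    return [min(1.0, max(0.0, v)) for v in rgb]


# Usage: an axis-aligned Gaussian with a constant mid-gray color.
sigma = covariance((1.0, 2.0, 3.0), (1.0, 0.0, 0.0, 0.0))
gray = sh_color([(0.0, 0.0, 0.0)] * 4, (0.0, 0.0, 1.0))
```

Factoring the covariance into a scale and a rotation keeps it symmetric and positive semi-definite throughout optimization, which is why this form is generally preferred over optimizing the matrix entries directly.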
