SMERF models large multi-room scenes with independent submodels, each assigned to a different region of the scene, and captures complex view-dependent effects within each submodel. Image fidelity is boosted via distillation: each submodel is supervised with the RGB color predictions and volumetric density values of a pre-trained teacher model. The method renders frames three orders of magnitude faster than state-of-the-art radiance field models and achieves real-time performance across a variety of devices, including smartphones.
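To make the region-to-submodel assignment concrete, here is a minimal sketch of coordinate-based submodel selection. It assumes the scene footprint is covered by a uniform axis-aligned grid of submodels; the function name `select_submodel` and the grid layout are illustrative assumptions, not the SMERF codebase.

```python
import jax.numpy as jnp

def select_submodel(camera_pos, scene_min, scene_max, grid_shape):
    """Map a camera position to the index of the submodel owning that region.

    camera_pos: (3,) world-space camera position.
    scene_min/scene_max: (3,) axis-aligned bounds of the scene footprint.
    grid_shape: (3,) number of submodels along each axis.
    """
    # Normalize the position into [0, 1) within the scene bounds.
    u = (camera_pos - scene_min) / (scene_max - scene_min)
    # Quantize to a cell index, clamping so boundary positions stay in range.
    cell = jnp.clip(jnp.floor(u * grid_shape), 0, grid_shape - 1).astype(jnp.int32)
    # Flatten the 3D cell coordinate into a single submodel index.
    return cell[0] * grid_shape[1] * grid_shape[2] + cell[1] * grid_shape[2] + cell[2]

# Example: a 4x1x4 grid of submodels over a 20m x 3m x 15m scene.
idx = select_submodel(
    jnp.array([5.0, 1.5, 7.0]),
    scene_min=jnp.array([0.0, 0.0, 0.0]),
    scene_max=jnp.array([20.0, 3.0, 15.0]),
    grid_shape=jnp.array([4, 1, 4]),
)
```

In this scheme only the submodel containing the current viewpoint needs to be resident, which is what keeps per-frame compute and memory bounded as the scene grows.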
Key takeaways:
- The article introduces SMERF, a view synthesis approach that achieves state-of-the-art accuracy among real-time methods on large scenes with footprints up to 300 m² at a volumetric resolution of 3.5 mm³.
- SMERF is built upon two primary contributions: a hierarchical model partitioning scheme, which increases model capacity while constraining compute and memory consumption, and a distillation training strategy that simultaneously yields high fidelity and internal consistency (a sketch of this objective follows the list).
- The approach enables full six degrees of freedom (6DOF) navigation within a web browser and renders in real-time on commodity smartphones and laptops.
- SMERF exceeds the current state-of-the-art in real-time novel view synthesis by 0.78 dB on standard benchmarks and 1.78 dB on large scenes, renders frames three orders of magnitude faster than state-of-the-art radiance field models, and achieves real-time performance across a wide variety of commodity devices, including smartphones.
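As a rough illustration of the distillation strategy, the sketch below supervises a student submodel against a frozen teacher's per-ray colors and per-sample densities. The names `student_apply`, `teacher_rgb`, and `teacher_density` are hypothetical, and the simple squared-error terms and weighting are assumptions for exposition, not SMERF's exact objective.

```python
import jax
import jax.numpy as jnp

def distillation_loss(params, student_apply, rays, samples,
                      teacher_rgb, teacher_density, density_weight=0.1):
    """Match the student's outputs to cached teacher predictions.

    rays:            (N, 6) ray origins and directions.
    samples:         (N, S, 3) 3D sample points along each ray.
    teacher_rgb:     (N, 3) teacher's rendered ray colors.
    teacher_density: (N, S) teacher's volumetric densities at the samples.
    """
    pred_rgb, pred_density = student_apply(params, rays, samples)
    # Photometric term: match the teacher's rendered colors per ray.
    rgb_loss = jnp.mean((pred_rgb - teacher_rgb) ** 2)
    # Geometry term: match the teacher's density field at the sample points,
    # encouraging the student to stay internally consistent with the
    # teacher's geometry rather than fitting images alone.
    density_loss = jnp.mean((pred_density - teacher_density) ** 2)
    return rgb_loss + density_weight * density_loss

loss_and_grad = jax.value_and_grad(distillation_loss)
```

Supervising against a single pre-trained teacher, rather than raw images alone, is what lets the independently trained submodels agree with one another where their regions meet.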