pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction

David Charatan, Sizhe Li, Andrea Tagliasacchi, and Vincent Sitzmann
Paper | Code | Pre-trained Models | Sample Data

TL;DR

pixelSplat infers a 3D Gaussian scene from two input views in a single forward pass.
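To make the input/output layout concrete, here is a minimal shape sketch, assuming a batch of two posed context views and a per-pixel bundle of Gaussian parameters. The one-layer network below is a stand-in for illustration only, not pixelSplat's actual encoder, and the channel layout is an assumed example.

```python
# Minimal shape sketch of the two-view, single-forward-pass setup.
# The module below is a placeholder (a single 1x1 conv), NOT pixelSplat's
# actual architecture; it only illustrates the data flow and tensor shapes.
import torch
import torch.nn as nn

class TwoViewGaussianPredictor(nn.Module):
    def __init__(self):
        super().__init__()
        # Assumed per-pixel Gaussian parameter layout: 3 (position) + 3 (scale)
        # + 4 (rotation quaternion) + 1 (opacity) + 3 (color) = 14 channels.
        self.head = nn.Conv2d(2 * 3, 14, kernel_size=1)

    def forward(self, image_pair):
        # image_pair: (batch, 2 views, 3, H, W) -> stack views along channels.
        b, v, c, h, w = image_pair.shape
        return self.head(image_pair.reshape(b, v * c, h, w))  # (b, 14, H, W)

params = TwoViewGaussianPredictor()(torch.rand(1, 2, 3, 64, 64))
print(params.shape)  # torch.Size([1, 14, 64, 64]) -- one parameter bundle per pixel
```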

Abstract

We introduce pixelSplat, a feed-forward model that learns to reconstruct 3D radiance fields parameterized by 3D Gaussian primitives from pairs of images. Our model features real-time and memory-efficient rendering for scalable training as well as fast 3D reconstruction at inference time. To overcome local minima inherent to sparse and locally supported representations, we predict a dense probability distribution over 3D and sample Gaussian means from that probability distribution. We make this sampling operation differentiable via a reparameterization trick, allowing us to back-propagate gradients through the Gaussian splatting representation. We benchmark our method on wide-baseline novel view synthesis on the real-world RealEstate10k and ACID datasets, where we outperform state-of-the-art light field transformers and accelerate rendering by 2.5 orders of magnitude while reconstructing an interpretable and editable 3D radiance field.
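To sketch the sampling idea concretely, the snippet below shows one way a per-ray depth distribution can be sampled while keeping the distribution in the gradient path: the sampled depth bucket places the Gaussian along the ray, and the probability of that bucket is reused as the Gaussian's opacity, so the rendering loss can still back-propagate into the distribution. The function, its arguments, and the exact parameterization are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn.functional as F

def sample_gaussian_means(depth_logits, ray_origins, ray_dirs, near, far):
    """Sample one Gaussian mean per ray from a per-ray depth distribution.

    depth_logits: (num_rays, num_buckets) unnormalized scores over depth buckets.
    ray_origins, ray_dirs: (num_rays, 3) camera rays in world space.
    Returns per-ray 3D means and an opacity that carries the gradient back
    into the depth distribution (a sketch of the reparameterization idea).
    """
    probs = F.softmax(depth_logits, dim=-1)                  # (R, B)
    num_buckets = probs.shape[-1]

    # Discrete sample of a depth bucket per ray (not differentiable by itself).
    bucket = torch.multinomial(probs, num_samples=1)         # (R, 1)

    # Convert the sampled bucket index to a metric depth along the ray.
    bucket_centers = torch.linspace(near, far, num_buckets, device=probs.device)
    depth = bucket_centers[bucket.squeeze(-1)]                # (R,)

    # Gaussian means placed at the sampled depth along each ray.
    means = ray_origins + depth.unsqueeze(-1) * ray_dirs      # (R, 3)

    # Tie each Gaussian's opacity to the probability of its sampled bucket,
    # so gradients from the rendering loss reach the depth distribution
    # even though the sample itself is discrete.
    opacity = probs.gather(-1, bucket).squeeze(-1)             # (R,)
    return means, opacity

# Illustrative usage with random rays standing in for encoder outputs.
R, B = 4, 64
means, opacity = sample_gaussian_means(
    torch.randn(R, B), torch.zeros(R, 3), torch.randn(R, 3), near=1.0, far=100.0
)
```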

Comparison vs. Baselines

We compare our method against state-of-the-art baselines for wide-baseline novel view synthesis on two real-world datasets:

ACID Dataset

comparison on ACID dataset

Real Estate 10k Dataset

comparison on Real Estate 10k dataset

3D Gaussian Point Clouds

Because pixelSplat infers a set of 3D Gaussians, we can visualize these Gaussians and render them to produce depth maps. Since the Real Estate 10k and ACID datasets contain many areas with ambiguous depth (e.g., large, textureless surfaces like interior walls), we fine-tune pixelSplat for 50,000 iterations using a depth regularizer before exporting 3D Gaussian point clouds.
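As an illustration of what exporting Gaussians as a point cloud can look like, the helper below writes each Gaussian's mean as a colored point in an ASCII PLY file. The array names and the random stand-in data are assumptions for this sketch, not the repository's actual export code.

```python
import numpy as np

def export_gaussians_as_ply(means, colors, path="gaussians.ply"):
    """Write Gaussian means as a colored ASCII PLY point cloud.

    means:  (N, 3) float array of Gaussian centers in world space.
    colors: (N, 3) float array of RGB values in [0, 1].
    """
    rgb = (np.clip(colors, 0.0, 1.0) * 255).astype(np.uint8)
    with open(path, "w") as f:
        f.write("ply\nformat ascii 1.0\n")
        f.write(f"element vertex {len(means)}\n")
        f.write("property float x\nproperty float y\nproperty float z\n")
        f.write("property uchar red\nproperty uchar green\nproperty uchar blue\n")
        f.write("end_header\n")
        for (x, y, z), (r, g, b) in zip(means, rgb):
            f.write(f"{x} {y} {z} {r} {g} {b}\n")

# Illustrative usage with random data standing in for predicted Gaussians.
export_gaussians_as_ply(np.random.randn(1000, 3), np.random.rand(1000, 3))
```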

point clouds and depth maps