SparseSplat: Towards Applicable Feed-Forward 3D Gaussian Splatting with Pixel-Unaligned Prediction

Zicheng Zhang1, Xiangting Meng2, Ke Wu1, Wenchao Ding2
1Fudan University    2ShanghaiTech University
CVPR 2026
SparseSplat teaser: Efficiency vs. Quality comparison showing fewer Gaussians with better performance
SparseSplat achieves state-of-the-art rendering quality on DL3DV using significantly fewer Gaussians than the previous SOTA, DepthSplat (150k vs. 688k). Our model also generates competitive results in sparse settings (e.g., 10k). As illustrated by the ellipsoid renderings, SparseSplat adaptively allocates Gaussian density based on scene content, in contrast with pixel-aligned methods that produce spatially uniform, highly redundant primitives even in textureless regions.

Abstract

Recent progress in feed-forward 3D Gaussian Splatting (3DGS) has notably improved rendering quality. However, the spatially uniform and highly redundant 3DGS map generated by previous feed-forward 3DGS methods limits their integration into downstream reconstruction tasks. We propose SparseSplat, the first feed-forward 3DGS model that adaptively adjusts Gaussian density according to scene structure and information richness of local regions, yielding highly compact 3DGS maps.

To achieve this, we propose entropy-based probabilistic sampling, generating large, sparse Gaussians in textureless areas and assigning small, dense Gaussians to regions with rich information. Additionally, we design a specialized 3D-Local Attribute Predictor that efficiently encodes local context and decodes it into 3DGS attributes, addressing the receptive-field mismatch between the general 3DGS optimization pipeline and feed-forward models. Extensive experimental results demonstrate that SparseSplat achieves state-of-the-art rendering quality with only 22% of the Gaussians and maintains reasonable rendering quality with only 1.5% of the Gaussians.

Key Contributions

🔍 Two Fundamental Design Flaws Identified

We are the first to identify and analyze two fundamental design flaws in existing feed-forward 3DGS: the distribution mismatch (rigid structure vs. content-aware distribution) and the receptive field mismatch (global context vs. local optimization).

📊 Adaptive Primitive Sampling

A novel entropy-based strategy that discards the "pixel-aligned" paradigm to generate a sparse, content-aware set of 3DGS anchors. The temperature parameter τ provides intuitive control over the quality-memory trade-off, enabling diverse downstream applications from AR/VR to 3DGS-SLAM.
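The entropy-to-probability step can be sketched as a temperature-scaled softmax over a per-pixel entropy map, followed by sampling pixel locations without replacement. This is an illustrative sketch only: the function name `sample_anchor_pixels`, the softmax form, and without-replacement sampling are assumptions, not taken from the paper.

```python
import numpy as np

def sample_anchor_pixels(entropy_map, n_samples, tau=1.0, rng=None):
    """Sample sparse pixel locations with probability proportional to a
    temperature-scaled softmax of local entropy.

    Assumed behavior: lower tau concentrates samples in high-entropy
    (texture-rich) regions; higher tau approaches uniform sampling.
    """
    rng = np.random.default_rng(rng)
    h, w = entropy_map.shape
    logits = entropy_map.ravel() / tau
    logits -= logits.max()                 # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    # Draw distinct pixels according to the probability map
    flat_idx = rng.choice(h * w, size=n_samples, replace=False, p=probs)
    # Return (N, 2) array of (row, col) coordinates
    return np.stack(np.unravel_index(flat_idx, (h, w)), axis=1)
```

With a small τ, nearly all samples land in the high-entropy half of a synthetic map, which matches the intended "dense where informative, sparse where textureless" allocation.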

🧠 3D-Local Attribute Predictor

A lightweight predictor leveraging 3D K-Nearest-Neighbors, aligning the network's design with the local nature of 3DGS optimization. Instead of regressing from a single pixel's feature, it aggregates neighborhood information in 3D space for each sparse anchor point.
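A minimal, non-learned stand-in for the neighborhood-gathering step: brute-force 3D KNN with mean pooling. The actual predictor is a lightweight learned head; the name `aggregate_knn_features` and the mean-pooling choice are illustrative assumptions.

```python
import numpy as np

def aggregate_knn_features(anchors, points, feats, k=8):
    """For each sparse anchor, gather its k nearest 3D points and
    mean-pool their features (a sketch of the neighborhood gathering
    that would feed a learned attribute head).

    anchors: (A, 3) anchor positions
    points:  (P, 3) candidate 3D points
    feats:   (P, C) per-point features
    returns: (A, C) pooled neighborhood features
    """
    # (A, P) pairwise squared distances between anchors and points
    d2 = ((anchors[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    knn_idx = np.argsort(d2, axis=1)[:, :k]   # (A, k) nearest indices
    return feats[knn_idx].mean(axis=1)        # pool over the neighborhood
```

In practice the pooled (or concatenated) neighborhood features would be decoded by a small MLP into the Gaussian attributes; a KD-tree would replace the O(A·P) distance matrix at scale.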

Method Overview

SparseSplat pipeline: Feature & Depth Extraction, Adaptive Primitive Sampling, and 3D-Local Attribute Prediction
Overall Pipeline of SparseSplat. Our method begins with a frozen backbone to generate feature and depth maps from multi-view posed images. The Adaptive Primitive Sampling stage computes entropy maps, transforms them into probability maps, and samples sparse 2D pixels, which are then back-projected into 3D Sparse Anchor Points. Finally, for each anchor point, we gather its local KNN neighborhood and feed it into a lightweight prediction head to predict complete Gaussian attributes (α, s, q, c).
Locality of 3DGS optimization illustration
Locality of Classic 3DGS Optimization. During backpropagation, gradients propagate only through nearby pixels, confirming that Gaussian attributes are determined by local neighborhoods and motivating our 3D-Local Attribute Predictor design.
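The "sampled pixels → Sparse Anchor Points" step in the pipeline above is standard pinhole back-projection from per-pixel depth and camera intrinsics. A sketch, assuming camera-space output and a hypothetical `backproject_pixels` helper (not the paper's code):

```python
import numpy as np

def backproject_pixels(pixels, depth_map, K):
    """Back-project sampled (row, col) pixels into 3D camera space
    using per-pixel depth and a 3x3 intrinsic matrix K (pinhole model).

    pixels:    (N, 2) integer (row, col) coordinates
    depth_map: (H, W) per-pixel depth
    returns:   (N, 3) camera-space points
    """
    rows, cols = pixels[:, 0], pixels[:, 1]
    z = depth_map[rows, cols]
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Invert the projection u = fx*x/z + cx, v = fy*y/z + cy
    x = (cols - cx) * z / fx
    y = (rows - cy) * z / fy
    return np.stack([x, y, z], axis=1)
```

A camera-to-world rigid transform would then map these camera-space anchors into a shared world frame across views.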

Experimental Results

Quantitative Results on DL3DV

SparseSplat is evaluated at various operating points by adjusting the temperature parameter τ, resulting in different Gaussian counts (150k, 100k, 40k, 10k). Our method achieves state-of-the-art rendering quality at 150k Gaussians (PSNR 24.20 vs DepthSplat's 24.17) while using only 22% of the primitives.

| Method | Category | PSNR ↑ | SSIM ↑ | LPIPS ↓ | GS Count ↓ | Time (s) ↓ |
|---|---|---|---|---|---|---|
| MVSplat | pixel-aligned | 22.95 | 0.774 | 0.192 | 688k | 0.260 |
| DepthSplat | pixel-aligned | 24.17 | 0.816 | 0.152 | 688k | 0.128 |
| GGN | postprocess | 20.23 | 0.570 | 0.268 | 162k | 0.320 |
| Long-LRM | postprocess | 20.92 | 0.627 | 0.265 | 200k | 0.115 |
| AnySplat | voxelization | 17.45 | 0.471 | 0.320 | 608k | 0.378 |
| AnySplat | voxelization | 17.34 | 0.463 | 0.330 | 528k | 0.384 |
| AnySplat | voxelization | 16.14 | 0.417 | 0.390 | 393k | 0.415 |
| AnySplat | voxelization | 12.22 | 0.309 | 0.519 | 222k | 0.441 |
| AnySplat | voxelization | 8.91 | 0.239 | 0.618 | 113k | 0.473 |
| SparseSplat (150k, Ours) | adaptive | 24.20 | 0.817 | 0.168 | 150k | 0.398 |
| SparseSplat (100k, Ours) | adaptive | 23.95 | 0.786 | 0.189 | 100k | 0.192 |
| SparseSplat (40k, Ours) | adaptive | 22.65 | 0.737 | 0.251 | 40k | 0.111 |
| SparseSplat (10k, Ours) | adaptive | 21.29 | 0.665 | 0.321 | 10k | 0.105 |

Quantitative comparison on the DL3DV dataset. GS Count = average number of Gaussian primitives across scenes.

Visual Comparison on DL3DV

Rendering quality comparisons on DL3DV between SparseSplat and baselines
Rendering quality comparisons on DL3DV. Our model matches the SOTA rendering quality of DepthSplat with only 150k Gaussians (vs. 688k). Under sparse settings (40k and 10k), our method maintains structural integrity with minor progressive blurring.

Efficiency vs. Quality Trade-off

PSNR vs Gaussian Count plot comparing SparseSplat with baselines
SparseSplat achieves superior PSNR at every Gaussian budget compared to baselines. Unlike AnySplat, which degrades severely at lower counts, SparseSplat maintains robust quality even at extreme sparsity (10k Gaussians, ≈1.5% of DepthSplat).

Downstream Applications

🥽 AR/VR

The 150k model (398ms) is suited for AR/VR, where a higher one-time reconstruction cost is acceptable to gain a highly compact model ideal for efficient on-device storage and real-time rendering.

🗺️ 3DGS-SLAM

The 10k/40k models (105ms/111ms) are ideal for real-time SLAM: faster than DepthSplat (128ms) while maintaining robust quality. SparseSplat makes feed-forward 3DGS practical for sequential scene reconstruction.

🤖 Robotics & Edge

With a flexible τ parameter, users can select any operating point on the quality-speed-memory spectrum, which is critical for resource-constrained edge platforms such as cars, drones, or AR glasses.

Paper

BibTeX

@inproceedings{zhang2026sparsesplat,
  title={SparseSplat: Towards Applicable Feed-Forward 3D Gaussian Splatting with Pixel-Unaligned Prediction},
  author={Zhang, Zicheng and Meng, Xiangting and Wu, Ke and Ding, Wenchao},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}