Recent feed-forward networks have achieved remarkable progress in sparse-view 3D reconstruction, but they often suffer from geometric inconsistencies due to the lack of explicit multi-view constraints. To address this, we introduce the Geometry-Grounded Point Transformer (GGPT). Our framework first leverages an improved Structure-from-Motion (SfM) pipeline to efficiently estimate accurate camera poses and partial 3D point clouds. Building on this, a geometry-guided 3D point transformer refines dense point maps under explicit sparse-geometry supervision. Extensive experiments show that GGPT integrates geometric priors with dense feed-forward predictions, producing geometrically consistent and spatially complete reconstructions that generalize across architectures and datasets.
In this work, we introduce a geometry-guided framework that refines dense feed-forward reconstructions directly in 3D space, using accurate geometric guidance obtained from an improved SfM pipeline. First, we revisit sparse-view SfM and introduce an improved pipeline that integrates dense matchers with a lightweight optimization procedure. Second, we introduce a lightweight variant of the 3D Point Transformer that jointly processes dense point maps from feed-forward models and the geometrically grounded partial point cloud from our SfM pipeline, predicting a residual correction for every dense point.
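The residual-correction step can be sketched as follows. This is a minimal, hypothetical illustration of the idea described above, not GGPT's actual implementation: all names (`refine_dense_points`, `toy_residual`) are invented for exposition, and the learned transformer is replaced by a stand-in predictor that nudges each dense point toward its nearest SfM point.

```python
# Sketch of residual refinement: a predictor maps each dense point,
# conditioned on the partial SfM point cloud, to a (dx, dy, dz)
# correction, which is added back to the feed-forward prediction.
# Names are illustrative assumptions, not GGPT's API.

def refine_dense_points(dense_points, predict_residual, partial_cloud):
    """Apply predicted residual corrections to every dense point.

    dense_points: list of (x, y, z) tuples from a feed-forward model.
    partial_cloud: geometrically grounded points from the SfM pipeline.
    predict_residual: stand-in for the 3D point transformer.
    """
    refined = []
    for p in dense_points:
        dx, dy, dz = predict_residual(p, partial_cloud)
        refined.append((p[0] + dx, p[1] + dy, p[2] + dz))
    return refined

def toy_residual(p, cloud, step=0.5):
    # Crude stand-in for learned corrections: step each dense point
    # partway toward the nearest geometrically grounded SfM point.
    nearest = min(cloud, key=lambda q: sum((a - b) ** 2 for a, b in zip(p, q)))
    return tuple(step * (b - a) for a, b in zip(p, nearest))
```

For example, a dense point at the origin with a single SfM point at (1, 0, 0) would be moved halfway toward it, to (0.5, 0, 0).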
@inproceedings{chen2026ggpt,
title={GGPT: Geometry-Grounded Point Transformer},
author={Chen, Yutong and Wang, Yiming and Zhang, Xucong and Prokudin, Sergey and Tang, Siyu},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2026}
}