arxiv:2507.13347

π^3: Scalable Permutation-Equivariant Visual Geometry Learning

Published on Jul 17
· Submitted by tonghe90 on Jul 18
#3 Paper of the day

Abstract

A permutation-equivariant neural network, π^3, reconstructs visual geometry without a fixed reference view, achieving state-of-the-art performance in camera pose estimation, depth estimation, and point map reconstruction.

AI-generated summary

We introduce π^3, a feed-forward neural network that offers a novel approach to visual geometry reconstruction, breaking the reliance on a conventional fixed reference view. Previous methods often anchor their reconstructions to a designated viewpoint, an inductive bias that can lead to instability and failures if the reference is suboptimal. In contrast, π^3 employs a fully permutation-equivariant architecture to predict affine-invariant camera poses and scale-invariant local point maps without any reference frames. This design makes our model inherently robust to input ordering and highly scalable. These advantages enable our simple and bias-free approach to achieve state-of-the-art performance on a wide range of tasks, including camera pose estimation, monocular/video depth estimation, and dense point map reconstruction. Code and models are publicly available.
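
To make the term "permutation-equivariant" concrete, the following is a minimal toy sketch and not the architecture from the paper. It assumes each input view is reduced to a small feature vector, and the names `equivariant_layer`, `reference_anchored_layer`, and `is_permutation_equivariant` are hypothetical helpers introduced only for illustration. The first layer mixes each view with the mean over all views, so reordering the inputs simply reorders the outputs. The second layer subtracts the features of the first view, mimicking a fixed reference view, and therefore depends on input order.

```python
import random


def equivariant_layer(views, w_self=0.7, w_mean=0.3):
    """Toy layer: mix each view's features with the mean over all views.

    The mean does not depend on view order, so permuting the inputs
    permutes the outputs in the same way.
    """
    n = len(views)
    dim = len(views[0])
    mean = [sum(v[d] for v in views) / n for d in range(dim)]
    return [[w_self * v[d] + w_mean * mean[d] for d in range(dim)] for v in views]


def reference_anchored_layer(views):
    """Toy counter-example: express every view relative to the first view."""
    ref = views[0]
    return [[x - r for x, r in zip(v, ref)] for v in views]


def is_permutation_equivariant(layer, views, perm, tol=1e-9):
    """Return True if layer(permute(views)) matches permute(layer(views))."""
    out = layer(views)
    out_then_perm = [out[i] for i in perm]
    perm_then_out = layer([views[i] for i in perm])
    return all(
        abs(a - b) <= tol
        for row_a, row_b in zip(out_then_perm, perm_then_out)
        for a, b in zip(row_a, row_b)
    )


# Five hypothetical views, each with a 4-dimensional feature vector.
rng = random.Random(0)
views = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(5)]
perm = [2, 0, 4, 1, 3]

equivariant_ok = is_permutation_equivariant(equivariant_layer, views, perm)
anchored_ok = is_permutation_equivariant(reference_anchored_layer, views, perm)
print("mean-mixing layer equivariant:", equivariant_ok)
print("reference-anchored layer equivariant:", anchored_ok)
```

The contrast between the two checks illustrates the abstract's point in miniature: a model whose outputs are anchored to one designated view changes its predictions when the views are reordered, while an order-agnostic aggregation does not.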

Community


Code is available: https://github.com/yyfz/Pi3
Hugging Face demo is available: https://huggingface.co/spaces/yyfz233/Pi3
Project page is available: https://yyfz.github.io/pi3/
