[CVPR 2025] GFS-VL: Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model

Overview

GFS-VL is a novel framework proposed in our CVPR 2025 paper: Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model.

Our approach exploits the complementary strengths of two data sources:

  • Dense but noisy pseudo-labels from 3D Vision-Language Models
  • Precise yet sparse few-shot samples

Combining the two offsets the weakness of each and yields effective generalized few-shot 3D point cloud segmentation.

Released Model Weights

This repository contains the following pre-trained weights:

  • PTv3 Backbones: Our pre-trained Point Transformer V3 (PTv3) backbones
  • GFS-VL Models: The complete GFS-VL few-shot segmentation framework
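
As a rough illustration of fetching these files, the sketch below wraps `snapshot_download` from the `huggingface_hub` library. It is not taken from the authors' instructions: the helper name `fetch_repo`, the `downloader` parameter, and the default `checkpoints` directory are assumptions made for this example, and the repository id must be supplied by you, since it is not stated on this page.

```python
from pathlib import Path


def fetch_repo(repo_id, repo_type="model", local_dir="checkpoints", downloader=None):
    """Download every file of a Hugging Face repository into local_dir.

    repo_id    -- "<namespace>/<repo-name>"; not given here, so pass the real id.
    repo_type  -- "model" for weights, "dataset" for benchmark data.
    downloader -- hypothetical hook for swapping in a stub (e.g. offline tests);
                  defaults to huggingface_hub.snapshot_download.
    """
    if downloader is None:
        # Imported lazily so the helper can be defined without the package installed.
        from huggingface_hub import snapshot_download

        downloader = snapshot_download

    # snapshot_download returns the path of the folder holding the downloaded files.
    path = downloader(repo_id=repo_id, repo_type=repo_type, local_dir=local_dir)
    return Path(path)
```

Calling `fetch_repo("<namespace>/<repo-name>")` with the actual repository id would place the weights under `checkpoints/`. Passing `repo_type="dataset"` applies the same call to a dataset repository. How the downloaded checkpoints are loaded is specific to the training code and is not covered by this sketch.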

Usage

For detailed usage instructions, model implementation, and training code, please refer to our GitHub repository.

Benchmarks

We introduce two new, challenging benchmarks for generalized few-shot point cloud segmentation (GFS-PCS). Both feature diverse novel classes, enabling a comprehensive evaluation of generalization, and are intended as a foundation for advancing GFS-PCS in real-world settings.

The benchmark datasets can be downloaded from our Hugging Face dataset repository.

Citation

If you find our work useful, please consider citing our paper:

@inproceedings{an2025generalized,
  title={Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model},
  author={An, Zhaochong and Sun, Guolei and Liu, Yun and Li, Runjia and Han, Junlin and Konukoglu, Ender and Belongie, Serge},
  booktitle={CVPR},
  year={2025}
}