RayYoh and nielsr (HF Staff) committed
Commit 0d785b3 · verified · 1 Parent(s): 2a56d43

Add comprehensive model card for GaussianCross (#1)


- Add comprehensive model card for GaussianCross (5b388dcecad275055833d8291375008a8e85f89d)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  1. README.md +117 -3
README.md CHANGED
@@ -1,3 +1,117 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ pipeline_tag: other
+ library_name: pointcept
+ tags:
+ - 3d
+ - gaussian-splatting
+ - point-cloud
+ - self-supervised-learning
+ - representation-learning
+ ---
+
+ # GaussianCross: Cross-modal Self-supervised 3D Representation Learning via Gaussian Splatting
+
+ GaussianCross is a novel cross-modal self-supervised 3D representation learning architecture that integrates feed-forward 3D Gaussian Splatting (3DGS) techniques. It generates informative and robust point representations for 3D scene understanding and demonstrates strong performance on tasks such as semantic and instance segmentation.
+
+ <p align="center">
+ <a href="https://huggingface.co/papers/2508.02172"><img src='https://img.shields.io/badge/arXiv-2508.02172-b31b1b.svg' alt='arXiv link'></a>
+ <a href="https://rayyoh.github.io/GaussianCross/"><img src='https://img.shields.io/badge/Project-Page-Green' alt='Project Page link'></a>
+ <a href="https://github.com/RayYoh/GaussianCross"><img src="https://img.shields.io/badge/GitHub-Code-blue?logo=github&" alt='GitHub Code link'></a>
+ </p>
+
+ <div align="center">
+ <img src="https://huggingface.co/RayYoh/GaussianCross/resolve/main/assets/teaser.png" width="80%" alt="GaussianCross Teaser"/>
+ </div>
+
+ ## Abstract
+ The significance of informative and robust point representations has been widely acknowledged for 3D scene understanding. Despite existing self-supervised pre-training counterparts demonstrating promising performance, the model collapse and structural information deficiency remain prevalent due to insufficient point discrimination difficulty, yielding unreliable expressions and suboptimal performance. In this paper, we present GaussianCross, a novel cross-modal self-supervised 3D representation learning architecture integrating feed-forward 3D Gaussian Splatting (3DGS) techniques to address current challenges. GaussianCross seamlessly converts scale-inconsistent 3D point clouds into a unified cuboid-normalized Gaussian representation without missing details, enabling stable and generalizable pre-training. Subsequently, a tri-attribute adaptive distillation splatting module is incorporated to construct a 3D feature field, facilitating synergetic feature capturing of appearance, geometry, and semantic cues to maintain cross-modal consistency. To validate GaussianCross, we perform extensive evaluations on various benchmarks, including ScanNet, ScanNet200, and S3DIS. In particular, GaussianCross shows a prominent parameter and data efficiency, achieving superior performance through linear probing (<0.1% parameters) and limited data training (1% of scenes) compared to state-of-the-art methods. Furthermore, GaussianCross demonstrates strong generalization capabilities, improving the full fine-tuning accuracy by 9.3% mIoU and 6.1% AP$_{50}$ on ScanNet200 semantic and instance segmentation tasks, respectively, supporting the effectiveness of our approach.
+
+ ## Pipeline
+ <div align="center">
+ <img src="https://huggingface.co/RayYoh/GaussianCross/resolve/main/assets/pepeline.png" width="100%" alt="GaussianCross Pipeline"/>
+ </div>
+
+ ## Installation
+ Our model is built on the [Pointcept toolkit](https://github.com/Pointcept/Pointcept). You can follow its official instructions to install the packages:
+
+ ```bash
+ conda create -n GaussianCross python=3.8 -y
+ conda activate GaussianCross
+
+ # Further installation steps can be found in the Pointcept documentation or the GaussianCross GitHub repository.
+ # Example from Pointcept's README:
+ # pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 --index-url https://download.pytorch.org/whl/cu118
+ # pip install -r requirements.txt
+ # python setup.py develop
+ ```
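+
+ For reference, the remaining steps typically follow Pointcept's documented setup. The sketch below is an assumption based on Pointcept's README rather than a verified GaussianCross recipe; package versions (notably the spconv CUDA build) must match your CUDA toolkit, so check both repositories before running it.
+
+ ```bash
+ # Illustrative Pointcept-style dependency setup (verify versions against the repos)
+ conda install ninja -y
+ conda install h5py pyyaml -c anaconda -y
+ conda install sharedarray tensorboard tensorboardx yapf addict einops scipy plyfile termcolor timm -c conda-forge -y
+ pip install spconv-cu118   # sparse-convolution backend for the SpUNet backbone
+
+ # Optional: build the pointops CUDA ops (required by some Pointcept backbones)
+ cd libs/pointops
+ python setup.py install
+ cd ../..
+ ```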
+ Note that Pointcept also provides a script for building a corresponding Docker image: [build_image.sh](https://github.com/Pointcept/Pointcept/blob/main/scripts/build_image.sh)
+
+ ## Data Preprocessing
+ **ScanNet V2 & ScanNet200**
+ - Download the [ScanNet V2](http://www.scan-net.org/) dataset.
+ - Run preprocessing code for raw ScanNet as follows (detailed scripts are in the [GitHub repository](https://github.com/RayYoh/GaussianCross)):
+
+ ```bash
+ # xxx (Refer to GitHub for specific commands, e.g., python tools/prepare_scannet.py)
+ ```
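+
+ If the repository follows Pointcept's standard pipeline, the preprocessing entry point is likely similar to the sketch below. The script path and arguments are assumptions carried over from Pointcept, so confirm them against the GaussianCross repository first.
+
+ ```bash
+ # Assumed Pointcept-style preprocessing call (verify the script path in the repo)
+ # RAW_SCANNET_DIR: directory holding the raw ScanNet v2 release
+ python pointcept/datasets/preprocessing/scannet/preprocess_scannet.py \
+   --dataset_root ${RAW_SCANNET_DIR} \
+   --output_root ${PROCESSED_SCANNET_DIR}
+ ```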
+ - Link the processed dataset into the codebase:
+ ```bash
+ # PROCESSED_SCANNET_DIR: the directory of the processed ScanNet dataset.
+ mkdir data
+ ln -s ${PROCESSED_SCANNET_DIR} ${CODEBASE_DIR}/data/scannet
+ ```
+ **S3DIS**
+ We use the preprocessed S3DIS data from [Pointcept](https://github.com/Pointcept/Pointcept?tab=readme-ov-file#s3dis).
+ - Link the processed dataset into the codebase:
+ ```bash
+ # PROCESSED_S3DIS_DIR: the directory of the processed S3DIS dataset.
+ ln -s ${PROCESSED_S3DIS_DIR} ${CODEBASE_DIR}/data/s3dis
+ ```
+
+ ## Usage (Training with Pretrained Weights)
+ Training is driven by the configs in the `configs` folder of the GitHub repository. The training scripts create an experiment folder under `exp` and back up the essential code there; the training config, log file, tensorboard records, and checkpoints are also saved to this folder during training.
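+
+ Under Pointcept's conventions, the resulting layout is roughly the following (file names here are illustrative, not verified against this repository):
+
+ ```bash
+ # Illustrative experiment folder created by the training scripts
+ # exp/scannet/<experiment-name>/
+ # ├── code/       # backup of the essential code used for the run
+ # ├── model/      # checkpoints, e.g. model_last.pth, model_best.pth
+ # ├── train.log   # training log
+ # └── events.out.tfevents.*   # tensorboard records
+ ```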
+
+ **Attention:** A critical difference from Pointcept is that most data augmentation operations are performed on the GPU, implemented in this [file](https://github.com/RayYoh/GaussianCross/blob/main/pointcept/custom/transform_tensor.py). Make sure `ToTensor` is placed before the augmentation operations in the transform pipeline.
+
+ Download the pretrained 3D backbone from [this Hugging Face repository](https://huggingface.co/RayYoh/GaussianCross/blob/main/pretrain-gs-v4-spunet-base/model/model_last.pth).
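+
+ For example, the checkpoint can be fetched with the Hugging Face CLI (`pip install -U "huggingface_hub[cli]"`); the file path below mirrors the link above:
+
+ ```bash
+ # Download the pretrained backbone into ./weights (the subdirectory structure is preserved)
+ huggingface-cli download RayYoh/GaussianCross \
+   pretrain-gs-v4-spunet-base/model/model_last.pth \
+   --local-dir ./weights
+ # Resulting file: ./weights/pretrain-gs-v4-spunet-base/model/model_last.pth
+ ```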
+
+ **ScanNet V2 Examples**
+ ```bash
+ # Path to the downloaded pretrained model
+ WEIGHT="path/to/downloaded/model/model_last.pth"
+
+ # Linear Probing
+ CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet -c semseg-spunet-base-lin -n semseg-spunet-base-lin -w $WEIGHT
+ # Semantic Segmentation
+ CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet -c semseg-spunet-base -n semseg-spunet-base -w $WEIGHT
+ # Instance Segmentation
+ CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet -c insseg-pg-spunet-base -n insseg-pg-spunet-base -w $WEIGHT
+ # Parameter Efficiency and Data Efficiency
+ CUDA_VISIBLE_DEVICES=0,1,2,3 sh scripts/train.sh -g 4 -d scannet -c semseg-spunet-efficient-[la20-lr20] -n semseg-spunet-efficient-[la20-lr20] -w $WEIGHT
+ ```
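+
+ Following Pointcept's `train.sh` convention, `-g` sets the number of GPUs, `-d` the dataset, `-c` the config name under `configs/<dataset>`, `-n` the experiment name (and thus the folder under `exp`), and `-w` the weights used for initialization. In the last command, the bracketed `[la20-lr20]` stands for a family of data-efficiency config variants; substitute one concrete config name from the repository's `configs` folder.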
+
+ For more detailed training scripts and configurations for ScanNet200 and S3DIS, please refer to the [official GitHub repository](https://github.com/RayYoh/GaussianCross).
+
+ ## Acknowledgement
+ This research was conducted in the JC STEM Lab of Machine Learning and Computer Vision, funded by The Hong Kong Jockey Club Charities Trust.
+
+ Our code is primarily built upon [Pointcept](https://github.com/Pointcept/Pointcept), [PonderV2](https://github.com/OpenGVLab/PonderV2) and [gsplat](https://github.com/nerfstudio-project/gsplat).
+
+ ## Citation
+ If you find our work helpful or inspiring, please feel free to cite it.
+ ```bib
+ @article{yao2025gaussiancross,
+   title={GaussianCross: Cross-modal Self-supervised 3D Representation Learning via Gaussian Splatting},
+   author={Yao, Lei and Wang, Yi and Zhang, Yi and Liu, Moyun and Chau, Lap-Pui},
+   journal={arXiv preprint arXiv:2508.02172},
+   year={2025}
+ }
+ or
+ @inproceedings{yao2025gaussiancross,
+   title={GaussianCross: Cross-modal Self-supervised 3D Representation Learning via Gaussian Splatting},
+   author={Yao, Lei and Wang, Yi and Zhang, Yi and Liu, Moyun and Chau, Lap-Pui},
+   booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
+   year={2025}
+ }
+ ```