Improve model card: Add paper and benchmark GitHub links
Browse filesHi there!
This PR improves the model card for `SpaceOm` by enhancing its connection to the research paper and related code:
- Adds the `paper: 2506.07966` metadata tag to link the model directly to the [SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence](https://huggingface.co/papers/2506.07966) paper on the Hugging Face Hub, improving discoverability.
- Adds a prominent link to the SpaCE-10 paper and its associated GitHub repository (https://github.com/Cuzyoung/SpaCE-10) at the top of the model card content. This provides immediate context about the model's evaluation benchmark.
These changes help researchers and users quickly understand the model's origin and the context of its performance on the SpaCE-10 benchmark.
@@ -1,8 +1,14 @@
|
|
1 |
---
|
2 |
-
|
3 |
-
-
|
|
|
|
|
4 |
language:
|
5 |
- en
|
|
|
|
|
|
|
|
|
6 |
tags:
|
7 |
- gguf
|
8 |
- remyx
|
@@ -16,14 +22,9 @@ tags:
|
|
16 |
- vision-language
|
17 |
- distance-estimation
|
18 |
- quantitative-spatial-reasoning
|
|
|
|
|
19 |
pretty_name: SpaceOm-GGUF
|
20 |
-
license: apache-2.0
|
21 |
-
datasets:
|
22 |
-
- remyxai/SpaceThinker
|
23 |
-
base_model:
|
24 |
-
- remyxai/SpaceOm
|
25 |
-
pipeline_tag: image-text-to-text
|
26 |
-
library_name: llama.cpp
|
27 |
model-index:
|
28 |
- name: SpaceOm
|
29 |
results:
|
@@ -35,230 +36,51 @@ model-index:
|
|
35 |
type: benchmark
|
36 |
metrics:
|
37 |
- type: success_rate
|
38 |
-
name: Overall Success Rate
|
39 |
value: 0.5419
|
40 |
-
results_by_subcategory:
|
41 |
-
- name: 3D Positional Relation / Orientation
|
42 |
-
success_rate: 0.4877
|
43 |
-
- name: Object Localization / 3D Localization
|
44 |
-
success_rate: 0.6337
|
45 |
-
- name: Object Properties / Size
|
46 |
-
success_rate: 0.5043
|
47 |
-
- task:
|
48 |
-
type: visual-question-answering
|
49 |
-
name: Spatial Reasoning
|
50 |
-
dataset:
|
51 |
-
name: BLINK
|
52 |
-
type: benchmark
|
53 |
-
metrics:
|
54 |
-
- type: success_rate
|
55 |
name: Overall Success Rate
|
56 |
-
value: 0.599
|
57 |
-
results_by_subcategory:
|
58 |
-
- name: 3D Positional Relation / Orientation
|
59 |
-
success_rate: 0.7972
|
60 |
-
- name: Counting / Object Counting
|
61 |
-
success_rate: 0.6167
|
62 |
-
- name: Depth and Distance / Relative
|
63 |
-
success_rate: 0.621
|
64 |
-
- name: Object Localization / 2D Localization
|
65 |
-
success_rate: 0.582
|
66 |
-
- name: Point and Object Tracking / Point Correspondence
|
67 |
-
success_rate: 0.3779
|
68 |
-
- task:
|
69 |
-
type: visual-question-answering
|
70 |
-
name: Spatial Reasoning
|
71 |
-
dataset:
|
72 |
-
name: MMIU
|
73 |
-
type: benchmark
|
74 |
-
metrics:
|
75 |
- type: success_rate
|
|
|
76 |
name: Overall Success Rate
|
77 |
-
value: 0.388
|
78 |
-
results_by_subcategory:
|
79 |
-
- name: Camera and Image Transformation / 2D Transformation
|
80 |
-
success_rate: 0.255
|
81 |
-
- name: Camera and Image Transformation / 3D Camera Pose
|
82 |
-
success_rate: 0.4
|
83 |
-
- name: Camera and Image Transformation / Camera Motion
|
84 |
-
success_rate: 0.4436
|
85 |
-
- name: Depth and Distance / Absolute
|
86 |
-
success_rate: 0.265
|
87 |
-
- name: Object Localization / 3D Localization
|
88 |
-
success_rate: 0.3625
|
89 |
-
- name: Point and Object Tracking / 3D Tracking
|
90 |
-
success_rate: 0.725
|
91 |
-
- name: Point and Object Tracking / Point Correspondence
|
92 |
-
success_rate: 0.265
|
93 |
-
- task:
|
94 |
-
type: visual-question-answering
|
95 |
-
name: Spatial Reasoning
|
96 |
-
dataset:
|
97 |
-
name: MMVP
|
98 |
-
type: benchmark
|
99 |
-
metrics:
|
100 |
- type: success_rate
|
|
|
101 |
name: Overall Success Rate
|
102 |
-
value: 0.5833
|
103 |
-
results_by_subcategory:
|
104 |
-
- name: Others / Miscellaneous
|
105 |
-
success_rate: 0.5833
|
106 |
-
- task:
|
107 |
-
type: visual-question-answering
|
108 |
-
name: Spatial Reasoning
|
109 |
-
dataset:
|
110 |
-
name: QSpatialBench-Plus
|
111 |
-
type: benchmark
|
112 |
-
metrics:
|
113 |
- type: success_rate
|
|
|
114 |
name: Overall Success Rate
|
115 |
-
value: 0.4455
|
116 |
-
results_by_subcategory:
|
117 |
-
- name: Depth and Distance / Absolute
|
118 |
-
success_rate: 0.4455
|
119 |
-
- task:
|
120 |
-
type: visual-question-answering
|
121 |
-
name: Spatial Reasoning
|
122 |
-
dataset:
|
123 |
-
name: QSpatialBench-ScanNet
|
124 |
-
type: benchmark
|
125 |
-
metrics:
|
126 |
- type: success_rate
|
|
|
127 |
name: Overall Success Rate
|
128 |
-
value: 0.4876
|
129 |
-
results_by_subcategory:
|
130 |
-
- name: Depth and Distance / Absolute
|
131 |
-
success_rate: 0.464
|
132 |
-
- name: Object Properties / Size
|
133 |
-
success_rate: 0.5111
|
134 |
-
- task:
|
135 |
-
type: visual-question-answering
|
136 |
-
name: Spatial Reasoning
|
137 |
-
dataset:
|
138 |
-
name: RealWorldQA
|
139 |
-
type: benchmark
|
140 |
-
metrics:
|
141 |
- type: success_rate
|
|
|
142 |
name: Overall Success Rate
|
143 |
-
value: 0.6105
|
144 |
-
results_by_subcategory:
|
145 |
-
- name: Others / Miscellaneous
|
146 |
-
success_rate: 0.6105
|
147 |
-
- task:
|
148 |
-
type: visual-question-answering
|
149 |
-
name: Spatial Reasoning
|
150 |
-
dataset:
|
151 |
-
name: SpatialSense
|
152 |
-
type: benchmark
|
153 |
-
metrics:
|
154 |
- type: success_rate
|
|
|
155 |
name: Overall Success Rate
|
156 |
-
value: 0.7043
|
157 |
-
results_by_subcategory:
|
158 |
-
- name: 3D Positional Relation / Orientation
|
159 |
-
success_rate: 0.7043
|
160 |
-
- task:
|
161 |
-
type: visual-question-answering
|
162 |
-
name: Spatial Reasoning
|
163 |
-
dataset:
|
164 |
-
name: VGBench
|
165 |
-
type: benchmark
|
166 |
-
metrics:
|
167 |
- type: success_rate
|
|
|
168 |
name: Overall Success Rate
|
169 |
-
value: 0.3504
|
170 |
-
results_by_subcategory:
|
171 |
-
- name: Camera and Image Transformation / 2D Transformation
|
172 |
-
success_rate: 0.2568
|
173 |
-
- name: Camera and Image Transformation / 3D Camera Pose
|
174 |
-
success_rate: 0.4371
|
175 |
-
- name: Depth and Distance / Absolute
|
176 |
-
success_rate: 0.3339
|
177 |
-
- name: Depth and Distance / Relative
|
178 |
-
success_rate: 0.32
|
179 |
-
- name: Object Localization / 3D Localization
|
180 |
-
success_rate: 0.4283
|
181 |
-
- name: Point and Object Tracking / 3D Tracking
|
182 |
-
success_rate: 0.3264
|
183 |
-
- task:
|
184 |
-
type: visual-question-answering
|
185 |
-
name: Spatial Reasoning
|
186 |
-
dataset:
|
187 |
-
name: VSI-Bench_8
|
188 |
-
type: benchmark
|
189 |
-
metrics:
|
190 |
- type: success_rate
|
|
|
191 |
name: Overall Success Rate
|
192 |
-
value: 0.2558
|
193 |
-
results_by_subcategory:
|
194 |
-
- name: 3D Positional Relation / Orientation
|
195 |
-
success_rate: 0.3998
|
196 |
-
- name: Counting / Object Counting
|
197 |
-
success_rate: 0.229
|
198 |
-
- name: Depth and Distance / Absolute
|
199 |
-
success_rate: 0.1562
|
200 |
-
- name: Depth and Distance / Relative
|
201 |
-
success_rate: 0.3648
|
202 |
-
- name: Object Properties / Size
|
203 |
-
success_rate: 0.1645
|
204 |
-
- name: Others / Miscellaneous
|
205 |
-
success_rate: 0.2204
|
206 |
-
- task:
|
207 |
-
type: visual-question-answering
|
208 |
-
name: Spatial Reasoning
|
209 |
-
dataset:
|
210 |
-
name: VSR-ZeroShot
|
211 |
-
type: benchmark
|
212 |
-
metrics:
|
213 |
- type: success_rate
|
|
|
214 |
name: Overall Success Rate
|
215 |
-
value: 0.8085
|
216 |
-
results_by_subcategory:
|
217 |
-
- name: 3D Positional Relation / Orientation
|
218 |
-
success_rate: 0.8085
|
219 |
-
- task:
|
220 |
-
type: visual-question-answering
|
221 |
-
name: Spatial Reasoning
|
222 |
-
dataset:
|
223 |
-
name: cvbench
|
224 |
-
type: benchmark
|
225 |
-
metrics:
|
226 |
- type: success_rate
|
|
|
227 |
name: Overall Success Rate
|
228 |
-
value: 0.6839
|
229 |
-
results_by_subcategory:
|
230 |
-
- name: Counting / Object Counting
|
231 |
-
success_rate: 0.6294
|
232 |
-
- name: Depth and Distance / Relative
|
233 |
-
success_rate: 0.7408
|
234 |
-
- name: Object Localization / 3D Localization
|
235 |
-
success_rate: 0.6815
|
236 |
-
- task:
|
237 |
-
type: visual-question-answering
|
238 |
-
name: Spatial Reasoning
|
239 |
-
dataset:
|
240 |
-
name: spatialbench
|
241 |
-
type: benchmark
|
242 |
-
metrics:
|
243 |
- type: success_rate
|
|
|
244 |
name: Overall Success Rate
|
|
|
245 |
value: 0.6553
|
246 |
-
|
247 |
-
- name: 3D Positional Relation / Orientation
|
248 |
-
success_rate: 0.6765
|
249 |
-
- name: Counting / Object Counting
|
250 |
-
success_rate: 0.75
|
251 |
-
- name: Object Properties / Existence
|
252 |
-
success_rate: 0.925
|
253 |
-
- name: Object Properties / Reachability
|
254 |
-
success_rate: 0.55
|
255 |
-
- name: Object Properties / Size
|
256 |
-
success_rate: 0.375
|
257 |
-
|
258 |
---
|
259 |
|
260 |
# SpaceOm
|
261 |
|
|
|
|
|
|
|
262 |
**Model creator:** [remyxai](https://huggingface.co/remyxai)<br>
|
263 |
**Original model**: [SpaceOm](https://huggingface.co/remyxai/SpaceOm)<br>
|
264 |
**GGUF quantization:** `llama.cpp` commit [2baf07727f921d9a4a1b63a2eff941e95d0488ed](https://github.com/ggerganov/llama.cpp/tree/2baf07727f921d9a4a1b63a2eff941e95d0488ed)<br>
|
|
|
1 |
---
|
2 |
+
base_model:
|
3 |
+
- remyxai/SpaceOm
|
4 |
+
datasets:
|
5 |
+
- remyxai/SpaceThinker
|
6 |
language:
|
7 |
- en
|
8 |
+
library_name: llama.cpp
|
9 |
+
license: apache-2.0
|
10 |
+
pipeline_tag: image-text-to-text
|
11 |
+
paper: 2506.07966
|
12 |
tags:
|
13 |
- gguf
|
14 |
- remyx
|
|
|
22 |
- vision-language
|
23 |
- distance-estimation
|
24 |
- quantitative-spatial-reasoning
|
25 |
+
task_categories:
|
26 |
+
- visual-question-answering
|
27 |
pretty_name: SpaceOm-GGUF
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
28 |
model-index:
|
29 |
- name: SpaceOm
|
30 |
results:
|
|
|
36 |
type: benchmark
|
37 |
metrics:
|
38 |
- type: success_rate
|
|
|
39 |
value: 0.5419
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
40 |
name: Overall Success Rate
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
41 |
- type: success_rate
|
42 |
+
value: 0.599
|
43 |
name: Overall Success Rate
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
44 |
- type: success_rate
|
45 |
+
value: 0.388
|
46 |
name: Overall Success Rate
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
47 |
- type: success_rate
|
48 |
+
value: 0.5833
|
49 |
name: Overall Success Rate
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
50 |
- type: success_rate
|
51 |
+
value: 0.4455
|
52 |
name: Overall Success Rate
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
53 |
- type: success_rate
|
54 |
+
value: 0.4876
|
55 |
name: Overall Success Rate
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
56 |
- type: success_rate
|
57 |
+
value: 0.6105
|
58 |
name: Overall Success Rate
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
59 |
- type: success_rate
|
60 |
+
value: 0.7043
|
61 |
name: Overall Success Rate
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
62 |
- type: success_rate
|
63 |
+
value: 0.3504
|
64 |
name: Overall Success Rate
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
65 |
- type: success_rate
|
66 |
+
value: 0.2558
|
67 |
name: Overall Success Rate
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
68 |
- type: success_rate
|
69 |
+
value: 0.8085
|
70 |
name: Overall Success Rate
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
71 |
- type: success_rate
|
72 |
+
value: 0.6839
|
73 |
name: Overall Success Rate
|
74 |
+
- type: success_rate
|
75 |
value: 0.6553
|
76 |
+
name: Overall Success Rate
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
77 |
---
|
78 |
|
79 |
# SpaceOm
|
80 |
|
81 |
+
This model is evaluated in the paper [SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence](https://huggingface.co/papers/2506.07966).
|
82 |
+
The code for the SpaCE-10 benchmark is available at: https://github.com/Cuzyoung/SpaCE-10.
|
83 |
+
|
84 |
**Model creator:** [remyxai](https://huggingface.co/remyxai)<br>
|
85 |
**Original model**: [SpaceOm](https://huggingface.co/remyxai/SpaceOm)<br>
|
86 |
**GGUF quantization:** `llama.cpp` commit [2baf07727f921d9a4a1b63a2eff941e95d0488ed](https://github.com/ggerganov/llama.cpp/tree/2baf07727f921d9a4a1b63a2eff941e95d0488ed)<br>
|