<div align="center">
<h2>GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training</h2>
<p align="center">
<a href="https://github.com/UniModal4Reasoning/GeoX">💡Github Page</a> •
<a href="https://huggingface.co/papers/2412.11863">📃Paper</a> •
<a href="https://huggingface.co/datasets/U4R/GeoX-data">🗂Dataset</a> •
<a href="https://huggingface.co/U4R/GeoX">🤗Checkpoint</a> •
<a href="#-citation">📖Citation</a>
</p>
<br>
<!-- <img src="https://huggingface.co/datasets/U4R/GeoX-data/blob/main/teaser.png" height="85%"> -->
</div>

## Introduction to GeoX

**GeoX** is a multi-modal large model designed for automatic geometric problem solving, trained in three progressive stages that successively strengthen diagram understanding and reasoning. In our paper, we validate that the **formal vision-language training** paradigm is a simple-yet-effective solution for learning from complex mathematical diagrams.

## Data Preparation for GeoX

### Step 1. Data for Unimodal Pre-training

You can download our collected diagram images from [this link](https://huggingface.co/datasets/U4R/GeoX-data/pretrain-data.zip).

Additionally, we build a corpus from existing geometric text, as detailed in [our paper](https://arxiv.org/abs/2412.11863).

### Step 2. Data for Geometry-Language Alignment

To train the GS-Former, please prepare the [unified formal annotations](https://huggingface.co/datasets/U4R/GeoX-data/unified_formal_annotations.json) and the paired [images](https://huggingface.co/datasets/U4R/GeoX-data/images.zip).
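The files linked in Steps 1 and 2 can also be fetched programmatically. Below is a minimal sketch using the `huggingface_hub` client; the repo id comes from the dataset links above, but the helper name, local directory, and exact file names inside the dataset repo are our own assumptions and may need adjusting.

```python
# Sketch: download the GeoX pre-training and alignment files from the Hub.
# Assumes `pip install huggingface_hub`; file names follow the links above
# and may differ from the actual layout of the dataset repo.

DATA_REPO = "U4R/GeoX-data"
DATA_FILES = [
    "pretrain-data.zip",                # Step 1: diagram images for pre-training
    "unified_formal_annotations.json",  # Step 2: unified formal annotations
    "images.zip",                       # Step 2: paired diagram images
]

def download_geox_data(local_dir="geox_data"):
    """Download each listed file into local_dir and return the local paths."""
    from huggingface_hub import hf_hub_download  # lazy import: optional dependency
    return [
        hf_hub_download(repo_id=DATA_REPO, repo_type="dataset",
                        filename=name, local_dir=local_dir)
        for name in DATA_FILES
    ]
```

Calling `download_geox_data()` places all three files under `geox_data/` in one step instead of downloading them by hand.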
33 |
+
|
34 |
+
### Step 3. Data for End-to-End Visual Instruction Tuning
|
35 |
+
|
36 |
+
|
37 |
+
|
38 |
+
|
39 |
+
We use the GeoQA, UniGeo, Geometry3K, and PGPS9K datasets for fine-tuning and evaluation:
|
40 |
+
|
41 |
+
1. **GeoQA**: Follow the instructions [here](https://github.com/chen-judge/GeoQA) to download the `GeoQA` dataset.
|
42 |
+
2. **UniGeo**: Follow the instructions [here](https://github.com/chen-judge/UniGeo) to download the `UniGeo` dataset.
|
43 |
+
3. **Geometry3K and PGPS9K**: Follow the instructions [here](https://github.com/mingliangzhang2018/PGPS) to download the `PGPS9K` datasets. The `Geometry3K` is also provided in this database.
|
44 |
+
|
45 |
+
|
46 |
+
<font color="#dd0000">Note:</font> Due to copyright restrictions, we are currently only providing links for these datasets. Full datasets for tuning and evaluation organized by us will be provided via email. If you need it, please contact us by [email]([email protected]).
|
47 |
+
|
48 |
+
|
49 |
+
For more details, please refer to [our paper]() and [GitHub repository](https://github.com/UniModal4Reasoning/GeoX). If you find our work helpful, please consider starring ⭐ in this repository and citing us:
|
50 |
+
|
51 |
+
```bibtex
|
52 |
+
@article{xia2024geox,
|
53 |
+
title={GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training},
|
54 |
+
author={Xia, Renqiu and Li, Mingsheng and Ye, Hancheng and Wu, Wenjie and Zhou, Hongbin and Yuan, Jiakang and Peng, Tianshuo and Cai, Xinyu and Yan, Xiangchao and Wang, Bin and others},
|
55 |
+
journal={arXiv preprint arXiv:2412.11863},
|
56 |
+
year={2024}
|
57 |
+
}
|
58 |
+
```
|
59 |
+
|