GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
💡Github Page • 📃Paper • 🗂Dataset • 🤗Checkpoint • 📖Citation
Introduction to GeoX
GeoX is a multi-modal large model designed for automatic geometric problem solving, utilizing three progressive training stages to enhance diagram understanding and reasoning. In this paper, we validate that the formal vision-language training paradigm is a simple-yet-effective solution for complex mathematical diagram learning.
Data Preparation for GeoX
Step 1. Data for Unimodal Pre-training
You can download our collected diagram images from this link.
Additionally, we use existing geometric text to build a corpus, which is detailed in our paper.
Step 2. Data for Geometry-Language Alignment
To train the GS-Former, please prepare the unified formal annotations and paired images.
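Before training, it can help to confirm that every diagram image has a matching annotation file. The sketch below is only an illustration under stated assumptions: the directory arguments, the `.json` annotation suffix, and the convention that an image and its annotation share a file stem are all hypothetical and not taken from the GeoX release, so adjust them to the actual layout of your data.

```python
from pathlib import Path

# Assumed image extensions; not specified by the GeoX release.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def pair_images_with_annotations(image_dir, annotation_dir, annotation_suffix=".json"):
    """Match images to annotation files that share the same file stem.

    Returns (pairs, unpaired_images). The stem-matching convention and the
    annotation suffix are assumptions made for illustration only.
    """
    image_dir, annotation_dir = Path(image_dir), Path(annotation_dir)

    # Index annotation files by stem, e.g. "0001.json" -> "0001".
    annotations = {
        p.stem: p
        for p in annotation_dir.iterdir()
        if p.suffix.lower() == annotation_suffix
    }

    pairs, unpaired = [], []
    for image in sorted(image_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip files that are not images
        match = annotations.get(image.stem)
        if match is None:
            unpaired.append(image)
        else:
            pairs.append((image, match))
    return pairs, unpaired
```

Calling `pair_images_with_annotations("images", "annotations")` with your own directory names returns the matched pairs plus any images that still lack an annotation, which is a quick way to spot incomplete downloads.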
Step 3. Data for End-to-End Visual Instruction Tuning
We use the GeoQA, UniGeo, Geometry3K, and PGPS9K datasets for fine-tuning and evaluation:
- GeoQA: Follow the instructions here to download the GeoQA dataset.
- UniGeo: Follow the instructions here to download the UniGeo dataset.
- Geometry3K and PGPS9K: Follow the instructions here to download the PGPS9K dataset. The Geometry3K dataset is also provided in this database.
Note: Due to copyright restrictions, we currently provide only links to these datasets. The full datasets that we organized for tuning and evaluation are available on request; if you need them, please contact us by email.
For more details, please refer to our paper and GitHub repository. If you find our work helpful, please consider starring ⭐ this repository and citing us:
@article{xia2024geox,
title={GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training},
author={Xia, Renqiu and Li, Mingsheng and Ye, Hancheng and Wu, Wenjie and Zhou, Hongbin and Yuan, Jiakang and Peng, Tianshuo and Cai, Xinyu and Yan, Xiangchao and Wang, Bin and others},
journal={arXiv preprint arXiv:2412.11863},
year={2024}
}