nielsr (HF Staff) committed on
Commit 0c32537 · verified · 1 Parent(s): 99ab88d

Add missing metadata to model card


This PR adds missing metadata to the model card, including the `pipeline_tag`, `library_name`, and `license`. This improves discoverability and clarity for users.

Files changed (1): README.md (added, +196 -0)
 
---
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
---

<div align="center">
<img alt="MM-Eureka logo" src="./docs/logo.png" style="height: 200px;" />
</div>


<div align="center">

# MM-EUREKA

</div>

<div align="center">
<p align="center">
📖<a href="https://github.com/ModalMinds/MM-EUREKA/blob/main/MM_Eureka_paper.pdf">Paper</a> |
📊<a href="https://huggingface.co/datasets/FanqingM/MM-Eureka-Dataset">Datasets</a> |
🤗<a href="https://huggingface.co/FanqingM/MM-Eureka-8B">MM-Eureka-8B</a> |
🤗<a href="https://huggingface.co/FanqingM/MM-Eureka-Zero-38B">MM-Eureka-Zero-38B</a>
</p>
</div>

<hr>
<div align="center">
<p style="text-align: center;">MM-EUREKA: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning</p>
</div>
<hr>
<div align="center">
<a href="https://github.com/ModalMinds/MM-EUREKA/blob/main/MM_Eureka_paper.pdf">[[Paper PDF Link]]</a>
</div>

<div align="center">
<img alt="Visual Aha Moment" src="./docs/visual_aha_moment.png"/>
</div>


## 🎯 Overview

We present **MM-Eureka** and **MM-Eureka-Zero**, a series of multimodal reasoning models that successfully extend large-scale rule-based reinforcement learning (RL) to multimodal reasoning.

While rule-based RL has shown remarkable success in improving LLMs' reasoning abilities in text domains, its application to multimodal settings has remained challenging. Our work is the first to reproduce, in the multimodal space, key characteristics of text-based RL systems such as DeepSeek-R1, including steady increases in **accuracy reward** and **response length**, and the emergence of **reflection behaviors**.

We demonstrate that both instruction-tuned and pre-trained models can develop strong multimodal reasoning capabilities through rule-based RL without supervised fine-tuning, showing superior **data efficiency** compared to alternative approaches.

🔥 We open-source our complete pipeline to foster further research in this area. All of our code, models, data, and other resources are available at [MM-EUREKA](https://github.com/ModalMinds/MM-EUREKA).

## 🗞️ News

- **[2025/03/07]** We released `MM-Eureka`.
  - 📖 Paper: [MM-Eureka-paper](https://github.com/ModalMinds/MM-EUREKA/blob/main/MM_Eureka_paper.pdf)
  - 🤗 Model: [MM-Eureka-8B](https://huggingface.co/FanqingM/MM-Eureka-8B) & [MM-Eureka-Zero-38B](https://huggingface.co/FanqingM/MM-Eureka-Zero-38B)
  - 📊 Dataset: [MM-Eureka-Dataset](https://huggingface.co/datasets/FanqingM/MM-Eureka-Dataset)



## 🚀 Features

This repository is built upon [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), introducing several key enhancements:

- **Multimodal RFT Support**: Extends OpenRLHF to incorporate **vision-language models (VLMs)**, currently supporting **InternVL**, enabling multimodal reasoning capabilities.
  - Currently supports **RLOO**, **REINFORCE++**, and **GRPO** training using Ray.
  - vLLM integration and distributed training.
  - Supports the hybrid engine (`--colocate_all_models`, `--vllm_enable_sleep`).
- **Better Rule-based Reward support**: Better training visualization for rule-based rewards (e.g., Format Reward, Accuracy Reward, Repetition Penalty).
- **Online Filtering**: Filters out experiences based on Accuracy Reward during training, as in [PRIME](https://github.com/PRIME-RL/PRIME).
  - Use `--enable_accuracy_filter`, `--freezing_filter_steps`, `--accuracy_lower_bound`, and `--accuracy_upper_bound` to control the behavior of the online accuracy filter.
  - The online accuracy filter is not enabled in our default settings; refer to the Discussion section of our [paper](https://github.com/ModalMinds/MM-EUREKA/blob/main/MM_Eureka_paper.pdf) for more details.
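
To make the online accuracy filter concrete, here is a minimal sketch of the idea, not the MM-EUREKA or PRIME implementation. It assumes that each prompt is sampled several times, that each rollout gets a 0/1 accuracy reward, and that a prompt's group is kept only when its mean accuracy lies inside `[accuracy_lower_bound, accuracy_upper_bound]` (inclusive bounds are an assumption). Groups that are always right or always wrong carry no contrast between rollouts, so group-relative estimators such as GRPO or RLOO get zero advantage from them. The function name and the bound values in the demo are hypothetical; only the flag names come from this README, and `--freezing_filter_steps` is not modeled.

```python
from statistics import mean
from typing import Dict, List


def filter_by_accuracy(
    group_rewards: Dict[str, List[float]],
    accuracy_lower_bound: float,
    accuracy_upper_bound: float,
) -> Dict[str, List[float]]:
    """Keep only prompt groups whose mean accuracy lies inside the bounds.

    group_rewards maps a prompt id to the accuracy rewards (assumed 0.0/1.0)
    of all rollouts sampled for that prompt. Bounds are inclusive here,
    which is an assumption of this sketch.
    """
    kept = {}
    for prompt_id, rewards in group_rewards.items():
        if not rewards:
            continue  # nothing was sampled for this prompt; skip it
        acc = mean(rewards)
        if accuracy_lower_bound <= acc <= accuracy_upper_bound:
            kept[prompt_id] = rewards
    return kept


if __name__ == "__main__":
    groups = {
        "too_easy": [1.0, 1.0, 1.0, 1.0],  # always solved: no contrast between rollouts
        "useful": [1.0, 0.0, 1.0, 0.0],    # mixed outcomes: informative for RL
        "too_hard": [0.0, 0.0, 0.0, 0.0],  # never solved: no contrast either
    }
    # Illustrative bounds only, not the repository defaults.
    print(sorted(filter_by_accuracy(groups, 0.1, 0.9)))  # ['useful']
```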

## 🤖 Models

<div align="center">
<img alt="Training Log" src="./docs/training_log.png"/>
</div>

*Figure 1 | Train-time scale-up of accuracy reward and response length under rule-based RL. (a) Training on InternVL2.5-instruct-8B; (b) training on InternVL2.5-pretrained-38B. Accuracy reward and response length improve steadily whether training starts from an instruct model or a pretrained model.*

- 🤗 [MM-Eureka-8B](https://huggingface.co/FanqingM/MM-Eureka-8B)

- 🤗 [MM-Eureka-Zero-38B](https://huggingface.co/FanqingM/MM-Eureka-Zero-38B)


## 🏁 Getting Started

### 📦 Installation

```shell
git clone https://github.com/ModalMinds/MM-EUREKA.git
cd MM-EUREKA
pip install -e .[vllm]

# install flash-attn==2.3.6:

pip install flash-attn==2.3.6 --no-build-isolation

# Alternatively, compile from source:

git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout v2.3.6
python setup.py install
```
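
After either install path, you can confirm that the pinned flash-attn build is the one Python actually sees by querying the installed distribution metadata. This is a small sketch that uses only the standard library's `importlib.metadata`. The helper names are hypothetical, and it assumes the package is registered under the distribution name `flash-attn` (the name used in the `pip install` command above). Builds installed via `setup.py` may register metadata differently, so an unexpected `None` does not necessarily mean the package is missing.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def check_pin(dist_name: str, expected: str) -> bool:
    """True only if the distribution is installed at exactly the expected version."""
    return installed_version(dist_name) == expected


if __name__ == "__main__":
    found = installed_version("flash-attn")
    if found is None:
        print("flash-attn metadata not found")
    elif check_pin("flash-attn", "2.3.6"):
        print("flash-attn 2.3.6 found")
    else:
        print(f"flash-attn {found} found, but this README pins 2.3.6")
```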

### 📂 Data Preparation

You can download our training data from [MM-Eureka-Dataset](https://huggingface.co/datasets/FanqingM/MM-Eureka-Dataset).

Once downloaded, refer to the section below for the expected data format. You may need to update the `image_urls` field so that it points to your local image paths; otherwise the images cannot be loaded.
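
If the downloaded images live in a local directory, the `image_urls` update can be scripted. The sketch below is illustrative only. It assumes the data is a JSONL file in the format described under Custom dataset, and that every image can be found by its file name under a single local directory (`image_root`). The function names and that directory layout are hypothetical and are not part of the released tooling.

```python
import json
from pathlib import Path, PurePosixPath
from typing import Iterable, Iterator
from urllib.parse import urlparse


def localize_image_urls(entry: dict, image_root: Path) -> dict:
    """Return a copy of entry whose image_urls point at files under image_root.

    Assumes every image can be located by its base file name in image_root.
    """
    new_urls = []
    for url in entry.get("image_urls", []):
        name = PurePosixPath(urlparse(url).path).name  # e.g. "img_001.png"
        new_urls.append((image_root / name).resolve().as_uri())  # absolute file:// URL
    return {**entry, "image_urls": new_urls}


def rewrite_jsonl(lines: Iterable[str], image_root: Path) -> Iterator[str]:
    """Rewrite each non-empty JSONL line, leaving all other fields untouched."""
    for line in lines:
        if line.strip():
            entry = json.loads(line)
            yield json.dumps(localize_image_urls(entry, image_root), ensure_ascii=False)


if __name__ == "__main__":
    sample = '{"id": "0", "image_urls": ["file:///old/location/img_001.png"]}'
    for out in rewrite_jsonl([sample], Path("/data/images")):  # hypothetical local directory
        print(out)
```

To convert a whole file, pass an open file handle as `lines` and write each yielded string followed by a newline.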

#### Custom dataset

For a custom dataset, format your data into a JSONL file in which each line is a dictionary with the following structure (pretty-printed here for readability; in the file, each entry occupies a single line):

```json
{
  "id": "0",
  "conversations": [
    {
      "role": "system",
      "content": "system_prompt"
    },
    {
      "role": "user",
      "content": "user_prompt"
    }
  ],
  "answer": "ground truth that can be parsed and verified by math_verify",
  "image_urls": ["file:///path/to/image1", "file:///path/to/image2"]
}
```

> [!NOTE]
> For text-only inputs, we follow InternVL's official approach, which requires a dummy image input.
> Specifically, you should provide a (224, 224) pure white image as a placeholder.
> We have already provided such a blank image at `examples/blank.png`.
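
Combining the format and the note above, a converter for your own data might look like the sketch below. It is a hypothetical helper, not part of the repository. The function names, the `blank_image` default, the example system prompt, and the choice to turn relative paths into absolute `file://` URLs are all assumptions, so adapt them to your setup. Text-only samples fall back to the bundled `examples/blank.png` placeholder.

```python
import json
from pathlib import Path
from typing import List, Optional


def make_entry(
    sample_id: str,
    system_prompt: str,
    user_prompt: str,
    answer: str,
    image_paths: Optional[List[str]] = None,
    blank_image: str = "examples/blank.png",
) -> dict:
    """Build one training record in the JSONL layout shown above."""
    # Text-only samples still need an image, so use the white placeholder.
    paths = image_paths or [blank_image]
    return {
        "id": sample_id,
        "conversations": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "answer": answer,  # should be parseable by math_verify
        "image_urls": [Path(p).resolve().as_uri() for p in paths],
    }


def write_jsonl(entries: List[dict], out_path: str) -> None:
    """Write one JSON object per line."""
    with open(out_path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    entry = make_entry("0", "You are a helpful assistant.", "What is 2 + 3?", "5")
    print(entry["image_urls"])  # absolute file:// URL of examples/blank.png
```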

### 🌐 Start Training

Before starting your own training, make sure that the paths in the provided training scripts are set correctly and that environment variables such as `$MASTER_ADDR` and `$NODE_RANK` are properly configured.
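
A small pre-flight check can catch unset variables before a multi-node launch. This sketch is illustrative. `MASTER_ADDR` and `NODE_RANK` come from the paragraph above, but the helper name and the idea of checking them from Python are assumptions. The training scripts may read additional variables that are not listed here, so extend `REQUIRED_VARS` after reading the script you run.

```python
import os
from typing import Dict, Mapping, Sequence

# Variables named in this README; the training scripts may need more.
REQUIRED_VARS = ("MASTER_ADDR", "NODE_RANK")


def missing_or_invalid(
    env: Mapping[str, str], required: Sequence[str] = REQUIRED_VARS
) -> Dict[str, str]:
    """Return {variable: problem} for every required variable that looks wrong."""
    problems = {}
    for name in required:
        value = env.get(name, "").strip()
        if not value:
            problems[name] = "not set"
        elif name == "NODE_RANK" and not value.isdigit():
            problems[name] = f"expected a non-negative integer, got {value!r}"
    return problems


if __name__ == "__main__":
    issues = missing_or_invalid(os.environ)
    for name, problem in issues.items():
        print(f"{name}: {problem}")
    if not issues:
        print("Required environment variables look set.")
```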

**Start MM-Eureka-8B training**

- For a single node:

```shell
sh examples/scripts/train_mm_eureka_8b_single_node.sh
```

- For multiple nodes:

```shell
sh examples/scripts/train_mm_eureka_8b_multi_node.sh
```

**Start MM-Eureka-Zero-38B training**

```shell
sh examples/scripts/train_mm_eureka_zero_38b_multi_node.sh
```



## ⭐ Starchart

[![Star History Chart](https://api.star-history.com/svg?repos=ModalMinds/MM-EUREKA&type=Date)](https://star-history.com/#ModalMinds/MM-EUREKA&Date)

## 🤝 Contribution

MM-Eureka is still under active development. If you want to contribute, feel free to open a pull request or create an issue.

Please refer to `CONTRIBUTING.md` before you dive in!

## 📬 Contact

If you have any questions or would like to engage with our community, feel free to scan the QR code below to join our WeChat group.

<div align="center">
<img alt="WeChat group QR code" src="https://github.com/user-attachments/assets/a04ebfef-9ac4-44ae-a07b-48586794903a" style="height: 400px;" />
</div>

## 🎓 Acknowledgements

We acknowledge the outstanding open-source contributions from [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [LMM-R1](https://github.com/TideDra/lmm-r1) and [vLLM](https://github.com/vllm-project/vllm). We also extend our gratitude to [DeepSeek-R1](https://github.com/deepseek-ai/DeepSeek-R1) and [InternVL](https://github.com/OpenGVLab/InternVL) for their open-source techniques and base models, which have enabled us to further our exploration.

## 📜 Citation
```
@misc{MM-EUREKA2025,
  title={MM-EUREKA: Exploring Visual Aha Moment with Rule-Based Large-Scale Reinforcement Learning},
  author={Fanqing Meng and Lingxiao Du and Zongkai Liu and Zhixiang Zhou and Quanfeng Lu and Daocheng Fu and Botian Shi and Wenhai Wang and Junjun He and Kaipeng Zhang and Ping Luo and Yu Qiao and Qiaosheng Zhang and Wenqi Shao},
  year={2025},
  howpublished={\url{https://github.com/ModalMinds/MM-EUREKA}},
}
```