lbourdois committed on
Commit a659da1 · verified · 1 Parent(s): 4a745da

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the `language` tag to improve referencing. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13.
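
For anyone who prefers to script this kind of metadata edit rather than editing README.md by hand, below is a minimal sketch using the `ModelCard` helper from `huggingface_hub` (a sketch only, not a record of how this change was made; the 13 codes mirror the ones added in the diff below):

```python
# Minimal sketch: add ISO 639 language codes to the model card's YAML
# front matter with huggingface_hub, opening a PR rather than pushing
# directly (requires a token with write/PR access to the repo).
from huggingface_hub import ModelCard

card = ModelCard.load("open-thoughts/OpenThinker2-32B")
card.data.language = [
    "zho", "eng", "fra", "spa", "por", "deu", "ita",
    "rus", "jpn", "kor", "vie", "tha", "ara",
]
card.push_to_hub(
    "open-thoughts/OpenThinker2-32B",
    commit_message="Improve language tag",
    create_pr=True,
)
```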

Files changed (1)
  1. README.md +110 -96
README.md CHANGED
@@ -1,96 +1,110 @@
- ---
- library_name: transformers
- license: apache-2.0
- base_model: Qwen/Qwen2.5-32B-Instruct
- tags:
- - llama-factory
- - full
- - generated_from_trainer
- model-index:
- - name: OpenThinker2-32B
- results: []
- datasets:
- - open-thoughts/OpenThoughts2-1M
- ---
-
- <p align="center">
- <img src="https://huggingface.co/datasets/open-thoughts/open-thoughts-114k/resolve/main/open_thoughts.png" width="50%">
- </p>
-
- # OpenThinker2-32B
-
- This model is a fine-tuned version of [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) on the
- [OpenThoughts2-1M](https://huggingface.co/datasets/open-thoughts/OpenThoughts2-1M) dataset.
-
- The [OpenThinker2-32B](https://huggingface.co/open-thoughts/OpenThinker2-32B) model is the highest performing open-data model.
- This model improves upon our previous [OpenThinker-32B](https://huggingface.co/open-thoughts/OpenThinker-32B) model, which was trained on 114k examples from [OpenThoughts-114k](https://huggingface.co/datasets/open-thoughts/open-thoughts-114k).
- The numbers reported in the table below are evaluated with our open-source tool [Evalchemy](https://github.com/mlfoundations/Evalchemy).
-
- | Model | Data | AIME24 | AIME25 | AMC23 | MATH500 | GPQA-D | LCBv2 |
- | ----------------------------------------------------------------------------------------------- | ---- | ------ | ------ | ----- | ------- | ------ | ----- |
- | [OpenThinker2-32B](https://huggingface.co/open-thoughts/OpenThinker2-32B) | ✅ | 76.7 | 58.7 | 94.0 | 90.8 | 64.1 | 72.5 |
- | [OpenThinker-32B](https://huggingface.co/open-thoughts/OpenThinker-32B) | ✅ | 68.0 | 49.3 | 95.5 | 90.6 | 63.5 | 68.6 |
- | [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | ❌ | 74.7 | 50.0 | 96.5 | 90.0 | 65.8 | 72.3 |
- | [Light-R1-32B](https://huggingface.co/qihoo360/Light-R1-32B) | ✅ | 74.7 | 58.0 | 96.0 | 90.4 | 62.0 | 56.0 |
- | [S1.1-32B](https://huggingface.co/simplescaling/s1.1-32B) | ✅ | 59.3 | 42.7 | 91.5 | 87.4 | 62.0 | 58.7 |
-
-
- ## Data
-
- This model was trained on the [OpenThoughts2-1M](https://huggingface.co/datasets/open-thoughts/OpenThoughts2-1M) dataset.
-
- The [OpenThoughts2-1M](https://huggingface.co/datasets/open-thoughts/OpenThoughts2-1M) dataset was constructed by augmenting [OpenThoughts-114k](https://huggingface.co/datasets/open-thoughts/open-thoughts-114k) with existing datasets like [OpenR1](https://huggingface.co/open-r1), as well as additional math and code reasoning data.
- We generate the additional math and code data by ablating over 26 different question generation methodologies and sampling from the highest performing ones.
-
- See the [OpenThoughts2-1M](https://huggingface.co/datasets/open-thoughts/OpenThoughts2-1M) dataset page or our [blog post](https://www.open-thoughts.ai/blog/thinkagain) for additional information.
-
-
- ## Intended uses & limitations
-
- Apache 2.0 License
-
-
- ## Training procedure
-
- We used 128 4xA100 nodes to train the model for 50 hours.
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 8e-05
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 512
- - gradient_accumulation_steps: 1
- - total_train_batch_size: 512
- - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 5.0
-
- ### Framework versions
-
- - Transformers 4.46.1
- - Pytorch 2.3.0
- - Datasets 3.1.0
- - Tokenizers 0.20.3
-
- More info can be found in our repository: [https://github.com/open-thoughts/open-thoughts](https://github.com/open-thoughts/open-thoughts).
-
- # Citation
- ```
- @misc{openthoughts,
- author = {Team, OpenThoughts},
- month = jan,
- title = {{Open Thoughts}},
- howpublished = {https://open-thoughts.ai},
- year = {2025}
- }
- ```
-
- # Links
- - 📊 [OpenThoughts2 and OpenThinker2 Blog Post](https://www.open-thoughts.ai/blog/thinkagain)
- - 💻 [Open Thoughts GitHub Repository](https://github.com/open-thoughts/open-thoughts)
- - 🧠 [OpenThoughts2-1M dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts2-1M)
- - 🤖 [OpenThinker2-7B model](https://huggingface.co/open-thoughts/OpenThinker2-7B)
- - 🤖 [OpenThinker2-32B model](https://huggingface.co/open-thoughts/OpenThinker2-32B) - this model.
+ ---
+ library_name: transformers
+ license: apache-2.0
+ base_model: Qwen/Qwen2.5-32B-Instruct
+ tags:
+ - llama-factory
+ - full
+ - generated_from_trainer
+ datasets:
+ - open-thoughts/OpenThoughts2-1M
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ model-index:
+ - name: OpenThinker2-32B
+ results: []
+ ---
+
+ <p align="center">
+ <img src="https://huggingface.co/datasets/open-thoughts/open-thoughts-114k/resolve/main/open_thoughts.png" width="50%">
+ </p>
+
+ # OpenThinker2-32B
+
+ This model is a fine-tuned version of [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) on the
+ [OpenThoughts2-1M](https://huggingface.co/datasets/open-thoughts/OpenThoughts2-1M) dataset.
+
+ The [OpenThinker2-32B](https://huggingface.co/open-thoughts/OpenThinker2-32B) model is the highest performing open-data model.
+ This model improves upon our previous [OpenThinker-32B](https://huggingface.co/open-thoughts/OpenThinker-32B) model, which was trained on 114k examples from [OpenThoughts-114k](https://huggingface.co/datasets/open-thoughts/open-thoughts-114k).
+ The numbers reported in the table below are evaluated with our open-source tool [Evalchemy](https://github.com/mlfoundations/Evalchemy).
+
+ | Model | Data | AIME24 | AIME25 | AMC23 | MATH500 | GPQA-D | LCBv2 |
+ | ----------------------------------------------------------------------------------------------- | ---- | ------ | ------ | ----- | ------- | ------ | ----- |
+ | [OpenThinker2-32B](https://huggingface.co/open-thoughts/OpenThinker2-32B) | ✅ | 76.7 | 58.7 | 94.0 | 90.8 | 64.1 | 72.5 |
+ | [OpenThinker-32B](https://huggingface.co/open-thoughts/OpenThinker-32B) | ✅ | 68.0 | 49.3 | 95.5 | 90.6 | 63.5 | 68.6 |
+ | [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | ❌ | 74.7 | 50.0 | 96.5 | 90.0 | 65.8 | 72.3 |
+ | [Light-R1-32B](https://huggingface.co/qihoo360/Light-R1-32B) | ✅ | 74.7 | 58.0 | 96.0 | 90.4 | 62.0 | 56.0 |
+ | [S1.1-32B](https://huggingface.co/simplescaling/s1.1-32B) | ✅ | 59.3 | 42.7 | 91.5 | 87.4 | 62.0 | 58.7 |
+
+
+ ## Data
+
+ This model was trained on the [OpenThoughts2-1M](https://huggingface.co/datasets/open-thoughts/OpenThoughts2-1M) dataset.
+
+ The [OpenThoughts2-1M](https://huggingface.co/datasets/open-thoughts/OpenThoughts2-1M) dataset was constructed by augmenting [OpenThoughts-114k](https://huggingface.co/datasets/open-thoughts/open-thoughts-114k) with existing datasets like [OpenR1](https://huggingface.co/open-r1), as well as additional math and code reasoning data.
+ We generate the additional math and code data by ablating over 26 different question generation methodologies and sampling from the highest performing ones.
+
+ See the [OpenThoughts2-1M](https://huggingface.co/datasets/open-thoughts/OpenThoughts2-1M) dataset page or our [blog post](https://www.open-thoughts.ai/blog/thinkagain) for additional information.
+
+
+ ## Intended uses & limitations
+
+ Apache 2.0 License
+
+
+ ## Training procedure
+
+ We used 128 4xA100 nodes to train the model for 50 hours.
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 8e-05
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 512
+ - gradient_accumulation_steps: 1
+ - total_train_batch_size: 512
+ - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 5.0
+
+ ### Framework versions
+
+ - Transformers 4.46.1
+ - Pytorch 2.3.0
+ - Datasets 3.1.0
+ - Tokenizers 0.20.3
+
+ More info can be found in our repository: [https://github.com/open-thoughts/open-thoughts](https://github.com/open-thoughts/open-thoughts).
+
+ # Citation
+ ```
+ @misc{openthoughts,
+ author = {Team, OpenThoughts},
+ month = jan,
+ title = {{Open Thoughts}},
+ howpublished = {https://open-thoughts.ai},
+ year = {2025}
+ }
+ ```
+
+ # Links
+ - 📊 [OpenThoughts2 and OpenThinker2 Blog Post](https://www.open-thoughts.ai/blog/thinkagain)
+ - 💻 [Open Thoughts GitHub Repository](https://github.com/open-thoughts/open-thoughts)
+ - 🧠 [OpenThoughts2-1M dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts2-1M)
+ - 🤖 [OpenThinker2-7B model](https://huggingface.co/open-thoughts/OpenThinker2-7B)
+ - 🤖 [OpenThinker2-32B model](https://huggingface.co/open-thoughts/OpenThinker2-32B) - this model.
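
Since the card above documents a chat-tuned reasoning model, here is a minimal inference sketch with the transformers library (an illustration only; it assumes the tokenizer ships a chat template, as Qwen2.5-Instruct derivatives do, and a 32B model in bfloat16 needs roughly 64 GB of GPU memory before accounting for long reasoning traces):

```python
# Sketch: generate a response from OpenThinker2-32B with transformers.
# Assumes the checkpoint provides a chat template (true for Qwen2.5-based
# models); adjust device_map / dtype to your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "open-thoughts/OpenThinker2-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "How many primes are there below 100?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought, so leave room for them.
outputs = model.generate(inputs, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```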