kazukifujii committed
Commit 1eca664 · verified · 1 Parent(s): 633672e

Update README.md

Files changed (1):
  1. README.md +19 -9
README.md CHANGED
@@ -8,6 +8,7 @@ language:
 base_model:
 - meta-llama/Llama-3.1-8B
 ---
+
 # Model Card
 
 <img src="https://huggingface.co/datasets/tokyotech-llm/swallow-math/resolve/main/figures/swallow-code-math-log.png" alt="SwallowCodeMath Icon" width="600">
@@ -16,10 +17,10 @@ base_model:
 
 ## Model Summary
 
-This model is a continual pre-training of [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) on a mix of mathematical datasets from finemath-4+ and multilingual text datasets.
-The model was trained to evaluate the performance of mathematical reasoning and problem-solving as part of the SwallowMath ablation experiments (experiment 1).
+This model is a continual pre-training of [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) on a mix of mathematical datasets from [SwallowMath](https://huggingface.co/datasets/tokyotech-llm/swallow-math) and multilingual text datasets.
+The model was trained to evaluate the performance of mathematical reasoning and problem-solving as part of the SwallowMath ablation experiments (experiment 2).
 
-It was trained on **50 billion tokens** using a mix of 4.8% Finemath-4+, 13.1% Code, and 82% multilingual text, following the setup described in the [SwallowMath paper](https://arxiv.org/abs/XXXX.XXXXX).
+It was trained on **50 billion tokens** using a mix of 4.8% SwallowMath (finemath-4+ rewritten), 13.1% Code, and 82% multilingual text, following the setup described in the [SwallowMath paper](https://arxiv.org/abs/XXXX.XXXXX).
 Training was performed using [Megatron-LM](https://github.com/NVIDIA/Megatron-LM/tree/core_r0.9.0).
 
 ## Use
@@ -29,10 +30,13 @@ Training was performed using [Megatron-LM](https://github.com/NVIDIA/Megatron-LM
 ```python
 # pip install -q transformers
 from transformers import AutoModelForCausalLM, AutoTokenizer
+
 model = "tokyotech-llm/<model-name>"
 device = "cuda" # for GPU usage or "cpu" for CPU usage
+
 tokenizer = AutoTokenizer.from_pretrained(model)
 model = AutoModelForCausalLM.from_pretrained(model).to(device)
+
 inputs = tokenizer.encode("Solve the equation 2x + 3 = 7:", return_tensors="pt").to(device)
 outputs = model.generate(inputs, max_length=100)
 print(tokenizer.decode(outputs[0]))
@@ -77,18 +81,24 @@ Details are in the paper’s Appendix.
 - Megatron-LM (version core_r0.9.0) for training
 - lm-evaluation-harness for evaluation
 - BigCodeBench for code evaluation
+
 ## Evaluation
+
 The model was evaluated using the setup described in the SwallowMath paper, with the lm-evaluation-harness and BigCodeBench. Benchmarks include mathematical reasoning (GSM8K, MATH), code generation (HumanEval), and general tasks (OpenBookQA, TriviaQA, HellaSwag, SQuAD 2.0, XWINO, MMLU, BBH).
 Results are reported for checkpoints at 10B, 20B, 30B, 40B, and 50B tokens.
-### Evaluation Results (Finemath-4+ experiment 1)
+
+### Evaluation Results (SwallowMath experiment 2)
+
 | Tokens (B) | OpenBookQA | TriviaQA | HellaSwag | SQuAD2.0 | XWINO | MMLU | HumanEval | GSM8K | BBH | MATH |
 |------------|------------|----------|-----------|----------|-------|------|-----------|-------|-----|------|
-| 10 | 0.3700 | 0.6626 | 0.5990 | 0.3350 | 0.8985 | 0.6243 | 0.3439 | 0.4685 | 0.6057 | 0.1760 |
-| 20 | 0.3720 | 0.6536 | 0.5963 | 0.3510 | 0.9032 | 0.6261 | 0.3622 | 0.5011 | 0.5896 | 0.2080 |
-| 30 | 0.3700 | 0.6574 | 0.5999 | 0.3506 | 0.8998 | 0.6253 | 0.3561 | 0.5019 | 0.5971 | 0.2260 |
-| 40 | 0.3720 | 0.6577 | 0.6024 | 0.3499 | 0.9049 | 0.6312 | 0.3701 | 0.5231 | 0.6054 | 0.2260 |
-| 50 | 0.3740 | 0.6608 | 0.6001 | 0.3550 | 0.9058 | 0.6329 | 0.3561 | 0.5292 | 0.6166 | 0.2400 |
+| 10 | 0.3720 | 0.6643 | 0.5970 | 0.3443 | 0.9015 | 0.6343 | 0.3439 | 0.5603 | 0.5535 | 0.2480 |
+| 20 | 0.3800 | 0.6580 | 0.5946 | 0.3428 | 0.8994 | 0.6293 | 0.3762 | 0.6156 | 0.5669 | 0.2860 |
+| 30 | 0.3660 | 0.6618 | 0.5964 | 0.3470 | 0.9011 | 0.6298 | 0.3530 | 0.6262 | 0.6383 | 0.3040 |
+| 40 | 0.3700 | 0.6610 | 0.5973 | 0.3535 | 0.9088 | 0.6358 | 0.3738 | 0.6422 | 0.6237 | 0.3100 |
+| 50 | 0.3800 | 0.6637 | 0.5972 | 0.3537 | 0.9045 | 0.6337 | 0.3683 | 0.6535 | 0.6414 | 0.3160 |
+
 ## Citation
+
 ```bibtex
 @misc{fujii2025rewritingpretrainingdata,
 title={Rewriting Pre-Training Data: Boosting LLM Performance in Math and Code},
 
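The data mix quoted in the updated Model Summary (4.8% SwallowMath, 13.1% code, 82% multilingual text over 50 billion tokens) implies a rough per-source token budget. The sketch below is back-of-the-envelope arithmetic on the percentages in the card; the absolute per-source counts are not stated in the diff itself.

```python
# Rough per-source token budget for the 50B-token continual pre-training run.
# Percentages come from the model card above; the absolute counts are derived
# only for illustration (roughly 2.4B math, 6.55B code, 41B multilingual).
total_tokens = 50e9  # 50 billion tokens

mix = {
    "SwallowMath (finemath-4+ rewritten)": 0.048,
    "Code": 0.131,
    "Multilingual text": 0.820,
}

for source, share in mix.items():
    print(f"{source}: ~{share * total_tokens / 1e9:.2f}B tokens")
```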
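Because the commit swaps the experiment-1 (finemath-4+) table for the experiment-2 (SwallowMath) table, the two 50B-token rows in the diff above can be read side by side. The snippet below only restates a few columns copied from those rows and prints the change; the choice of benchmarks to highlight is mine, not the card's.

```python
# 50B-token checkpoint scores copied from the two tables in the diff above:
# exp1 = finemath-4+ (removed table), exp2 = SwallowMath (added table).
exp1_50b = {"GSM8K": 0.5292, "MATH": 0.2400, "HumanEval": 0.3561, "MMLU": 0.6329}
exp2_50b = {"GSM8K": 0.6535, "MATH": 0.3160, "HumanEval": 0.3683, "MMLU": 0.6337}

for bench, before in exp1_50b.items():
    after = exp2_50b[bench]
    print(f"{bench}: {before:.4f} -> {after:.4f} ({after - before:+.4f})")
```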
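The Evaluation section names lm-evaluation-harness for the math and general-task benchmarks. As a minimal sketch of how one checkpoint could be scored with the harness, the snippet below assumes the v0.4-style `lm_eval.simple_evaluate` Python entry point and reuses the card's `<model-name>` placeholder; it is not the exact configuration behind the reported numbers, which follows the SwallowMath paper's setup.

```python
# pip install -q lm_eval
# Minimal sketch: score one checkpoint on GSM8K with lm-evaluation-harness.
# Assumes the v0.4-style Python API; <model-name> is the card's placeholder and
# this is not the exact configuration used for the table above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=tokyotech-llm/<model-name>,dtype=bfloat16",
    tasks=["gsm8k"],
    batch_size=8,
)
print(results["results"]["gsm8k"])
```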