XiaoEnn committed
Commit 3750776 · verified · 1 Parent(s): 7823f79

Update README.md

Files changed (1)
  1. README.md +1 -27
README.md CHANGED
@@ -59,42 +59,16 @@ We named the model "Herberta" by combining "Herb" and "Roberta" to signify its p
  ![Loss](https://cdn-uploads.huggingface.co/production/uploads/6564baaa393bae9c194fc32e/BJ7enbRg13IYAZuxwraPP.png)
  ![Perplexity](https://cdn-uploads.huggingface.co/production/uploads/6564baaa393bae9c194fc32e/lOohRMIctPJZKM5yEEcQ2.png)

- <!-- <table>
- <tr>
- <td align="center"><strong>Accuracy</strong></td>
- <td align="center"><strong>Loss</strong></td>
- <td align="center"><strong>Perplexity</strong></td>
- </tr>
- <tr>
- <td><img src="https://cdn-uploads.huggingface.co/production/uploads/6564baaa393bae9c194fc32e/RDgI-0Ro2kMiwV853Wkgx.png" alt="Accuracy" width="800"></td>
- <td><img src="https://cdn-uploads.huggingface.co/production/uploads/6564baaa393bae9c194fc32e/BJ7enbRg13IYAZuxwraPP.png" alt="Loss" width="800"></td>
- <td><img src="https://cdn-uploads.huggingface.co/production/uploads/6564baaa393bae9c194fc32e/lOohRMIctPJZKM5yEEcQ2.png" alt="Perplexity" width="800"></td>
- </tr>
- </table> -->

  ### Pretraining Configuration

- #### Ancient Books
- - Pretraining Strategy: BERT-style MASK (15% tokens masked)
- - Sequence Length: 512
- - Batch Size: 32
- - Learning Rate: `1e-5` with an epoch-based decay (`epoch * 0.1`)
- - Tokenization: Sentence-based tokenization with padding for sequences <512 tokens.
-
- #### Modern Textbooks
+ #### Modern Textbooks Version
  - Pretraining Strategy: Dynamic MASK + Warmup + Linear Decay
  - Sequence Length: 512
  - Batch Size: 16
  - Learning Rate: Warmup (10% steps) + Linear Decay (1e-5 initial rate)
  - Tokenization: Continuous tokenization (512 tokens) without sentence segmentation.

- #### V4 Mixed Dataset (Ancient + Modern)
- - Dataset: Combined 48 modern textbooks + 700 ancient books
- - Pretraining Strategy: Dynamic MASK, warmup, and linear decay (1e-5 learning rate).
- - Epochs: 20
- - Sequence Length: 512
- - Batch Size: 16
- - Tokenization: Continuous tokenization.

  ---

 
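For reference, the "Modern Textbooks Version" configuration kept in this revision corresponds to a standard masked-language-modeling run. Below is a minimal sketch of that recipe using the Hugging Face `transformers` and `datasets` APIs; it is not the authors' training script. The base checkpoint, corpus file name, masking ratio (the collator's 15% default), epoch count, and output path are assumptions, while the 512-token continuous chunking, batch size of 16, 10% warmup, and linear decay from a 1e-5 initial rate follow the configuration listed above.

```python
# Minimal sketch of the "Modern Textbooks Version" pretraining recipe.
# Assumptions (not from the README): base checkpoint, corpus file name,
# masking ratio, epoch count, and output path.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "hfl/chinese-roberta-wwm-ext"  # assumed starting checkpoint
BLOCK_SIZE = 512                            # sequence length from the config above

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForMaskedLM.from_pretrained(BASE_MODEL)

# Hypothetical plain-text corpus: one passage of modern TCM textbook text per line.
raw = load_dataset("text", data_files={"train": "modern_textbooks.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], add_special_tokens=False)

def group_texts(batch):
    # "Continuous tokenization": concatenate all token ids and cut them into
    # fixed 512-token blocks, with no sentence segmentation and no padding.
    ids = sum(batch["input_ids"], [])
    usable = (len(ids) // BLOCK_SIZE) * BLOCK_SIZE
    chunks = [ids[i : i + BLOCK_SIZE] for i in range(0, usable, BLOCK_SIZE)]
    return {"input_ids": chunks, "attention_mask": [[1] * BLOCK_SIZE for _ in chunks]}

tokenized = raw["train"].map(tokenize, batched=True, remove_columns=["text"])
lm_dataset = tokenized.map(group_texts, batched=True, remove_columns=tokenized.column_names)

# Dynamic MASK: the collator re-samples which tokens are masked every time a
# batch is built, rather than fixing the masks once during preprocessing.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="herberta-modern-pretrain",  # placeholder path
    per_device_train_batch_size=16,         # Batch Size: 16
    learning_rate=1e-5,                     # 1e-5 initial rate
    warmup_ratio=0.1,                       # warmup over 10% of steps
    lr_scheduler_type="linear",             # linear decay after warmup
    num_train_epochs=3,                     # epoch count not stated above; placeholder
    save_strategy="epoch",
)

Trainer(
    model=model,
    args=args,
    train_dataset=lm_dataset,
    data_collator=collator,
).train()
```

Because the collator masks at batch-collation time, each pass over the data sees a different mask pattern, which is what distinguishes this dynamic-MASK setup from the static BERT-style masking used in the earlier configurations.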