Update README.md
README.md
@@ -14,8 +14,7 @@ This repository provides Japanese ModernBERT trained by [SB Intuitions](https://
[ModernBERT](https://arxiv.org/abs/2412.13663) is a new variant of the BERT model that combines local and global attention, allowing it to handle long sequences while maintaining high computational efficiency.
It also incorporates modern architectural improvements, such as [RoPE](https://arxiv.org/abs/2104.09864).

-Our ModernBERT-Ja-130M is trained on a high-quality Japanese and English
-


## How to Use
@@ -24,13 +23,13 @@ Our ModernBERT-Ja-130M is trained on a high-quality Japanese and English corpus,
You can use our models directly with the transformers library v4.48.0 or higher:

```bash
-pip install -U transformers>=4.48.0
```

Additionally, if your GPUs support Flash Attention 2, we recommend using our models with Flash Attention 2.

```
-pip install flash-attn
```

### Example Usage
@@ -54,6 +53,25 @@ for result in results:
# {'score': 0.0223388671875, 'token': 52525, 'token_str': '快晴', 'sequence': 'おはようございます、今日の天気は快晴です。'}
```

## Model Description
@@ -136,23 +154,31 @@ we treated the `validation` set as the `test` set and performed 5-fold cross-val
For datasets with predefined `train`, `validation`, and `test` sets, we simply trained and evaluated the model five times with different random seeds and used the model with the best average evaluation score on the `validation` set to measure the final score on the `test` set.
### Evaluation Results
| Model | #Param. | #Param.<br>w/o Emb. | **Avg.** | [JComQA](https://github.com/yahoojapan/JGLUE)<br>(Acc.) | [RCQA](https://www.cl.ecei.tohoku.ac.jp/rcqa/)<br>(Acc.) | [JCoLA](https://github.com/osekilab/JCoLA)<br>(Acc.) | [JNLI](https://github.com/yahoojapan/JGLUE)<br>(Acc.) | [JSICK](https://github.com/verypluming/JSICK)<br>(Acc.) | [JSNLI](https://nlp.ist.i.kyoto-u.ac.jp/?%E6%97%A5%E6%9C%AC%E8%AA%9ESNLI%28JSNLI%29%E3%83%87%E3%83%BC%E3%82%BF%E3%82%BB%E3%83%83%E3%83%88)<br>(Acc.) | [KU RTE](https://nlp.ist.i.kyoto-u.ac.jp/index.php?Textual+Entailment+%E8%A9%95%E4%BE%A1%E3%83%87%E3%83%BC%E3%82%BF)<br>(Acc.) | [JSTS](https://github.com/yahoojapan/JGLUE)<br>(Spearman's ρ) | [Livedoor](https://www.rondhuit.com/download.html)<br>(Acc.) | [Toxicity](https://llm-jp.nii.ac.jp/llm/2024/08/07/llm-jp-toxicity-dataset.html)<br>(Acc.) | [MARC-ja](https://github.com/yahoojapan/JGLUE)<br>(Acc.) | [WRIME](https://github.com/ids-cv/wrime)<br>(Acc.) |
| ------ | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
-
| [Tohoku BERT-base v3](https://huggingface.co/tohoku-nlp/bert-base-japanese-v3)| 111M | 86M | 86.74 | 82.82 | 83.65 | 81.50 | 89.68 | 84.96 | 92.32 | 60.56 | 87.31 | 96.91 | 93.15 | 96.13 | 91.91 |
| [LUKE-japanese-base-lite](https://huggingface.co/studio-ousia/luke-japanese-base-lite)| 133M | 107M | 87.15 | 82.95 | 83.53 | 82.39 | 90.36 | 85.26 | 92.78 | 60.89 | 86.68 | 97.12 | 93.48 | 96.30 | 94.05 |
| [Kyoto DeBERTa-v3](https://huggingface.co/ku-nlp/deberta-v3-base-japanese)| 160M | 86M | 88.31 | 87.44 | 84.90 | 84.35 | 91.91 | 86.22 | 93.41 | 63.31 | 88.51 | 97.10 | 92.58 | 96.32 | 93.64 |
| [KoichiYasuoka/modernbert-base-japanese-wikipedia](https://huggingface.co/KoichiYasuoka/modernbert-base-japanese-wikipedia)| 160M | 110M | 82.41 | 62.59 | 81.19 | 76.80 | 84.11 | 82.01 | 90.51 | 60.48 | 81.74 | 97.10 | 90.34 | 94.85 | 87.25 |
| | | | | | | | | | | | | | | | |
-| [Tohoku BERT-large v2](https://huggingface.co/tohoku-nlp/bert-large-japanese-v2)| 337M | 303M | 88.36 | 86.93 | 84.81 | 82.89 | 92.05 | 85.33 | 93.32 | 64.60 | 89.11 | 97.64 | 94.38 | 96.46 | 92.77 |
| [Tohoku BERT-large char v2](https://huggingface.co/cl-tohoku/bert-large-japanese-char-v2)| 311M | 303M | 87.23 | 85.08 | 84.20 | 81.79 | 90.55 | 85.25 | 92.63 | 61.29 | 87.64 | 96.55 | 93.26 | 96.25 | 92.29 |
| [Waseda RoBERTa-large (Seq. 512)](https://huggingface.co/nlp-waseda/roberta-large-japanese-seq512-with-auto-jumanpp)| 337M | 303M | 88.37 | 88.81 | 84.50 | 82.34 | 91.37 | 85.49 | 93.97 | 61.53 | 88.95 | 96.99 | 95.06 | 96.38 | 95.09 |
| [Waseda RoBERTa-large (Seq. 128)](https://huggingface.co/nlp-waseda/roberta-large-japanese-with-auto-jumanpp)| 337M | 303M | 88.36 | 89.35 | 83.63 | 84.26 | 91.53 | 85.30 | 94.05 | 62.82 | 88.67 | 95.82 | 93.60 | 96.05 | 95.23 |
-| [LUKE-japanese-large-lite](https://huggingface.co/studio-ousia/luke-japanese-large-lite)| 414M | 379M |
| [RetrievaBERT](https://huggingface.co/retrieva-jp/bert-1.3b)| 1.30B | 1.15B | 86.79 | 80.55 | 84.35 | 80.67 | 89.86 | 85.24 | 93.46 | 60.48 | 87.30 | 97.04 | 92.70 | 96.18 | 93.61 |
| | | | | | | | | | | | | | | | |
| [mBERT](https://huggingface.co/google-bert/bert-base-multilingual-cased)| 178M | 86M | 83.48 | 66.08 | 82.76 | 77.32 | 88.15 | 84.20 | 91.25 | 60.56 | 84.18 | 97.01 | 89.21 | 95.05 | 85.99 |
| [XLM-RoBERTa-base](https://huggingface.co/FacebookAI/xlm-roberta-base)| 278M | 86M | 84.36 | 69.44 | 82.86 | 78.71 | 88.14 | 83.17 | 91.27 | 60.48 | 83.34 | 95.93 | 91.91 | 95.82 | 91.20 |
| [XLM-RoBERTa-large](https://huggingface.co/FacebookAI/xlm-roberta-large)| 560M | 303M | 86.95 | 80.07 | 84.47 | 80.42 | 92.16 | 84.74 | 93.87 | 60.48 | 88.03 | 97.01 | 93.37 | 96.03 | 92.72 |
@@ -172,4 +198,18 @@ When you use this model for masked language modeling, it may generate biases or
## License

-[MIT License](https://huggingface.co/sbintuitions/modernbert-ja-130m/blob/main/LICENSE)


[ModernBERT](https://arxiv.org/abs/2412.13663) is a new variant of the BERT model that combines local and global attention, allowing it to handle long sequences while maintaining high computational efficiency.
It also incorporates modern architectural improvements, such as [RoPE](https://arxiv.org/abs/2104.09864).

+Our ModernBERT-Ja-130M is trained on a high-quality corpus of Japanese and English text comprising **4.39T tokens**, featuring a vocabulary size of 102,400 and a sequence length of **8,192** tokens.


## How to Use
You can use our models directly with the transformers library v4.48.0 or higher:

```bash
+pip install -U "transformers>=4.48.0"
```

Additionally, if your GPUs support Flash Attention 2, we recommend using our models with Flash Attention 2.

```
+pip install flash-attn --no-build-isolation
```
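Below is a minimal sketch of acting on that recommendation at load time. It assumes the usual transformers loading path. The sketch selects `attn_implementation="flash_attention_2"` only when two conditions hold:

- the `flash_attn` package is importable, and
- the first CUDA device reports compute capability 8.0 or higher (Ampere or newer, the GPU range the flash-attn project lists for FlashAttention-2).

Otherwise it falls back to PyTorch SDPA. The helper names `pick_attn_implementation` and `load_model` are hypothetical. The capability threshold is our assumption, not something this README states.

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Return "flash_attention_2" when it looks usable here, otherwise "sdpa".

    Assumption: Flash Attention 2 needs the `flash_attn` package plus a CUDA GPU
    with compute capability >= 8.0 (Ampere or newer).
    """
    if importlib.util.find_spec("flash_attn") is None:
        return "sdpa"
    import torch  # flash_attn depends on torch, so torch is importable here

    if not torch.cuda.is_available():
        return "sdpa"
    major, _minor = torch.cuda.get_device_capability()
    return "flash_attention_2" if major >= 8 else "sdpa"


def load_model(model_id: str = "sbintuitions/modernbert-ja-130m"):
    """Hypothetical loader: downloads the checkpoint, so nothing calls it here."""
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    impl = pick_attn_implementation()
    use_fa2 = impl == "flash_attention_2"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(
        model_id,
        attn_implementation=impl,
        # Flash Attention 2 kernels only run in half precision (fp16/bf16).
        torch_dtype=torch.bfloat16 if use_fa2 else torch.float32,
    )
    return tokenizer, model.to("cuda" if use_fa2 else "cpu")


print(pick_attn_implementation())
```

Only `pick_attn_implementation` runs offline; calling `load_model()` downloads the checkpoint. The sketch moves to `torch.bfloat16` and the GPU together because the Flash Attention 2 path requires both.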
### Example Usage
# {'score': 0.0223388671875, 'token': 52525, 'token_str': '快晴', 'sequence': 'おはようございます、今日の天気は快晴です。'}
```
+## Model Series
+
+We provide ModernBERT-Ja in several model sizes. Below is a summary of each model.
+
+|ID| #Param. | #Param.<br>w/o Emb.|Dim.|Inter. Dim.|#Layers|
+|-|-|-|-|-|-|
+|[sbintuitions/modernbert-ja-30m](https://huggingface.co/sbintuitions/modernbert-ja-30m)|37M|10M|256|1024|10|
+|[sbintuitions/modernbert-ja-70m](https://huggingface.co/sbintuitions/modernbert-ja-70m)|70M|31M|384|1536|13|
+|[**sbintuitions/modernbert-ja-130m**](https://huggingface.co/sbintuitions/modernbert-ja-130m)|132M|80M|512|2048|19|
+|[sbintuitions/modernbert-ja-310m](https://huggingface.co/sbintuitions/modernbert-ja-310m)|315M|236M|768|3072|25|
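The gap between the two parameter columns of the Model Series table is consistent with a single 102,400 × Dim. embedding matrix, using the vocabulary size stated for these models. The check below recomputes that gap from the table.

- Treating the gap as exactly one embedding matrix is our assumption.
- The table rounds to whole millions, so a disagreement of up to 1M is tolerated.

The check also derives two more values. The head count is Dim. / 64, from the stated head dimension of 64. The second is the Inter. Dim. / Dim. ratio. `MODELS` and `check_row` are illustrative names.

```python
VOCAB_SIZE = 102_400  # stated vocabulary size for all ModernBERT-Ja models
HEAD_DIM = 64         # stated head dimension for all models

# (ID, #Param. [M], #Param. w/o Emb. [M], Dim., Inter. Dim., #Layers) from the Model Series table.
MODELS = [
    ("modernbert-ja-30m", 37, 10, 256, 1024, 10),
    ("modernbert-ja-70m", 70, 31, 384, 1536, 13),
    ("modernbert-ja-130m", 132, 80, 512, 2048, 19),
    ("modernbert-ja-310m", 315, 236, 768, 3072, 25),
]


def check_row(total_m: int, no_emb_m: int, dim: int, inter_dim: int) -> dict:
    """Compare the table's embedding gap with one VOCAB_SIZE x dim matrix (assumption)."""
    emb_m = VOCAB_SIZE * dim / 1e6  # parameters of one embedding matrix, in millions
    gap_m = total_m - no_emb_m
    return {
        "embedding_M": round(emb_m, 1),
        "table_gap_M": gap_m,
        # The table rounds to whole millions, so allow up to 1M of disagreement.
        "gap_matches": abs(gap_m - emb_m) <= 1.0,
        "num_heads": dim // HEAD_DIM,
        "inter_ratio": inter_dim / dim,
    }


for name, total, no_emb, dim, inter, layers in MODELS:
    print(name, f"{layers} layers", check_row(total, no_emb, dim, inter))
```

For ModernBERT-Ja-130M, for example, one 102,400 × 512 matrix holds about 52.4M parameters, against a table gap of 132M - 80M = 52M.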
+
+For all models,
+the vocabulary size is 102,400,
+the head dimension is 64,
+and the activation function is GELU.
+The configuration for global attention and sliding window attention consists of 1 layer + 2 layers (global–local–local).
+The sliding window attention window context size is 128, with global_rope_theta set to 160,000 and local_rope_theta set to 10,000.
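The attention layout described above can be sketched as a per-layer schedule. The sketch rests on three assumptions:

- Layer 0 is global, and the global-local-local pattern repeats from there. This is our reading of how transformers' `ModernBertConfig.global_attn_every_n_layers` is applied.
- Local layers use `local_rope_theta` and global layers use `global_rope_theta`.
- A 128-token window lets each token see up to 64 tokens on either side.

These are for illustration only; the checkpoint's `config.json` is authoritative. The constant names are hypothetical.

```python
from dataclasses import dataclass

# Values from the README; the constant names are illustrative.
GLOBAL_EVERY_N = 3             # one global layer followed by two local layers
LOCAL_WINDOW = 128             # sliding-window context size in tokens
GLOBAL_ROPE_THETA = 160_000.0
LOCAL_ROPE_THETA = 10_000.0


@dataclass(frozen=True)
class LayerAttention:
    index: int
    kind: str          # "global" or "local"
    rope_theta: float  # RoPE base used by this layer


def layer_schedule(num_layers: int) -> list:
    """Assumes layer 0 is global and the pattern repeats every GLOBAL_EVERY_N layers."""
    return [
        LayerAttention(i, "global", GLOBAL_ROPE_THETA)
        if i % GLOBAL_EVERY_N == 0
        else LayerAttention(i, "local", LOCAL_ROPE_THETA)
        for i in range(num_layers)
    ]


def can_attend(query_pos: int, key_pos: int, kind: str) -> bool:
    """Global layers see every position; local layers see LOCAL_WINDOW // 2 tokens per side."""
    if kind == "global":
        return True
    return abs(query_pos - key_pos) <= LOCAL_WINDOW // 2


schedule = layer_schedule(19)  # ModernBERT-Ja-130M has 19 layers
print("".join("G" if layer.kind == "global" else "L" for layer in schedule))
print(sum(layer.kind == "global" for layer in schedule), "global layers")
```

Under these assumptions, the 19-layer ModernBERT-Ja-130M has 7 global and 12 local layers. The 25-layer ModernBERT-Ja-310M has 9 global layers.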
+
+
## Model Description
For datasets with predefined `train`, `validation`, and `test` sets, we simply trained and evaluated the model five times with different random seeds and used the model with the best average evaluation score on the `validation` set to measure the final score on the `test` set.
+
### Evaluation Results
| Model | #Param. | #Param.<br>w/o Emb. | **Avg.** | [JComQA](https://github.com/yahoojapan/JGLUE)<br>(Acc.) | [RCQA](https://www.cl.ecei.tohoku.ac.jp/rcqa/)<br>(Acc.) | [JCoLA](https://github.com/osekilab/JCoLA)<br>(Acc.) | [JNLI](https://github.com/yahoojapan/JGLUE)<br>(Acc.) | [JSICK](https://github.com/verypluming/JSICK)<br>(Acc.) | [JSNLI](https://nlp.ist.i.kyoto-u.ac.jp/?%E6%97%A5%E6%9C%AC%E8%AA%9ESNLI%28JSNLI%29%E3%83%87%E3%83%BC%E3%82%BF%E3%82%BB%E3%83%83%E3%83%88)<br>(Acc.) | [KU RTE](https://nlp.ist.i.kyoto-u.ac.jp/index.php?Textual+Entailment+%E8%A9%95%E4%BE%A1%E3%83%87%E3%83%BC%E3%82%BF)<br>(Acc.) | [JSTS](https://github.com/yahoojapan/JGLUE)<br>(Spearman's ρ) | [Livedoor](https://www.rondhuit.com/download.html)<br>(Acc.) | [Toxicity](https://llm-jp.nii.ac.jp/llm/2024/08/07/llm-jp-toxicity-dataset.html)<br>(Acc.) | [MARC-ja](https://github.com/yahoojapan/JGLUE)<br>(Acc.) | [WRIME](https://github.com/ids-cv/wrime)<br>(Acc.) |
| ------ | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
+| [ModernBERT-Ja-30M](https://huggingface.co/sbintuitions/modernbert-ja-30m) | 37M | 10M | 85.67 | 80.95 | 82.35 | 78.85 | 88.69 | 84.39 | 91.79 | 61.13 | 85.94 | 97.20 | 89.33 | 95.87 | 91.61 |
+| [ModernBERT-Ja-70M](https://huggingface.co/sbintuitions/modernbert-ja-70m) | 70M | 31M | 86.77 | 85.65 | 83.51 | 80.26 | 90.33 | 85.01 | 92.73 | 60.08 | 87.59 | 96.34 | 91.01 | 96.13 | 92.59 |
+| [**ModernBERT-Ja-130M**](https://huggingface.co/sbintuitions/modernbert-ja-130m)<br>(this model) | 132M | 80M | <u>88.95</u> | 91.01 | 85.28 | 84.18 | 92.03 | 86.61 | 94.01 | 65.56 | 89.20 | 97.42 | 91.57 | 96.48 | 93.99 |
+| [ModernBERT-Ja-310M](https://huggingface.co/sbintuitions/modernbert-ja-310m) | 315M | 236M | 89.83 | 93.53 | 86.18 | 84.81 | 92.93 | 86.87 | 94.48 | 68.79 | 90.53 | 96.99 | 91.24 | 96.39 | 95.23 |
+| | | | | | | | | | | | | | | | |
+| [LINE DistilBERT](https://huggingface.co/line-corporation/line-distilbert-base-japanese)| 68M | 43M | 85.32 | 76.39 | 82.17 | 81.04 | 87.49 | 83.66 | 91.42 | 60.24 | 84.57 | 97.26 | 91.46 | 95.91 | 92.16 |
| [Tohoku BERT-base v3](https://huggingface.co/tohoku-nlp/bert-base-japanese-v3)| 111M | 86M | 86.74 | 82.82 | 83.65 | 81.50 | 89.68 | 84.96 | 92.32 | 60.56 | 87.31 | 96.91 | 93.15 | 96.13 | 91.91 |
| [LUKE-japanese-base-lite](https://huggingface.co/studio-ousia/luke-japanese-base-lite)| 133M | 107M | 87.15 | 82.95 | 83.53 | 82.39 | 90.36 | 85.26 | 92.78 | 60.89 | 86.68 | 97.12 | 93.48 | 96.30 | 94.05 |
| [Kyoto DeBERTa-v3](https://huggingface.co/ku-nlp/deberta-v3-base-japanese)| 160M | 86M | 88.31 | 87.44 | 84.90 | 84.35 | 91.91 | 86.22 | 93.41 | 63.31 | 88.51 | 97.10 | 92.58 | 96.32 | 93.64 |
| [KoichiYasuoka/modernbert-base-japanese-wikipedia](https://huggingface.co/KoichiYasuoka/modernbert-base-japanese-wikipedia)| 160M | 110M | 82.41 | 62.59 | 81.19 | 76.80 | 84.11 | 82.01 | 90.51 | 60.48 | 81.74 | 97.10 | 90.34 | 94.85 | 87.25 |
| | | | | | | | | | | | | | | | |
| [Tohoku BERT-large char v2](https://huggingface.co/cl-tohoku/bert-large-japanese-char-v2)| 311M | 303M | 87.23 | 85.08 | 84.20 | 81.79 | 90.55 | 85.25 | 92.63 | 61.29 | 87.64 | 96.55 | 93.26 | 96.25 | 92.29 |
+| [Tohoku BERT-large v2](https://huggingface.co/tohoku-nlp/bert-large-japanese-v2)| 337M | 303M | 88.36 | 86.93 | 84.81 | 82.89 | 92.05 | 85.33 | 93.32 | 64.60 | 89.11 | 97.64 | 94.38 | 96.46 | 92.77 |
| [Waseda RoBERTa-large (Seq. 512)](https://huggingface.co/nlp-waseda/roberta-large-japanese-seq512-with-auto-jumanpp)| 337M | 303M | 88.37 | 88.81 | 84.50 | 82.34 | 91.37 | 85.49 | 93.97 | 61.53 | 88.95 | 96.99 | 95.06 | 96.38 | 95.09 |
| [Waseda RoBERTa-large (Seq. 128)](https://huggingface.co/nlp-waseda/roberta-large-japanese-with-auto-jumanpp)| 337M | 303M | 88.36 | 89.35 | 83.63 | 84.26 | 91.53 | 85.30 | 94.05 | 62.82 | 88.67 | 95.82 | 93.60 | 96.05 | 95.23 |
+| [LUKE-japanese-large-lite](https://huggingface.co/studio-ousia/luke-japanese-large-lite)| 414M | 379M | 88.94 | 88.01 | 84.84 | 84.34 | 92.37 | 86.14 | 94.32 | 64.68 | 89.30 | 97.53 | 93.71 | 96.49 | 95.59 |
| [RetrievaBERT](https://huggingface.co/retrieva-jp/bert-1.3b)| 1.30B | 1.15B | 86.79 | 80.55 | 84.35 | 80.67 | 89.86 | 85.24 | 93.46 | 60.48 | 87.30 | 97.04 | 92.70 | 96.18 | 93.61 |
| | | | | | | | | | | | | | | | |
+| [hotchpotch/mMiniLMv2-L6-H384](https://huggingface.co/hotchpotch/mMiniLMv2-L6-H384)| 107M | 11M | 81.53 | 60.34 | 82.83 | 78.61 | 86.24 | 77.94 | 87.32 | 60.48 | 80.48 | 95.55 | 86.40 | 94.97 | 87.20 |
+| [hotchpotch/mMiniLMv2-L12-H384](https://huggingface.co/hotchpotch/mMiniLMv2-L12-H384)| 118M | 21M | 82.59 | 62.70 | 83.77 | 78.61 | 87.69 | 79.58 | 87.65 | 60.48 | 81.55 | 95.88 | 90.00 | 94.89 | 88.28 |
| [mBERT](https://huggingface.co/google-bert/bert-base-multilingual-cased)| 178M | 86M | 83.48 | 66.08 | 82.76 | 77.32 | 88.15 | 84.20 | 91.25 | 60.56 | 84.18 | 97.01 | 89.21 | 95.05 | 85.99 |
| [XLM-RoBERTa-base](https://huggingface.co/FacebookAI/xlm-roberta-base)| 278M | 86M | 84.36 | 69.44 | 82.86 | 78.71 | 88.14 | 83.17 | 91.27 | 60.48 | 83.34 | 95.93 | 91.91 | 95.82 | 91.20 |
| [XLM-RoBERTa-large](https://huggingface.co/FacebookAI/xlm-roberta-large)| 560M | 303M | 86.95 | 80.07 | 84.47 | 80.42 | 92.16 | 84.74 | 93.87 | 60.48 | 88.03 | 97.01 | 93.37 | 96.03 | 92.72 |
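The **Avg.** column is consistent with an unweighted mean of the 12 task scores in each row. The check below recomputes it for the ModernBERT-Ja rows and for LUKE-japanese-large-lite, using scores copied from the table. The published per-task scores are themselves rounded to two decimals, so the check only asks for agreement within 0.01 rather than an exact match. That the average is a plain mean is our reading of the table; the README does not state it. `ROWS` and `check_averages` are illustrative names.

```python
from statistics import fmean

# (reported Avg., 12 per-task scores in table order: JComQA ... WRIME),
# copied from the Evaluation Results table above.
ROWS = {
    "ModernBERT-Ja-30M": (85.67, [80.95, 82.35, 78.85, 88.69, 84.39, 91.79, 61.13, 85.94, 97.20, 89.33, 95.87, 91.61]),
    "ModernBERT-Ja-70M": (86.77, [85.65, 83.51, 80.26, 90.33, 85.01, 92.73, 60.08, 87.59, 96.34, 91.01, 96.13, 92.59]),
    "ModernBERT-Ja-130M": (88.95, [91.01, 85.28, 84.18, 92.03, 86.61, 94.01, 65.56, 89.20, 97.42, 91.57, 96.48, 93.99]),
    "ModernBERT-Ja-310M": (89.83, [93.53, 86.18, 84.81, 92.93, 86.87, 94.48, 68.79, 90.53, 96.99, 91.24, 96.39, 95.23]),
    "LUKE-japanese-large-lite": (88.94, [88.01, 84.84, 84.34, 92.37, 86.14, 94.32, 64.68, 89.30, 97.53, 93.71, 96.49, 95.59]),
}


def check_averages(rows, tol=0.01):
    """Recompute each row's unweighted mean and compare it with the reported Avg."""
    report = {}
    for name, (reported, scores) in rows.items():
        if len(scores) != 12:
            raise ValueError(f"{name}: expected 12 task scores, got {len(scores)}")
        mean = fmean(scores)
        # Per-task scores are rounded to 2 decimals, so only near-agreement is expected.
        report[name] = (round(mean, 3), abs(mean - reported) <= tol)
    return report


for name, (mean, ok) in check_averages(ROWS).items():
    print(f"{name}: recomputed {mean:.3f}, matches Avg. within 0.01: {ok}")
```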
## License
+[MIT License](https://huggingface.co/sbintuitions/modernbert-ja-130m/blob/main/LICENSE)
+
+
+## Citation
+
+```bibtex
+@misc{
+modernbert-ja,
+author = {Tsukagoshi, Hayato and Li, Shengzhe and Fukuchi, Akihiko and Shibata, Tomohide},
+title = {{ModernBERT-Ja}},
+howpublished = {\url{https://huggingface.co/collections/sbintuitions/modernbert-ja-67b68fe891132877cf67aa0a}},
+url = {https://huggingface.co/collections/sbintuitions/modernbert-ja-67b68fe891132877cf67aa0a},
+year = {2025},
+}
+```