Update README.md
README.md
The following hyperparameters were used during pretraining:

- total_train_batch_size: 192
- max_seq_length: 4096
- training_steps: 600000
- warmup_steps: 6000
- bf16: true
- deepspeed: [ds_config.json](https://huggingface.co/nlp-waseda/bigbird-base-japanese/blob/main/ds_config.json)
## Performance on JGLUE
We tuned learning rate and training epochs for each model and task following [th

For tasks other than MARC-ja, the maximum input length is short, so fine-tuning was performed with attention_type set to "original_full"; for MARC-ja, both "block_sparse" and "original_full" were used.

| Model                         | MARC-ja/acc | JSTS/pearson | JSTS/spearman | JNLI/acc | JSQuAD/EM | JSQuAD/F1 | JComQA/acc |
|-------------------------------|-------------|--------------|---------------|----------|-----------|-----------|------------|
| Waseda RoBERTa base           | 0.965       | 0.913        | 0.876         | 0.905    | 0.853     | 0.916     | 0.853      |
| Waseda RoBERTa large (seq512) | 0.969       | 0.925        | 0.890         | 0.928    | 0.910     | 0.955     | 0.900      |