mkmkmkmk committed on
Commit f56e0d6
Parent: 966f022

Update README.md

Files changed (1): README.md (+4, -4)
README.md CHANGED
@@ -50,9 +50,9 @@ The following hyperparameters were used during pretraining:
   - total_train_batch_size: 192
   - max_seq_length: 4096
   - training_steps: 600000
- - warmup_steps: 10000
- - bf16: True
- - deepspeed: ds_config.json
+ - warmup_steps: 6000
+ - bf16: true
+ - deepspeed: [ds_config.json](https://huggingface.co/nlp-waseda/bigbird-base-japanese/blob/main/ds_config.json)
 
  ## Performance on JGLUE
 
@@ -61,7 +61,7 @@ We tuned learning rate and training epochs for each model and task following [th
 
  For tasks other than MARC-ja, the maximum length is short, so fine-tuning was performed with attention_type set to "original_full". For MARC-ja, both "block_sparse" and "original_full" were used.
 
- | Model                         | MARC-ja(original_full)/acc | JSTS/pearson | JSTS/spearman | JNLI/acc | JSQuAD/EM | JSQuAD/F1 | JComQA/acc |
+ | Model                         | MARC-ja/acc | JSTS/pearson | JSTS/spearman | JNLI/acc | JSQuAD/EM | JSQuAD/F1 | JComQA/acc |
  |-------------------------------|-------------|--------------|---------------|----------|-----------|-----------|------------|
  | Waseda RoBERTa base           | 0.965       | 0.913        | 0.876         | 0.905    | 0.853     | 0.916     | 0.853      |
  | Waseda RoBERTa large (seq512) | 0.969       | 0.925        | 0.890         | 0.928    | 0.910     | 0.955     | 0.900      |
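
The linked DeepSpeed config is not reproduced in this commit. As a rough sketch only (an assumption, not the actual contents of ds_config.json), a DeepSpeed config matching the `bf16: true` and `total_train_batch_size: 192` settings above would contain sections like:

```json
{
  "train_batch_size": 192,
  "bf16": {
    "enabled": true
  }
}
```

The `bf16.enabled` key is DeepSpeed's switch for bfloat16 mixed-precision training, corresponding to the `bf16: true` hyperparameter listed in the diff; any other fields of the real config (optimizer, ZeRO stage, etc.) are not shown here.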