hpprc committed · Commit 9d4bb92 · verified · 1 Parent(s): 9f4a175

Update README.md

Files changed (1): README.md (+9 -1)
README.md CHANGED
@@ -116,7 +116,15 @@ For hyperparameters, we explored the following ranges:
 - Learning rate: `{5e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4}`
 - Number of epochs:
   - Tasks with a large number of instances: `{1, 2}`
-  - Tasks with fewer instances: `{3, 5, 10}`
+  - Tasks with fewer instances: `{3, 5, 10}`
+
+In the experiments, we loaded several Japanese models that are publicly available on HuggingFace using `AutoModel` and constructed classification models by appending a classification head consisting of a linear layer, a GELU activation function, and another linear layer.
+This was done because HuggingFace's `AutoModelForSequenceClassification` comes with different implementations for each model, and using them directly would result in classification heads that differ from one model to another.
+
+For the embeddings fed into the classification layer, we used the embedding of the special token at the beginning of the sentence.
+That is, `[CLS]` in BERT and `<s>` in RoBERTa.
+Note that our model does not perform the next sentence prediction (NSP) task during pretraining, so `<s>` is added at the beginning of the sentence, not `<cls>`.
+Therefore, we used the `<s>` token for classification.
 
 We conducted evaluations using 5-fold cross-validation.
 That is, we trained the model on the `train` set and evaluated it on the `validation` set.
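
The classification setup described in the added lines can be sketched as follows. This is a minimal illustration, not the authors' exact code: the class name `FirstTokenClassifier`, the head width, and the absence of dropout are assumptions; only the `AutoModel` backbone, the linear → GELU → linear head, and the use of the first special token's embedding come from the README.

```python
import torch
import torch.nn as nn
from transformers import AutoModel


class FirstTokenClassifier(nn.Module):
    """AutoModel backbone + linear -> GELU -> linear head on the first-token embedding."""

    def __init__(self, model_name: str, num_labels: int):
        super().__init__()
        # The same head is appended to every backbone so that results are
        # comparable across models (unlike AutoModelForSequenceClassification,
        # whose head implementation differs per model).
        self.backbone = AutoModel.from_pretrained(model_name)
        hidden_size = self.backbone.config.hidden_size
        self.head = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.GELU(),
            nn.Linear(hidden_size, num_labels),
        )

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        hidden_states = self.backbone(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # Classify from the embedding of the leading special token
        # ([CLS] for BERT-style models, <s> for RoBERTa-style models and this model).
        return self.head(hidden_states[:, 0])
```

Because the head is identical for every backbone, differences in downstream scores reflect the pretrained encoders rather than the classifier architecture.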
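
The hyperparameter search and 5-fold cross-validation can likewise be sketched. The `train_and_evaluate` callable, the dataset format, and the higher-is-better scoring convention are hypothetical placeholders; only the learning-rate and epoch ranges come from the README.

```python
from itertools import product
from typing import Callable, Sequence

from sklearn.model_selection import KFold

# Search ranges from the README.
LEARNING_RATES = [5e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4]
EPOCHS_LARGE_TASKS = [1, 2]      # tasks with a large number of instances
EPOCHS_SMALL_TASKS = [3, 5, 10]  # tasks with fewer instances


def sweep(
    texts: Sequence[str],
    labels: Sequence[int],
    train_and_evaluate: Callable[..., float],  # hypothetical fine-tuning helper
    epoch_choices: Sequence[int] = EPOCHS_LARGE_TASKS,
):
    """Return the (mean score, learning rate, epochs) triple with the best 5-fold mean."""
    best = None
    for lr, epochs in product(LEARNING_RATES, epoch_choices):
        fold_scores = []
        for train_idx, valid_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(texts):
            fold_scores.append(
                train_and_evaluate(
                    train_texts=[texts[i] for i in train_idx],
                    train_labels=[labels[i] for i in train_idx],
                    valid_texts=[texts[i] for i in valid_idx],
                    valid_labels=[labels[i] for i in valid_idx],
                    learning_rate=lr,
                    num_epochs=epochs,
                )
            )
        mean_score = sum(fold_scores) / len(fold_scores)
        if best is None or mean_score > best[0]:
            best = (mean_score, lr, epochs)
    return best
```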