Update README.md
README.md CHANGED
@@ -116,7 +116,15 @@ For hyperparameters, we explored the following ranges:
 - Learning rate: `{5e-6, 1e-5, 2e-5, 3e-5, 5e-5, 1e-4}`
 - Number of epochs:
   - Tasks with a large number of instances: `{1, 2}`
-  - Tasks with fewer instances: `{3, 5, 10}`
+  - Tasks with fewer instances: `{3, 5, 10}`
+
+In the experiments, we loaded several Japanese models that are publicly available on HuggingFace using `AutoModel` and constructed classification models by appending a classification head consisting of a linear layer, a GELU activation function, and another linear layer.
+This was done because HuggingFace's `AutoModelForSequenceClassification` comes with different implementations for each model, and using them directly would result in classification heads that differ from one model to another.
+
+For the embeddings fed into the classification layer, we used the embedding of the special token at the beginning of the sentence.
+That is, `[CLS]` in BERT and `<s>` in RoBERTa.
+Note that our model does not perform the next sentence prediction (NSP) task during pretraining, so `<s>` is added at the beginning of the sentence, not `<cls>`.
+Therefore, we used the `<s>` token for classification.
 
 We conducted evaluations using 5-fold cross-validation.
 That is, we trained the model on the `train` set and evaluated it on the `validation` set.
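As a rough illustration of the setup described in the added lines, here is a minimal sketch: a bare encoder loaded with `AutoModel`, a shared classification head (linear layer, GELU, linear layer), and the embedding of the first special token fed into that head. The checkpoint name, label count, and sample input below are placeholder assumptions for the sketch, not this repository's actual configuration.

```python
# Minimal sketch of the classifier described above. MODEL_NAME and
# NUM_LABELS are illustrative placeholders, not the repository's settings.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "cl-tohoku/bert-base-japanese-v2"  # placeholder checkpoint
NUM_LABELS = 2                                  # placeholder label count


class FirstTokenClassifier(nn.Module):
    def __init__(self, model_name: str, num_labels: int):
        super().__init__()
        # Load the bare encoder with AutoModel so every model gets the same
        # head, instead of the per-model heads that come with
        # AutoModelForSequenceClassification.
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Shared classification head: linear layer -> GELU -> linear layer.
        self.head = nn.Sequential(
            nn.Linear(hidden, hidden),
            nn.GELU(),
            nn.Linear(hidden, num_labels),
        )

    def forward(self, input_ids, attention_mask=None):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Embedding of the special token at the start of the sequence:
        # [CLS] for BERT-style models, <s> for RoBERTa-style models.
        first_token = outputs.last_hidden_state[:, 0]
        return self.head(first_token)


tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = FirstTokenClassifier(MODEL_NAME, NUM_LABELS)
batch = tokenizer(["これはテストです。"], return_tensors="pt")  # sample input
with torch.no_grad():
    logits = model(batch["input_ids"], batch["attention_mask"])  # (1, NUM_LABELS)
```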