murawaki committed on
Commit
462bd8e
1 Parent(s): 4e1b05b

Update README.md

Files changed (1):
  1. README.md +2 -2
README.md CHANGED
@@ -40,7 +40,7 @@ You can also use this model to get the features of a given text.
 
 ## Vocabulary
 
-This model has a character-level vocabulary of size 6K. To be precise, rare characters may be split into bytes because we use byte-level byte-pair encoding (BPE). The tokenizer was trained on a small subset of the training data that were converted into a one-character-per-line format so that merge operations never transgressed character boundaries.
+A character-level vocabulary of size 6K is used. To be precise, rare characters may be split into bytes because byte-level byte-pair encoding (BPE) is used. The BPE tokenizer was trained on a small subset of the training data. Since the data were converted into a one-character-per-line format, merge operations never transgressed character boundaries.
 
 ## Training data
 
@@ -55,7 +55,7 @@ Also note that Japanese Wikipedia was duplicated 10 times to make the total size
 
 ## Training procedure
 
-The training took XX weeks using a single NVIDIA A100 80GB GPU.
+The training took about 3 months (with two interruptions) with a single NVIDIA A100 80GB GPU.
 
 The following hyperparameters were used during pre-training: