matsuo-lab committed
Commit 112a5ad • 1 Parent(s): ebb84f2

Update README.md
README.md CHANGED
@@ -17,14 +17,14 @@ This repository provides a Japanese-centric multilingual GPT-NeoX model of 10 bi

* **Pre-training**

-  The model was trained on around **600B** tokens from a mixture of the following corpora
+  The model was trained on around **600B** tokens from a mixture of the following corpora.

  - [Japanese C4](https://huggingface.co/datasets/mc4)
  - [The Pile](https://huggingface.co/datasets/EleutherAI/pile)

* **Instruction-supervised-finetuning**

-  The model was finetuned on a subset records from a mixture of the following dataset
+  The model was finetuned on a subset of records from a mixture of the following datasets. Training epoch: 1.

  - [Alpaca (English)](https://github.com/gururise/AlpacaDataCleaned/blob/main/alpaca_data_cleaned.json)
  - [Alpaca (Japanese translation)](https://github.com/shi3z/alpaca_ja/blob/main/alpaca_cleaned_ja.json)
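
For reference, a minimal sketch of how the two pre-training corpora listed in the diff could be streamed and mixed with the `datasets` library. The dataset ids follow the links above; the equal interleaving probabilities are an assumption, since this section does not state the actual mixture behind the ~600B tokens.

```python
# Illustrative sketch only: streams the two corpora named in the README.
# The 50/50 interleaving probabilities are an assumption, not the actual
# pre-training mixture used for this model.
from datasets import load_dataset, interleave_datasets

japanese_c4 = load_dataset("mc4", "ja", split="train", streaming=True)
the_pile = load_dataset("EleutherAI/pile", split="train", streaming=True)

# Interleave the two streams; real weights for the ~600B-token mix are not
# published here, so equal probabilities are used as a placeholder.
mixture = interleave_datasets(
    [japanese_c4, the_pile],
    probabilities=[0.5, 0.5],
    seed=42,
)

# Peek at a few mixed examples; both corpora expose a "text" field.
for example in mixture.take(3):
    print(example["text"][:100])
```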
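A similar sketch for the instruction data: it downloads the two Alpaca JSON files (raw URLs derived from the GitHub links above) and renders each record with the conventional Alpaca `instruction`/`input`/`output` template, which is an assumed prompt format here, not one confirmed by the README.

```python
# Illustrative sketch only: loads the two cleaned Alpaca JSON files linked
# in the README and flattens them into prompt strings. The prompt template
# is the common Alpaca convention, assumed here rather than confirmed.
import json
import urllib.request

# Raw-file equivalents of the GitHub blob links in the README.
URLS = [
    "https://raw.githubusercontent.com/gururise/AlpacaDataCleaned/main/alpaca_data_cleaned.json",
    "https://raw.githubusercontent.com/shi3z/alpaca_ja/main/alpaca_cleaned_ja.json",
]

records = []
for url in URLS:
    with urllib.request.urlopen(url) as resp:
        records.extend(json.load(resp))

def to_prompt(rec: dict) -> str:
    """Render one Alpaca record ({"instruction", "input", "output"}) as text."""
    if rec.get("input"):
        return (f"### Instruction:\n{rec['instruction']}\n\n"
                f"### Input:\n{rec['input']}\n\n### Response:\n{rec['output']}")
    return f"### Instruction:\n{rec['instruction']}\n\n### Response:\n{rec['output']}"

print(f"{len(records)} records loaded")
print(to_prompt(records[0])[:200])
```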