Fixed typo
README.md
CHANGED
@@ -37,20 +37,7 @@ SOLAR-KOEN-10.8B is an auto-regressive language model that leverages an optimize
 
 | |Training Data|Parameters|Content Length|GQA|Tokens|Learning Rate|
 |---|---|---|---|---|---|---|
-|SOLAR-KOEN-10.8B|*A curated mix of Korean+English Corpora*|10.8B|4k|O|>
-
-**Training Corpus**
-
-The model was trained using selected datasets from AIHub and Modu Corpus. Detailed information about the training datasets is available below:
-
-- AI Hub: [corpus/AI_HUB](./corpus/AI_HUB)
-  - Only the `Training` segment of the data was used.
-  - The `Validation` and `Test` segments were deliberately excluded.
-- Modu Corpus: [corpus/MODU_CORPUS](./corpus/MODU_CORPUS)
-
-The final JSONL dataset used to train this model is approximately 61GB in size.
-
-Total token count: Approximately 15 billion tokens (*using the expanded tokenizer. With the original SOLAR tokenizer, >60 billion tokens.)
+|SOLAR-KOEN-10.8B|*A curated mix of Korean+English Corpora*|10.8B|4k|O|>60B*|5e<sup>-5</sup>|
 
 **Vocab Expansion**
 
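The removed Training Corpus text reports ~15 billion tokens for the ~61GB corpus under the expanded tokenizer, versus >60 billion under the original SOLAR tokenizer. A minimal sketch of that arithmetic (the ~4x ratio is an inference from those two figures, not a number stated in the README):

```python
# Token counts reported for the same ~61GB JSONL corpus.
expanded_tokens = 15e9   # expanded (Korean+English) tokenizer
original_tokens = 60e9   # original SOLAR tokenizer (lower bound: ">60B")

# The expanded vocabulary encodes the corpus in roughly a quarter of
# the tokens, which proportionally cuts training steps per epoch.
compression = original_tokens / expanded_tokens
print(f"~{compression:.0f}x fewer tokens with the expanded tokenizer")
```

This is why the table's `Tokens` column cites >60B*: it counts tokens under the original SOLAR tokenizer, with the asterisk pointing to the expanded-tokenizer figure.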