Update README.md
README.md CHANGED
@@ -77,6 +77,8 @@ print(tokenizer.decode(output))
 |8x1.8b|24|2048|16|8|2|4096|407,498,752|8,858,863,616|2,924,279,808|9,266,362,368|
 |8x13b|40|5120|40|8|2|4096|1,018,746,880|72,144,081,920|22,200,806,400|73,162,828,800|
 
+If you would like to learn more about the pretraining of the LLM-jp-3 MoE series, please refer to this [blog post](https://llm-jp.nii.ac.jp/blog/2025/03/27/moe3.html).
+
 ## Tokenizer
 
 The tokenizer of this model is based on the [huggingface/tokenizers](https://github.com/huggingface/tokenizers) Unigram byte-fallback model.
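
The parameter columns in the table rows above are consistent with simple addition. The header row falls outside this hunk, so the column meanings are inferred, but assuming the last four numeric columns are embedding, non-embedding, activated, and total parameters, each row's total equals embedding plus non-embedding. A minimal sketch checking this:

```python
# Hedged sketch: the column names are inferred, since the table header is
# outside this diff hunk. Assuming (embedding, non-embedding, total) are the
# 8th, 9th, and 11th columns, total = embedding + non-embedding in each row.
rows = {
    "8x1.8b": (407_498_752, 8_858_863_616, 9_266_362_368),
    "8x13b": (1_018_746_880, 72_144_081_920, 73_162_828_800),
}
for name, (embedding, non_embedding, total) in rows.items():
    assert embedding + non_embedding == total, name
    print(f"{name}: {embedding:,} + {non_embedding:,} = {total:,}")
```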
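
The byte-fallback behavior mentioned in the Tokenizer section can be exercised directly. Below is a minimal sketch using `AutoTokenizer` from Hugging Face transformers; the checkpoint id `llm-jp/llm-jp-3-8x1.8b` is an assumption inferred from the model names in the table, not stated in this diff, so substitute the checkpoint you are actually using.

```python
# Minimal sketch, assuming the checkpoint id below matches the published
# model (inferred from the table above, not stated in this diff).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("llm-jp/llm-jp-3-8x1.8b")

# With a Unigram byte-fallback model, characters missing from the vocabulary
# are encoded as raw byte tokens instead of <unk>, so decoding the ids
# recovers the original text.
text = "自然言語処理"  # "natural language processing"
ids = tokenizer.encode(text)
print(ids)
print(tokenizer.decode(ids))
```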