# Chinese-Mixtral-8x7B
## 📖 Introduction

Based on Mixtral-8x7B released by Mistral, this project performs continued pre-training with an extended Chinese vocabulary, in the hope of further advancing research on MoE models in the Chinese NLP community. The extended vocabulary significantly improves the model's encoding and decoding efficiency on Chinese text, and continued pre-training on a large-scale open-source corpus equips the model with strong Chinese generation and comprehension abilities.

This project open-sources:

- The Chinese-Mixtral-8x7B vocabulary-extended large model
- Code for vocabulary-extended continued pre-training

Please note: Chinese-Mixtral-8x7B may still generate misleading content containing factual errors, or harmful content containing bias or discrimination. Please use the generated content with caution, and do not spread harmful content to the Internet.
## 📥 Model Download

This project is trained with QLoRA. Both the LoRA weights and the model merged with those weights are open-sourced; you can choose which to download according to your needs:

| Model | Size | Download | Notes |
|---|---|---|---|
| Chinese-Mixtral-8x7B | 88GB | 🤗HuggingFace | Full vocabulary-extended model, ready to use |
| Chinese-Mixtral-8x7B-adapter | 2.7GB | 🤗HuggingFace | LoRA weights; must be merged with the original Mixtral-8x7B before use (see here for the merge script) |
## 💻 Model Inference

Chinese-Mixtral-8x7B supports the complete Mixtral-8x7B model ecosystem, including acceleration with vLLM and Flash Attention 2, and model quantization with bitsandbytes. Below are code examples for running inference with Chinese-Mixtral-8x7B.
Using Flash Attention 2:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HIT-SCIR/Chinese-Mixtral-8x7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

text = "我的名字是"
inputs = tokenizer(text, return_tensors="pt").to(0)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Using 4-bit quantization:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HIT-SCIR/Chinese-Mixtral-8x7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True, device_map="auto")

text = "我的名字是"
inputs = tokenizer(text, return_tensors="pt").to(0)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Please note: Chinese-Mixtral-8x7B is a base model that has not been instruction-tuned, so its instruction-following ability is limited. You can fine-tune the model by referring to the Fine-Tuning section below.
## 📊 Model Performance

### General Capabilities

We evaluate Chinese-Mixtral-8x7B on the following benchmarks:

- C-Eval: a comprehensive Chinese evaluation suite for foundation models, containing 13,948 multiple-choice questions spanning 52 disciplines and four difficulty levels.
- CMMLU: a comprehensive Chinese evaluation benchmark dedicated to assessing language models' knowledge and reasoning abilities in a Chinese context, covering 67 topics ranging from basic subjects to advanced professional levels.
- MMLU: an English benchmark of 57 multiple-choice tasks covering elementary mathematics, US history, computer science, law and more, with difficulty ranging from high-school to expert level; it is one of the mainstream LLM evaluation datasets.
- HellaSwag: a highly challenging English NLI benchmark in which every question requires deep understanding of the context and cannot be answered from commonsense alone.

According to Mistral's technical report, Mixtral-8x7B activates 13B parameters during inference. The table below shows the 5-shot results of Chinese-Mixtral-8x7B and other 13B-scale Chinese vocabulary-extended models on these benchmarks:
| Model | Continued-training corpus | C-Eval (zh) | CMMLU (zh) | MMLU (en) | HellaSwag (en) |
|---|---|---|---|---|---|
| IDEA-CCNL/Ziya2-13B-Base | 650B tokens | 59.29 | 60.93 | 59.86 | 58.90 |
| TigerResearch/tigerbot-13b-base-v3 | 500B tokens | 50.52 | 51.65 | 53.46 | 59.16 |
| Linly-AI/Chinese-LLaMA-2-13B-hf | 11B tokens | 42.57 | 41.95 | 51.32 | 59.05 |
| hfl/chinese-llama-2-13b | ~30B tokens (120GB) | 41.90 | 42.08 | 51.92 | 59.28 |
| Chinese-Mixtral-8x7B (this project) | 42B tokens | 52.08 | 51.08 | 69.80 | 65.69 |
In terms of Chinese knowledge and comprehension, our Chinese-Mixtral-8x7B is on par with TigerBot-13B-Base-v3. Since Chinese-Mixtral-8x7B was trained on only 8% as much data as TigerBot-13B-Base-v3, our model still has room for further improvement. Meanwhile, thanks to the strong performance of the original Mixtral-8x7B model, our Chinese-Mixtral-8x7B achieves the strongest English results among the vocabulary-extended models.

Because different versions of the evaluation scripts differ slightly in implementation details, to ensure consistency and fairness of the results, all of our evaluations use EleutherAI's lm-evaluation-harness, at commit hash 28ec7fa.
### Generation Quality

The table below shows generation examples from the vocabulary-extended models. Since some models' pre-training corpora are not separated with `eos_token`, we truncate the generated text at `max_tokens = 100`. Our sampling parameters are `temperature = 0.8`, `top_p = 0.9`.
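For reference, these sampling settings map onto `generate()` keyword arguments as shown in the minimal sketch below; `model` and `inputs` are assumed to be set up as in the inference examples above.

```python
# Sampling settings used for the generation examples, expressed as
# transformers generate() keyword arguments.
gen_kwargs = dict(
    do_sample=True,      # sample instead of greedy decoding
    temperature=0.8,
    top_p=0.9,           # nucleus sampling
    max_new_tokens=100,  # truncate the generation at 100 new tokens
)
# outputs = model.generate(**inputs, **gen_kwargs)
```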
### Chinese Encoding/Decoding Efficiency

To measure Chinese encoding/decoding efficiency, we encode one shard of the SkyPile dataset (2023-06_zh_head_0000.jsonl) with the tokenizer of each vocabulary-extended model and compare the number of tokens each tokenizer produces for the Chinese text:
| Model | Family | Vocab size | Tokens for the Chinese text | Efficiency |
|---|---|---|---|---|
| meta-llama/Llama-2-13B-hf | LLaMA | 32000 | 780M | Low |
| mistralai/Mixtral-8x7B-v0.1 | Mixtral | 32000 | 606M | Low |
| Linly-AI/Chinese-LLaMA-2-13B-hf | LLaMA | 40076 | 532M | Medium |
| IDEA-CCNL/Ziya2-13B-Base | LLaMA | 39424 | 532M | Medium |
| hfl/chinese-llama-2-13b | LLaMA | 55296 | 365M | High |
| TigerResearch/tigerbot-13b-base-v3 | LLaMA | 65112 | 342M | High |
| Chinese-Mixtral-8x7B (this project) | Mixtral | 57000 | 355M | High |
On roughly 1.4GB of test text, Chinese-Mixtral-8x7B's Chinese encoding/decoding efficiency is second only to TigerBot-13B-Base-v3, a 41.5% improvement over the original model. This speeds up Chinese inference, and saves sequence length in scenarios such as In-Context Learning and Chain-of-Thought, which helps improve performance on complex reasoning tasks.
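A comparison like the one above can be reproduced in spirit by counting the tokens each tokenizer emits for the same Chinese text. The sketch below defines a generic helper; the `encode` callable is an assumption standing in for any tokenizer's encode method (e.g. a Hugging Face tokenizer's).

```python
def tokens_per_char(encode, texts):
    """Average number of tokens emitted per input character.

    `encode` is any callable mapping a string to a list of tokens;
    lower values mean better compression of the text.
    """
    n_tokens = sum(len(encode(t)) for t in texts)
    n_chars = sum(len(t) for t in texts)
    return n_tokens / n_chars

# A character-level "tokenizer" emits exactly one token per character:
print(tokens_per_char(list, ["你好世界"]))  # 1.0
```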
## ⚙️ Training Details

### Vocabulary Extension

We use sentencepiece to train Chinese BPE vocabularies on 12GB of Zhihu data and 2GB of Wudao data. During vocabulary training we enumerated both the number of Chinese single-character tokens and the total number of Chinese tokens, and combined the two, yielding several hundred vocabularies of different sizes and contents. To obtain the most suitable vocabulary, we measured the Chinese lexical capability of these vocabularies with ALP, proposed by Zheng Bo et al. ALP computes the sub-word segmentation granularity for a given language and penalizes the vocabulary's low-frequency sub-words, making it a convenient metric of a vocabulary's capability in a particular language.

We evaluated the ALP of the candidate vocabularies on book and encyclopedia corpora. In the figure, the four curves correspond to vocabularies with four different numbers of Chinese single-character tokens (4451, 5435, 6414 and 7434). To avoid a vocabulary so small that Chinese compression suffers, or so large that the embedding layer becomes too sparse, we take the inflection point of the ALP curves, which corresponds to adding 25,000 Chinese tokens to the original vocabulary. Among the four curves we then select the one with the highest ALP, i.e. the vocabulary adding 6414 Chinese single-character tokens, as the final vocabulary used by Chinese-Mixtral-8x7B.

After obtaining the new vocabulary, the embedding and lm_head layers must be resized and initialized. We initialize each new row with the mean of the embeddings, in the original embedding layer, of the pieces the new token decomposes into. In our preliminary experiments this approach was slightly better than HuggingFace's default implementation, which initializes new rows from a fixed normal distribution.
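A minimal sketch of this initialization, assuming each new token maps to the IDs of its pieces under the original tokenizer (the helper name and tensor shapes below are illustrative, not the project's actual code):

```python
import torch

def extend_embeddings(old_emb: torch.Tensor, new_token_pieces: list) -> torch.Tensor:
    """Append one row per new token, initialized to the mean of the
    old-vocabulary embeddings of the pieces the token decomposes into."""
    new_rows = torch.stack([old_emb[ids].mean(dim=0) for ids in new_token_pieces])
    return torch.cat([old_emb, new_rows], dim=0)

# Example: a 4-token embedding table extended with one new token made of pieces 0 and 1.
old_emb = torch.arange(12, dtype=torch.float32).reshape(4, 3)
new_emb = extend_embeddings(old_emb, [[0, 1]])
print(new_emb.shape)  # torch.Size([5, 3])
```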
### Continued Pre-Training

Mixtral-8x7B has 46.7B parameters. Full-parameter training would require combining several parallelism strategies, which is too time-consuming given limited training resources. We therefore follow the approach officially recommended by HuggingFace and train the model with QLoRA. On top of LoRA's low-rank decomposition, QLoRA further reduces training memory by introducing 4-bit quantization and double quantization, and by paging with NVIDIA unified memory, while maintaining performance comparable to full-parameter training.

Following the LoRA settings of Yiming Cui et al., we apply low-rank decomposition to all Linear layers of the original model and make the parameters of the resized embedding and lm_head layers trainable. The model body is quantized in the NF4 format, which keeps the distribution of the quantized data close to that of the original weights, so less weight information is lost.
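A hedged sketch of this setup with recent `transformers` and `peft` releases is shown below; the rank, alpha and dropout values are illustrative assumptions, not the project's actual hyperparameters (those are set in its training script).

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# NF4 4-bit quantization with double quantization for the frozen base model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Low-rank adapters on all Linear layers; the resized embedding and
# lm_head layers are trained in full via modules_to_save.
lora_config = LoraConfig(
    r=64,                # illustrative rank, not the project's value
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules="all-linear",
    modules_to_save=["embed_tokens", "lm_head"],
    task_type="CAUSAL_LM",
)
```

These two config objects would then be passed to `from_pretrained(..., quantization_config=bnb_config)` and `get_peft_model(model, lora_config)` respectively.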
### Environment Setup

We recommend Python 3.10 + torch 2.0.1:
```shell
# PyTorch + Transformers
$ pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
$ pip install transformers==4.36.2 datasets evaluate peft accelerate gradio optimum sentencepiece
$ pip install jupyterlab scikit-learn pandas matplotlib tensorboard nltk rouge bitsandbytes fire

# DeepSpeed
$ git clone https://github.com/microsoft/DeepSpeed.git
$ cd DeepSpeed
$ DS_BUILD_FUSED_ADAM=1 pip3 install .

# Flash Attention
$ pip install flash-attn --no-build-isolation
```
### Dataset Download

We trained Chinese-Mixtral-8x7B on existing open-source datasets, including:

| Dataset | Language | Data used | Notes |
|---|---|---|---|
| Skywork/SkyPile-150B | Chinese | 30B | Only data from 2022 and 2023 |
| DKYoon/SlimPajama-6B | English | 12B | Dataset repeated for 2 epochs |
Download the datasets into `data` with `data/download.py`. For the SlimPajama dataset, use `data/parquet2jsonl.py` to convert the raw data to `jsonl` format.

The downloaded datasets consist of multiple `jsonl` shards; merge the shards into a single `jsonl` file with `cat`:
```shell
$ cat *.jsonl > all.jsonl
```
Split the `jsonl` file into train and valid sets with `split`. In this project the train/valid line ratio is 999:1.
```shell
$ wc -l all.jsonl            # count the total number of lines
$ split -l <lines> all.jsonl # split at the train line count computed from the 999:1 ratio
$ mv xaa DKYoon-SlimPajama-6B-train.jsonl  # rename
$ mv xab DKYoon-SlimPajama-6B-dev.jsonl
```
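The `<lines>` placeholder can be computed from the 999:1 ratio with shell arithmetic. Below is an illustrative run on a toy file (`seq` merely fabricates stand-in data; on real data, skip that line and use your merged `all.jsonl`):

```shell
seq 1 2000 > all.jsonl                 # toy stand-in for the merged dataset
total=$(wc -l < all.jsonl)
train_lines=$(( total * 999 / 1000 ))  # 999/1000 of the lines go to train
split -l "$train_lines" all.jsonl      # produces xaa (train) and xab (valid)
wc -l xaa xab
```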
### Dataset Preprocessing

Register the dataset name and path in `data/datasets.toml`:
```toml
[DKYoon-SlimPajama-6B]             # dataset name
splits = ["train", "dev"]          # dataset train/valid splits
root = "{DATA_DIR}/en/{name}"      # dataset root directory
doc = "{name}-{split}"             # dataset file name
encoded = "encoded-{name}-{split}" # where the preprocessed data is saved
```
Pre-tokenize the datasets with `data/preprocess_datasets.py` to speed up training:

```shell
$ python data/preprocess_datasets.py --ds_name SkyPile-150B-2023 --tokenizer_name_or_path tokenizer/Mixtral-8x7B-v0.1-vocab
$ python data/preprocess_datasets.py --ds_name DKYoon-SlimPajama-6B --tokenizer_name_or_path tokenizer/Mixtral-8x7B-v0.1-vocab
```
After pre-tokenization, you can check each dataset's total token count with `data/utils.py`:

```shell
$ python data/utils.py
```
### Start Training

The training launch script is `scripts/train.sh`. You can change the training datasets and their sampling ratios by modifying `TRAIN_DATASETS` in it:
```shell
TRAIN_DATASETS=(
    1:SkyPile-150B-2022     # use the full SkyPile-150B-2022
    0.1:SkyPile-150B-2023   # use 10% of SkyPile-150B-2023
    1:DKYoon-SlimPajama-6B  # use the full DKYoon-SlimPajama-6B
)
```
If you use the SLURM cluster management system, submit the job with `sbatch`:

```shell
$ sbatch scripts/train.sh
```
If you do not have SLURM, or wish to launch training from the command line, you can directly extract the `torchrun` command from `scripts/train.sh` to start training.
### Fine-Tuning

The Chinese-Mixtral-8x7B released by this project is a base model and has not been fine-tuned. If you wish to fine-tune Chinese-Mixtral-8x7B for downstream tasks or SFT, you can refer to the QLoRA fine-tuning script HuggingFace provides for Mixtral-8x7B (HuggingFace's official example code).
## ✒️ Citation

If you find this project helpful to your research, or use its code, please cite it:
```bibtex
@misc{Chinese-Mixtral-8x7B,
  author = {HIT-SCIR},
  title = {Chinese-Mixtral-8x7B: An Open-Source Mixture-of-Experts LLM},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/HIT-SCIR/Chinese-Mixtral-8x7B}}
}
```
## 📈 Star History
## Open LLM Leaderboard Evaluation Results

Detailed results can be found here.
Metric | Value |
---|---|
Avg. | 66.69 |
AI2 Reasoning Challenge (25-Shot) | 63.57 |
HellaSwag (10-Shot) | 85.98 |
MMLU (5-Shot) | 70.95 |
TruthfulQA (0-shot) | 45.86 |
Winogrande (5-shot) | 82.08 |
GSM8k (5-shot) | 51.71 |