uukuguy committed on
Commit 6e3b24c · verified · 1 Parent(s): 1c7c968

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -31,7 +31,7 @@ model-index:
 
 <p><h1> speechless-sparsetral-16x7b-MoE </h1></p>
 
- speechless-sparsetral-16x7b-MoE is the MoE upgraded version of [speechless-code-mistral-7b-v1.0](https://huggingface.co/uukuguy/speechless-code-mistral-7b-v1.0). The MoE fine-tuning adopts [Parameter-Efficient Sparsity Crafting (PESC)](https://arxiv.org/abs/2401.02731), which is an efficient fine-tuning architecture that uses LoRA modules as expert models, similar to the concept of [multi-loras](https://github.com/uukuguy/multi_loras).
+ speechless-sparsetral-16x7b-MoE is the MoE upgraded version of [speechless-code-mistral-7b-v1.0](https://huggingface.co/uukuguy/speechless-code-mistral-7b-v1.0). The MoE fine-tuning adopts [Parameter-Efficient Sparsity Crafting (PESC)](https://arxiv.org/abs/2401.02731), which is an efficient fine-tuning architecture that uses LoRA modules as expert models, similar to the concept of [multi-loras](https://github.com/uukuguy/multi_loras). The model size is approximately **10B**.
 
 Specifically, Mistral-7B-0.1 is used as the base model, with 16 experts and 4 expert outputs selected for inference. The fine-tuning dataset includes codefuse-ai/Evol-Instruction-66k to enhance the model's code generation ability. The specific datasets are as follows:
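
The changed paragraph describes PESC-style routing: LoRA modules act as experts, and 4 of 16 experts are selected per token at inference. The sketch below is only an illustration of that routing pattern, not the sparsetral implementation; the class names, hidden size, and LoRA rank are assumptions made for the example.

```python
# Minimal sketch of top-4-of-16 routing over LoRA experts (assumed names and sizes,
# not the sparsetral source): each expert is a low-rank delta added to a shared projection.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRAExpert(nn.Module):
    """One expert = a LoRA-style low-rank delta applied on top of the shared projection."""
    def __init__(self, hidden: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(hidden, rank, bias=False)
        self.up = nn.Linear(rank, hidden, bias=False)
        nn.init.zeros_(self.up.weight)  # experts start as a zero delta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))

class SparseLoRAMoE(nn.Module):
    """A router selects top_k of num_experts LoRA experts per token and mixes their outputs."""
    def __init__(self, hidden: int, num_experts: int = 16, top_k: int = 4):
        super().__init__()
        self.shared = nn.Linear(hidden, hidden)  # stand-in for the frozen base feed-forward
        self.experts = nn.ModuleList(LoRAExpert(hidden) for _ in range(num_experts))
        self.router = nn.Linear(hidden, num_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, hidden)
        weights, idx = torch.topk(self.router(x), self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)              # normalize over the selected experts
        base = self.shared(x)
        delta = torch.zeros_like(base)
        for e, expert in enumerate(self.experts):
            rows, slots = (idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if rows.numel():
                delta[rows] += weights[rows, slots].unsqueeze(-1) * expert(x[rows])
        return base + delta

# Quick shape check
layer = SparseLoRAMoE(hidden=64)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Because only the low-rank expert and router parameters are trained while the base weights stay frozen, the active parameter count per token stays close to the 7B base model even though the checkpoint is roughly 10B, which matches the size note added in this commit.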