Update README.md
README.md
CHANGED
@@ -31,7 +31,7 @@ model-index:
 
 <p><h1> speechless-sparsetral-16x7b-MoE </h1></p>
 
-speechless-sparsetral-16x7b-MoE is the MoE upgraded version of [speechless-code-mistral-7b-v1.0](https://huggingface.co/uukuguy/speechless-code-mistral-7b-v1.0). The MoE fine-tuning adopts [Parameter-Efficient Sparsity Crafting (PESC)](https://arxiv.org/abs/2401.02731), which is an efficient fine-tuning architecture that uses LoRA modules as expert models, similar to the concept of [multi-loras](https://github.com/uukuguy/multi_loras).
+speechless-sparsetral-16x7b-MoE is the MoE upgraded version of [speechless-code-mistral-7b-v1.0](https://huggingface.co/uukuguy/speechless-code-mistral-7b-v1.0). The MoE fine-tuning adopts [Parameter-Efficient Sparsity Crafting (PESC)](https://arxiv.org/abs/2401.02731), which is an efficient fine-tuning architecture that uses LoRA modules as expert models, similar to the concept of [multi-loras](https://github.com/uukuguy/multi_loras). The model size is approximately **10B**.
 
 Specifically, Mistral-7B-0.1 is used as the base model, with 16 experts and 4 expert outputs selected for inference. The fine-tuning dataset includes codefuse-ai/Evol-Instruction-66k to enhance the model's code generation ability. The specific datasets are as follows:
 
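The README text in this change describes 16 experts with 4 expert outputs selected at inference. As a rough illustration only, the sketch below shows what top-4-of-16 routing over small low-rank experts could look like. It is not the model's actual code: the function names, the softmax over the selected logits, the toy dimensions, and the random weights are all assumptions made for the example, and the real PESC implementation may differ.

```python
import math
import random

NUM_EXPERTS = 16  # from the README: 16 experts
TOP_K = 4         # from the README: 4 expert outputs selected


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Assumption: weights are a softmax over the selected logits only,
    so they sum to 1.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def make_low_rank_expert(dim, rank, rng):
    """Hypothetical expert: a low-rank map up(down(x)) with random weights."""
    down = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(rank)]
    up = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(dim)]

    def expert(x):
        h = [sum(w * xi for w, xi in zip(row, x)) for row in down]
        return [sum(w * hi for w, hi in zip(row, h)) for row in up]

    return expert


def moe_layer(x, router, experts, k=TOP_K):
    """Mix the outputs of the k routed experts for one hidden vector x."""
    # One router logit per expert: dot product of its router row with x.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    out = [0.0] * len(x)
    for idx, weight in top_k_route(logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    dim, rank = 8, 2  # toy sizes, not the model's real dimensions
    experts = [make_low_rank_expert(dim, rank, rng) for _ in range(NUM_EXPERTS)]
    router = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(NUM_EXPERTS)]
    print(moe_layer([1.0] * dim, router, experts))
```

Only 4 of the 16 experts run per input in this sketch, which is the general reason a sparse mixture can hold more parameters than it uses on any single token.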