uukuguy committed on
Commit 6e3b24c · verified · 1 Parent(s): 1c7c968

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -31,7 +31,7 @@ model-index:
 
 <p><h1> speechless-sparsetral-16x7b-MoE </h1></p>
 
- speechless-sparsetral-16x7b-MoE is the MoE upgraded version of [speechless-code-mistral-7b-v1.0](https://huggingface.co/uukuguy/speechless-code-mistral-7b-v1.0). The MoE fine-tuning adopts [Parameter-Efficient Sparsity Crafting (PESC)](https://arxiv.org/abs/2401.02731), which is an efficient fine-tuning architecture that uses LoRA modules as expert models, similar to the concept of [multi-loras](https://github.com/uukuguy/multi_loras).
+ speechless-sparsetral-16x7b-MoE is the MoE upgraded version of [speechless-code-mistral-7b-v1.0](https://huggingface.co/uukuguy/speechless-code-mistral-7b-v1.0). The MoE fine-tuning adopts [Parameter-Efficient Sparsity Crafting (PESC)](https://arxiv.org/abs/2401.02731), which is an efficient fine-tuning architecture that uses LoRA modules as expert models, similar to the concept of [multi-loras](https://github.com/uukuguy/multi_loras). The model size is approximately **10B**.
 
 Specifically, Mistral-7B-0.1 is used as the base model, with 16 experts and 4 expert outputs selected for inference. The fine-tuning dataset includes codefuse-ai/Evol-Instruction-66k to enhance the model's code generation ability. The specific datasets are as follows:
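
The changed paragraph describes PESC-style routing: LoRA modules act as experts, and 4 of 16 experts are selected per token at inference. The sketch below is only an illustration of that routing pattern, not the sparsetral implementation; the class names, hidden size, and LoRA rank are assumptions made for the example.

```python
# Minimal sketch of top-4-of-16 routing over LoRA experts (assumed names and sizes,
# not the sparsetral source): each expert is a low-rank delta added to a shared projection.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRAExpert(nn.Module):
    """One expert = a LoRA-style low-rank delta applied on top of the shared projection."""
    def __init__(self, hidden: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(hidden, rank, bias=False)
        self.up = nn.Linear(rank, hidden, bias=False)
        nn.init.zeros_(self.up.weight)  # experts start as a zero delta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))

class SparseLoRAMoE(nn.Module):
    """A router selects top_k of num_experts LoRA experts per token and mixes their outputs."""
    def __init__(self, hidden: int, num_experts: int = 16, top_k: int = 4):
        super().__init__()
        self.shared = nn.Linear(hidden, hidden)  # stand-in for the frozen base feed-forward
        self.experts = nn.ModuleList(LoRAExpert(hidden) for _ in range(num_experts))
        self.router = nn.Linear(hidden, num_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, hidden)
        weights, idx = torch.topk(self.router(x), self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)              # normalize over the selected experts
        base = self.shared(x)
        delta = torch.zeros_like(base)
        for e, expert in enumerate(self.experts):
            rows, slots = (idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if rows.numel():
                delta[rows] += weights[rows, slots].unsqueeze(-1) * expert(x[rows])
        return base + delta

# Quick shape check
layer = SparseLoRAMoE(hidden=64)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Because only the low-rank expert and router parameters are trained while the base weights stay frozen, the active parameter count per token stays close to the 7B base model even though the checkpoint is roughly 10B, which matches the size note added in this commit.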