Text Generation
GGUF
English
conversational
gabriellarson committed · Commit 36e750d · verified · 1 Parent(s): 5ed02fd

Update README.md

Files changed (1)
  1. README.md +5 -0
README.md CHANGED
@@ -8,6 +8,11 @@ pipeline_tag: text-generation
 base_model:
 - microsoft/Phi-mini-MoE-instruct
 ---
+
+## my suggested samplers:
+
+--repeat-penalty 1.05 --temp 0.0 --top-p 1.0 --top-k 1
+
 ## Model Summary
 
 Phi-mini-MoE is a lightweight Mixture of Experts (MoE) model with 7.6B total parameters and 2.4B activated parameters. It is compressed and distilled from the base model shared by [Phi-3.5-MoE](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct) and [GRIN-MoE](https://huggingface.co/microsoft/GRIN-MoE) using the [SlimMoE](https://arxiv.org/pdf/2506.18349) approach, then post-trained via supervised fine-tuning and direct preference optimization for instruction following and safety. The model is trained on Phi-3 synthetic data and filtered public documents, with a focus on high-quality, reasoning-dense content. It is part of the SlimMoE series, which includes a smaller variant, [Phi-tiny-MoE](https://huggingface.co/microsoft/Phi-tiny-MoE-instruct), with 3.8B total and 1.1B activated parameters.
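The added flags are llama.cpp sampler options. A minimal sketch of how they might be passed, assuming a local `llama-cli` build and a hypothetical quantization filename `Phi-mini-MoE-instruct-Q4_K_M.gguf` (only the four sampler flags come from the commit itself; the rest is illustrative):

```bash
# Hypothetical invocation: model filename, context size, and prompt are assumptions;
# the four sampler flags are the ones suggested in the README diff above.
./llama-cli \
  -m ./Phi-mini-MoE-instruct-Q4_K_M.gguf \
  -c 4096 \
  --repeat-penalty 1.05 --temp 0.0 --top-p 1.0 --top-k 1 \
  -p "Explain mixture-of-experts routing in two sentences."
```

With `--temp 0.0` and `--top-k 1`, decoding is effectively greedy; `--top-p 1.0` leaves nucleus filtering disabled, and the mild `--repeat-penalty 1.05` discourages repetition loops.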