YOYO-AI committed on
Commit
910052a
·
verified ·
1 Parent(s): bb88ac1

Update README.md

Files changed (1)
  1. README.md +52 -3
README.md CHANGED
@@ -1,3 +1,52 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - en
+ - zh
+ base_model:
+ - deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
+ - Qwen/Qwen3-8B
+ - Qwen/Qwen3-8B-Base
+ pipeline_tag: text-generation
+ tags:
+ - merge
+ ---
+
+ # *Model Highlights:*
+
+ - ***Optimal merge method**: `nuslerp`*
+
+ - ***Highest precision**: `dtype: float32` + `out_dtype: bfloat16`*
+
+ - ***Brand-new chat template**: works correctly in LM Studio*
+
+ - ***Context length**: `32768` (a [128K context version](https://huggingface.co/YOYO-AI/Qwen3-8B-YOYO-nuslerp-128K) is also available)*
+
+ # *Parameter Settings*:
+ ## *Thinking Mode: (Recommended)*
+ > [!NOTE]
+ > *`Temperature=0.6`, `TopP=0.95`, `TopK=20`, `MinP=0`.*
+ ## *Non-thinking Mode: (Not recommended)*
+ *`/no_think` may not always work.*
+ > [!TIP]
+ > *`Temperature=0.7`, `TopP=0.8`, `TopK=20`, `MinP=0`.*
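
The two sampling presets above can be kept in code so the right one is chosen per mode. The sketch below is only an illustration: the helper name `sampling_kwargs` is hypothetical, and the dictionary keys assume the keyword names used by Hugging Face `transformers` generation (`temperature`, `top_p`, `top_k`, `min_p`); other runtimes such as LM Studio expose the same values under their own setting names.

```python
# Sampling presets copied from the README values above.
# Key names are an assumption: they follow the `transformers` generation
# keywords; rename them for other runtimes.
THINKING = {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0}
NON_THINKING = {"temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0}


def sampling_kwargs(thinking: bool = True) -> dict:
    """Return a fresh dict of sampling settings for the requested mode.

    `sampling_kwargs` is a hypothetical helper, not part of any library.
    """
    preset = THINKING if thinking else NON_THINKING
    # `do_sample=True` is needed in `transformers` for temperature/top-p/top-k
    # to have any effect; a copy is returned so callers cannot mutate presets.
    return {"do_sample": True, **preset}
```

Assuming a model and tokenized `inputs` are already loaded with `transformers`, the result could be passed along as `model.generate(**inputs, **sampling_kwargs(thinking=True))`.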
+ # *Configuration*:
+ *The following YAML configuration was used to produce this model:*
+
+ ```yaml
+ models:
+   - model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
+     parameters:
+       weight: 1
+   - model: Qwen/Qwen3-8B
+     parameters:
+       weight: 1
+ merge_method: nuslerp
+ base_model: Qwen/Qwen3-8B-Base
+ tokenizer_source: Qwen/Qwen3-8B
+ parameters:
+   normalize: true
+   int8_mask: true
+ dtype: float32
+ out_dtype: bfloat16
+ ```
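
For readers unfamiliar with the idea behind `nuslerp`, the sketch below shows plain spherical linear interpolation (SLERP) on flat vectors, then applies it to differences from a base vector. It is not mergekit's implementation. The function names `slerp` and `merge_with_base` are hypothetical, and the mapping from the two `weight: 1` entries to an interpolation factor of 0.5 is an assumption made for illustration.

```python
import math


def slerp(t, a, b, eps=1e-8):
    """Spherical linear interpolation between flat vectors a and b, 0 <= t <= 1."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    linear = [(1 - t) * x + t * y for x, y in zip(a, b)]
    if norm_a < eps or norm_b < eps:
        # A zero vector has no direction, so fall back to linear interpolation.
        return linear
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding error
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        # Vectors are (anti-)parallel; the SLERP weights are ill-defined.
        return linear
    weight_a = math.sin((1 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


def merge_with_base(base, model_a, model_b, weight_a=1.0, weight_b=1.0):
    """Interpolate the differences from `base`, then add the base back.

    Hypothetical helper: the relative weights are turned into a single
    interpolation factor, which is an assumption, not mergekit's exact rule.
    """
    t = weight_b / (weight_a + weight_b)
    delta_a = [x - b for x, b in zip(model_a, base)]
    delta_b = [x - b for x, b in zip(model_b, base)]
    merged_delta = slerp(t, delta_a, delta_b)
    return [b + d for b, d in zip(base, merged_delta)]
```

With equal weights, two orthogonal unit-length differences are combined into a vector that still has unit length, whereas a plain average would shrink it; that norm-preserving behavior is the usual motivation for spherical over linear interpolation.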