---
license: apache-2.0
language:
- en
- zh
base_model:
- Kwai-Klear/Klear-Reasoner-8B
- deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
- Qwen/Qwen3-8B
- AXCXEPT/Qwen3-EZO-8B-beta
- Qwen/Qwen3-8B-Base
pipeline_tag: text-generation
tags:
- merge
---
> *Enhance the performance of Qwen3-8B by merging in powerful reasoning models without compromising the effectiveness of the `/no_think` tag!*
# *Model Highlights:*

- ***Merge methods**: `arcee_fusion`, `nuslerp`, `della`*

- ***Precision**: `dtype: float32`, `out_dtype: bfloat16`*

- ***Context length**: `40960`*

# *Parameter Settings:*
## *Thinking Mode:*
> [!NOTE]
> *`Temperature=0.6`, `TopP=0.95`, `TopK=20`, `MinP=0`.*
## *Non-thinking Mode:*
> [!TIP]
> *`Temperature=0.7`, `TopP=0.8`, `TopK=20`, `MinP=0`.*

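*Below is a minimal inference sketch in Python with 🤗 Transformers that applies these settings. The repository id `YOYO-AI/Qwen3-8B-YOYO-V2-Hybrid` is an assumption based on the merge name in Step 3, and the `enable_thinking` switch comes from the Qwen3-8B chat template this model inherits.*
```python
# Minimal usage sketch (assumed repo id and the standard Qwen3 chat template).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "YOYO-AI/Qwen3-8B-YOYO-V2-Hybrid"  # assumption: final merged checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Explain the birthday paradox in two sentences."}]

# Thinking mode: enable_thinking=True plus the sampling settings from the note above.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs, max_new_tokens=2048,
    do_sample=True, temperature=0.6, top_p=0.95, top_k=20, min_p=0.0,
)
print(tokenizer.decode(output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))

# Non-thinking mode: pass enable_thinking=False (or append "/no_think" to the prompt)
# and switch to Temperature=0.7, TopP=0.8, TopK=20, MinP=0 from the tip above.
```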
## *Step 1: Merge Two Hybrid Models*
- *Leverage the advantages of the two hybrid models.*
```yaml
models:
  - model: AXCXEPT/Qwen3-EZO-8B-beta
merge_method: arcee_fusion
base_model: Qwen/Qwen3-8B
dtype: float32
out_dtype: bfloat16
tokenizer_source: base
name: Qwen3-8B-fusion
```
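*Each config in this card is plain mergekit YAML. The sketch below shows one way Step 1 could be run, assuming mergekit's Python API (`MergeConfiguration`, `MergeOptions`, `run_merge`); the `mergekit-yaml` CLI is the usual command-line alternative, and the config file name is hypothetical.*
```python
# Sketch of running the Step 1 config with mergekit's Python API
# (assumed API surface: MergeConfiguration / MergeOptions / run_merge).
import yaml
import torch
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("step1_arcee_fusion.yaml", "r", encoding="utf-8") as fp:  # hypothetical file name
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./Qwen3-8B-fusion",        # output directory for the merged weights
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # merge on GPU when one is available
        copy_tokenizer=True,             # keep the base tokenizer alongside the weights
    ),
)
```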
## *Step 2: Merge High-Performance Reasoning Models with Hybrid Models*
- *Maximize the proportion of the reasoning models, stopping just before the `/no_think` tag becomes ineffective in the merged model.*
```yaml
models:
  - model: Qwen/Qwen3-8B
    parameters:
      weight: 0.55
  - model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
    parameters:
      weight: 0.45
merge_method: nuslerp
tokenizer_source: Qwen/Qwen3-8B
dtype: float32
out_dtype: bfloat16
name: Qwen3-8B-nuslerp1
```
```yaml
models:
  - model: Qwen/Qwen3-8B
    parameters:
      weight: 0.75
  - model: Kwai-Klear/Klear-Reasoner-8B
    parameters:
      weight: 0.25
merge_method: nuslerp
tokenizer_source: Qwen/Qwen3-8B
dtype: float32
out_dtype: bfloat16
name: Qwen3-8B-nuslerp2
```
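*For intuition, `nuslerp` is a spherical-interpolation (slerp-style) merge of the two checkpoints. The sketch below shows plain SLERP on a pair of flattened tensors purely to illustrate the idea; mergekit's actual `nuslerp` adds its own normalization and per-tensor options.*
```python
# Illustrative SLERP between two weight tensors (conceptual only, not mergekit's code).
import torch

def slerp(w_a: torch.Tensor, w_b: torch.Tensor, t: float, eps: float = 1e-8) -> torch.Tensor:
    a, b = w_a.flatten().float(), w_b.flatten().float()
    cos_theta = torch.clamp(
        torch.dot(a / (a.norm() + eps), b / (b.norm() + eps)), -1.0, 1.0
    )
    theta = torch.acos(cos_theta)
    if theta.abs() < 1e-4:  # nearly parallel tensors: fall back to linear interpolation
        merged = (1.0 - t) * a + t * b
    else:
        sin_theta = torch.sin(theta)
        merged = (torch.sin((1.0 - t) * theta) / sin_theta) * a \
               + (torch.sin(t * theta) / sin_theta) * b
    return merged.reshape(w_a.shape).to(w_a.dtype)

# t = 0.45 mirrors the 0.55 / 0.45 weighting of Qwen3-8B vs. DeepSeek-R1-0528-Qwen3-8B above.
```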
## *Step 3: Unify the Enhanced Hybrid Models*
- *Merge the two enhanced models into the base model with the `della` merge method to make the final model more versatile and stable.*
- *We use the chat template of Qwen3-8B.*
```yaml
models:
  - model: Qwen3-8B-nuslerp1
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen3-8B-nuslerp2
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen3-8B-Base
dtype: float32
out_dtype: bfloat16
name: Qwen3-8B-YOYO-V2-Hybrid
```
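*For intuition, `della` merges delta (task-vector) parameters: each fine-tuned model's deltas from the base are optionally dropped and rescaled, combined by weight, scaled by `lambda`, and added back to the base. The sketch below illustrates that flow under those assumptions and is not mergekit's implementation; with `density: 1`, as configured here, no deltas are dropped and only the weighting and the `lambda: 0.9` scaling apply.*
```python
# Conceptual della-style merge for a single tensor (illustrative, not mergekit's code).
# DELLA proper makes the keep-probability depend on delta magnitude; the uniform
# drop below is the simpler DARE-style variant of the same drop-and-rescale idea.
import torch

def della_style_merge(base, tuned_tensors, weights, density=1.0, lam=0.9):
    merged_delta = torch.zeros_like(base, dtype=torch.float32)
    total_weight = sum(weights)
    for tuned, w in zip(tuned_tensors, weights):
        delta = (tuned - base).float()
        if density < 1.0:
            mask = torch.bernoulli(torch.full_like(delta, density))
            delta = delta * mask / density  # rescale survivors to preserve the expected delta
        merged_delta += (w / total_weight) * delta
    return (base.float() + lam * merged_delta).to(base.dtype)
```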