|
--- |
|
license: apache-2.0 |
|
language: |
|
- en |
|
- zh |
|
base_model: |
|
- Kwai-Klear/Klear-Reasoner-8B |
|
- deepseek-ai/DeepSeek-R1-0528-Qwen3-8B |
|
- Qwen/Qwen3-8B |
|
- AXCXEPT/Qwen3-EZO-8B-beta |
|
- Qwen/Qwen3-8B-Base |
|
pipeline_tag: text-generation |
|
tags: |
|
- merge |
|
--- |
|
> *Enhance the performance of Qwen3-8B by merging powerful reasoning models without compromising the effectiveness of the `/no_think` tag!*
|
# *Model Highlights:* |
|
|
|
- ***merge methods**: `arcee_fusion`, `nuslerp`, `della`*
|
|
|
- ***precision**: `dtype: float32` `out_dtype: bfloat16`* |
|
|
|
- ***context length**: `40960`*
|
|
|
# *Parameter Settings:* |
|
## *Thinking Mode:* |
|
> [!NOTE] |
|
> *`Temperature=0.6`, `TopP=0.95`, `TopK=20`, `MinP=0`.*
|
## *Non-Thinking Mode:* |
|
> [!TIP] |
|
> *`Temperature=0.7`, `TopP=0.8`, `TopK=20`, `MinP=0`.*
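
As a minimal sketch, the two presets map directly onto `transformers` generation settings (the `THINKING`/`NON_THINKING` names below are illustrative, not part of the model):

```python
from transformers import GenerationConfig

# Hedged sketch: the two sampling presets above as GenerationConfig objects.
THINKING = GenerationConfig(do_sample=True, temperature=0.6, top_p=0.95, top_k=20, min_p=0.0)
NON_THINKING = GenerationConfig(do_sample=True, temperature=0.7, top_p=0.8, top_k=20, min_p=0.0)
```

Either preset can then be passed as `model.generate(..., generation_config=THINKING)`; with Qwen3-style chat templates, thinking can also be toggled via `tokenizer.apply_chat_template(..., enable_thinking=False)`.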
|
|
|
## *Step 1: Merge the Two Hybrid Models*
|
- *Fuse the two hybrid models, `Qwen/Qwen3-8B` and `AXCXEPT/Qwen3-EZO-8B-beta`, to combine their strengths.*
|
```yaml |
|
models: |
|
- model: AXCXEPT/Qwen3-EZO-8B-beta |
|
merge_method: arcee_fusion |
|
base_model: Qwen/Qwen3-8B |
|
dtype: float32 |
|
out_dtype: bfloat16 |
|
tokenizer_source: base |
|
name: Qwen3-8B-fusion |
|
``` |
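
As a minimal sketch, the config above can be run with mergekit's Python entry point (the `step1.yaml` filename and the local output path are assumptions):

```python
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Load the Step 1 config (assumed to be saved locally as step1.yaml).
with open("step1.yaml", "r", encoding="utf-8") as f:
    config = MergeConfiguration.model_validate(yaml.safe_load(f))

# Write the intermediate fusion model where Step 2 can reference it.
run_merge(
    config,
    out_path="./Qwen3-8B-fusion",
    options=MergeOptions(cuda=True, copy_tokenizer=True),
)
```

The equivalent CLI call, `mergekit-yaml step1.yaml ./Qwen3-8B-fusion`, should work the same way.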
|
## *Step 2: Merge High-Performance Reasoning Models into the Hybrid Model*
|
- *Give the reasoning models as much weight as possible while the `/no_think` tag remains effective. Each config below produces one branch.*
|
```yaml |
|
models: |
|
- model: Qwen3-8B-fusion |
|
parameters: |
|
weight: 0.55 |
|
- model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B |
|
parameters: |
|
weight: 0.45 |
|
merge_method: nuslerp |
|
tokenizer_source: Qwen/Qwen3-8B |
|
dtype: float32 |
|
out_dtype: bfloat16 |
|
name: Qwen3-8B-nuslerp1 |
|
``` |
|
```yaml |
|
models: |
|
- model: Qwen3-8B-fusion |
|
parameters: |
|
weight: 0.75 |
|
- model: Kwai-Klear/Klear-Reasoner-8B |
|
parameters: |
|
weight: 0.25 |
|
merge_method: nuslerp |
|
tokenizer_source: Qwen/Qwen3-8B |
|
dtype: float32 |
|
out_dtype: bfloat16 |
|
name: Qwen3-8B-nuslerp2 |
|
``` |
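
Both branch configs are run the same way as Step 1, writing `./Qwen3-8B-nuslerp1` and `./Qwen3-8B-nuslerp2` locally so that Step 3 can reference them. Note the asymmetric weights: the DeepSeek-R1 distill contributes 0.45 while Klear-Reasoner contributes only 0.25, so each branch remains dominated by the hybrid fusion model.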
|
## *Step 3: Unify the Enhanced Hybrid Models*
|
- *Merge the two nuslerp branches into the base model with the `della` method to make the final model more versatile and stable.*
|
- *We use the chat template of Qwen3-8B.* |
|
```yaml |
|
models: |
|
- model: Qwen3-8B-nuslerp1 |
|
parameters: |
|
density: 1 |
|
weight: 1 |
|
lambda: 0.9 |
|
- model: Qwen3-8B-nuslerp2 |
|
parameters: |
|
density: 1 |
|
weight: 1 |
|
lambda: 0.9 |
|
merge_method: della |
|
base_model: Qwen/Qwen3-8B-Base |
|
dtype: float32 |
|
out_dtype: bfloat16 |
|
name: Qwen3-8B-YOYO-V2-Hybrid |
|
``` |
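
After Step 3, a quick sanity check is to confirm that the `/no_think` soft switch still suppresses the reasoning block. A sketch, assuming the Step 3 output was written to a local directory (the path and prompt are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed local path of the Step 3 output.
model_path = "./Qwen3-8B-YOYO-V2-Hybrid"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto")

# Appending /no_think to the prompt should yield an empty <think></think> block.
messages = [{"role": "user", "content": "What is the capital of France? /no_think"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Non-thinking preset: Temperature=0.7, TopP=0.8, TopK=20, MinP=0.
output = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True, temperature=0.7, top_p=0.8, top_k=20, min_p=0.0,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```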
|
|
|
|
|
|