---
license: apache-2.0
language:
- en
- zh
base_model:
- Kwai-Klear/Klear-Reasoner-8B
- deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
- Qwen/Qwen3-8B
- AXCXEPT/Qwen3-EZO-8B-beta
- Qwen/Qwen3-8B-Base
pipeline_tag: text-generation
tags:
- merge
---

> *Enhance the performance of Qwen3-8B by merging powerful reasoning models without compromising the effectiveness of the `/no_think` tag!*

# *Model Highlights:*

- ***merge method**: `arcee_fusion` `nuslerp` `della`*
- ***precision**: `dtype: float32` `out_dtype: bfloat16`*
- ***Context length**: `40960`*

# *Parameter Settings:*

## *Thinking Mode:*

> [!NOTE]
> *`Temperature=0.6`, `TopP=0.95`, `TopK=20`, `MinP=0`.*

## *Non-Thinking Mode:*

> [!TIP]
> *`Temperature=0.7`, `TopP=0.8`, `TopK=20`, `MinP=0`.*

## *Step 1: Merge Two Hybrid Models*

- *Leverage the strengths of both hybrid models.*

```yaml
models:
  - model: AXCXEPT/Qwen3-EZO-8B-beta
merge_method: arcee_fusion
base_model: Qwen/Qwen3-8B
dtype: float32           # precision used while merging
out_dtype: bfloat16      # precision of the saved weights
tokenizer_source: base   # take the tokenizer from base_model
name: Qwen3-8B-fusion    # referenced by the later steps
```

## *Step 2: Merge High-Performance Reasoning Models with Hybrid Models*

- *Maximize the proportion of reasoning models while keeping the `/no_think` tag effective.*

```yaml
models:
  - model: Qwen3-8B-fusion
    parameters:
      weight: 0.55
  - model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
    parameters:
      weight: 0.45
merge_method: nuslerp
tokenizer_source: Qwen/Qwen3-8B
dtype: float32
out_dtype: bfloat16
name: Qwen3-8B-nuslerp1
```

```yaml
models:
  - model: Qwen3-8B-fusion
    parameters:
      weight: 0.75
  - model: Kwai-Klear/Klear-Reasoner-8B
    parameters:
      weight: 0.25
merge_method: nuslerp
tokenizer_source: Qwen/Qwen3-8B
dtype: float32
out_dtype: bfloat16
name: Qwen3-8B-nuslerp2
```

## *Step 3: Unify the Enhanced Hybrid Models*

- *Merge the two `nuslerp` models into Qwen3-8B-Base using the `della` method to make the final model more versatile and stable.*
- *We use the chat template of Qwen3-8B.*

```yaml
models:
  - model: Qwen3-8B-nuslerp1
    parameters:
      density: 1    # keep all delta parameters (no pruning)
      weight: 1
      lambda: 0.9   # scaling factor for the final merged delta
  - model: Qwen3-8B-nuslerp2
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen3-8B-Base
dtype: float32
out_dtype: bfloat16
name: Qwen3-8B-YOYO-V2-Hybrid
```
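To illustrate what the relative `weight` values in the Step 2 `nuslerp` configurations do, here is a minimal sketch of normalized spherical linear interpolation on flattened weight vectors. It assumes that the two weights are normalized into an interpolation factor `t = w_b / (w_a + w_b)` (so 0.55/0.45 gives `t = 0.45`) and that the first listed model is the starting point. `nuslerp_sketch` is a hypothetical helper written for illustration only; it is not mergekit's implementation, which works on tensors and exposes its own options.

```python
import math
from typing import List


def nuslerp_sketch(a: List[float], b: List[float], w_a: float, w_b: float,
                   eps: float = 1e-8) -> List[float]:
    """Hypothetical sketch of normalized SLERP between two flattened weight vectors.

    Not mergekit's code: a simplified illustration under the assumptions stated above.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Assumption: relative weights are normalized into t; t = 0 returns a, t = 1 returns b.
    t = w_b / (w_a + w_b)
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Cosine of the angle between the two vectors, clamped to [-1, 1] for numerical safety.
    cos_theta = sum(x * y for x, y in zip(a, b)) / max(norm_a * norm_b, eps)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if abs(sin_theta) < eps:
        # (Nearly) colinear vectors: SLERP is undefined, so fall back to linear interpolation.
        return [(1.0 - t) * x + t * y for x, y in zip(a, b)]
    # Standard SLERP coefficients, applied to the original (unnormalized) vectors.
    c_a = math.sin((1.0 - t) * theta) / sin_theta
    c_b = math.sin(t * theta) / sin_theta
    return [c_a * x + c_b * y for x, y in zip(a, b)]


if __name__ == "__main__":
    # Step 2 style weights: 0.55 for the first model, 0.45 for the second.
    merged = nuslerp_sketch([1.0, 0.0], [0.0, 1.0], 0.55, 0.45)
    print(merged, math.hypot(*merged))
```

For two orthogonal unit vectors, the result stays on the unit circle and leans toward the more heavily weighted model. A plain weighted average of the same inputs, `[0.55, 0.45]`, has a norm of only about 0.71, which is the shrinkage that spherical interpolation avoids.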