|
--- |
|
license: apache-2.0 |
|
language: |
|
- en |
|
- zh |
|
base_model: |
|
- Kwai-Klear/Klear-Reasoner-8B |
|
- deepseek-ai/DeepSeek-R1-0528-Qwen3-8B |
|
- Qwen/Qwen3-8B |
|
- AXCXEPT/Qwen3-EZO-8B-beta |
|
- Qwen/Qwen3-8B-Base |
|
pipeline_tag: text-generation |
|
tags: |
|
- merge |
|
--- |
|
> *Enhance the performance of Qwen3-8B by merging powerful reasoning models without compromising the effectiveness of the `/no_think` tag!*
|
# *Model Highlights:* |
|
|
|
- ***merge methods**: `arcee_fusion`, `nuslerp`, `della`*
|
|
|
- ***precision**: `dtype: float32` `out_dtype: bfloat16`* |
|
|
|
- ***context length**: `40960`*
|
|
|
# *Parameter Settings:* |
|
## *Thinking Mode:* |
|
> [!NOTE] |
|
> *`Temperature=0.6`, `TopP=0.95`, `TopK=20`, `MinP=0`.*
|
## *Non-Thinking Mode:* |
|
> [!TIP] |
|
> *`Temperature=0.7`, `TopP=0.8`, `TopK=20`, `MinP=0`.*
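
As a minimal sketch, the two presets map directly onto `transformers` generation settings (the `THINKING`/`NON_THINKING` names below are illustrative, not part of the model):

```python
from transformers import GenerationConfig

# Hedged sketch: the two sampling presets above as GenerationConfig objects.
THINKING = GenerationConfig(do_sample=True, temperature=0.6, top_p=0.95, top_k=20, min_p=0.0)
NON_THINKING = GenerationConfig(do_sample=True, temperature=0.7, top_p=0.8, top_k=20, min_p=0.0)
```

Either preset can then be passed as `model.generate(..., generation_config=THINKING)`; with Qwen3-style chat templates, thinking can also be toggled via `tokenizer.apply_chat_template(..., enable_thinking=False)`.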
|
|
|
## *Step 1: Merge the Two Hybrid Models*
|
- *Fuse the two hybrid models, `Qwen/Qwen3-8B` and `AXCXEPT/Qwen3-EZO-8B-beta`, to combine their strengths.*
|
```yaml |
|
models: |
|
- model: AXCXEPT/Qwen3-EZO-8B-beta |
|
merge_method: arcee_fusion |
|
base_model: Qwen/Qwen3-8B |
|
dtype: float32 |
|
out_dtype: bfloat16 |
|
tokenizer_source: base |
|
name: Qwen3-8B-fusion |
|
``` |
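
As a minimal sketch, the config above can be run with mergekit's Python entry point (the `step1.yaml` filename and the local output path are assumptions):

```python
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Load the Step 1 config (assumed to be saved locally as step1.yaml).
with open("step1.yaml", "r", encoding="utf-8") as f:
    config = MergeConfiguration.model_validate(yaml.safe_load(f))

# Write the intermediate fusion model where Step 2 can reference it.
run_merge(
    config,
    out_path="./Qwen3-8B-fusion",
    options=MergeOptions(cuda=True, copy_tokenizer=True),
)
```

The equivalent CLI call, `mergekit-yaml step1.yaml ./Qwen3-8B-fusion`, should work the same way.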
|
## *Step 2: Merge High-Performance Reasoning Models into the Hybrid Model*
|
- *Give the reasoning models as much weight as possible while the `/no_think` tag remains effective. Each config below produces one branch.*
|
```yaml |
|
models: |
|
- model: Qwen3-8B-fusion |
|
parameters: |
|
weight: 0.55 |
|
- model: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B |
|
parameters: |
|
weight: 0.45 |
|
merge_method: nuslerp |
|
tokenizer_source: Qwen/Qwen3-8B |
|
dtype: float32 |
|
out_dtype: bfloat16 |
|
name: Qwen3-8B-nuslerp1 |
|
``` |
|
```yaml |
|
models: |
|
- model: Qwen3-8B-fusion |
|
parameters: |
|
weight: 0.75 |
|
- model: Kwai-Klear/Klear-Reasoner-8B |
|
parameters: |
|
weight: 0.25 |
|
merge_method: nuslerp |
|
tokenizer_source: Qwen/Qwen3-8B |
|
dtype: float32 |
|
out_dtype: bfloat16 |
|
name: Qwen3-8B-nuslerp2 |
|
``` |
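
Both branch configs are run the same way as Step 1, writing `./Qwen3-8B-nuslerp1` and `./Qwen3-8B-nuslerp2` locally so that Step 3 can reference them. Note the asymmetric weights: the DeepSeek-R1 distill contributes 0.45 while Klear-Reasoner contributes only 0.25, so each branch remains dominated by the hybrid fusion model.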
|
## *Step 3: Unify the Enhanced Hybrid Models*
|
- *Merge the two nuslerp branches into the base model with the `della` method to make the final model more versatile and stable.*
|
- *We use the chat template of Qwen3-8B.* |
|
```yaml |
|
models: |
|
- model: Qwen3-8B-nuslerp1 |
|
parameters: |
|
density: 1 |
|
weight: 1 |
|
lambda: 0.9 |
|
- model: Qwen3-8B-nuslerp2 |
|
parameters: |
|
density: 1 |
|
weight: 1 |
|
lambda: 0.9 |
|
merge_method: della |
|
base_model: Qwen/Qwen3-8B-Base |
|
dtype: float32 |
|
out_dtype: bfloat16 |
|
name: Qwen3-8B-YOYO-V2-Hybrid |
|
``` |
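
After Step 3, a quick sanity check is to confirm that the `/no_think` soft switch still suppresses the reasoning block. A sketch, assuming the Step 3 output was written to a local directory (the path and prompt are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed local path of the Step 3 output.
model_path = "./Qwen3-8B-YOYO-V2-Hybrid"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto")

# Appending /no_think to the prompt should yield an empty <think></think> block.
messages = [{"role": "user", "content": "What is the capital of France? /no_think"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Non-thinking preset: Temperature=0.7, TopP=0.8, TopK=20, MinP=0.
output = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True, temperature=0.7, top_p=0.8, top_k=20, min_p=0.0,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```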
|
|
|
|
|
|