|
---
license: apache-2.0
language:
- en
- zh
base_model:
- Qwen/Qwen3-30B-A3B-Thinking-2507
- Qwen/Qwen3-30B-A3B-Instruct-2507
- Qwen/Qwen3-Coder-30B-A3B-Instruct
- Qwen/Qwen3-30B-A3B-Base
pipeline_tag: text-generation
tags:
- merge
---
|
> *This is the initial unified version of the Qwen3-30B-A3B series models. As more fine-tuned models emerge and new merging methods become available, we will continue to improve it. Stay tuned!*
|
# *Model Highlights:* |
|
|
|
- ***merge method**: `nuslerp` + `della`*
|
|
|
- ***precision**: `dtype: bfloat16`* |
|
|
|
- ***context length**: `1010000`*
|
|
|
# *Parameter Settings:* |
|
> [!TIP] |
|
> *`Temperature=0.7`, `TopP=0.8`, `TopK=20`, `MinP=0`.*
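
*Below is a minimal sketch of applying these settings with Hugging Face `transformers`. The local model path is illustrative, and the `min_p` argument requires a reasonably recent `transformers` release.*

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Qwen3-30B-A3B-YOYO-V2"  # illustrative local path to the merged model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Write a quicksort function in Python."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Recommended decoding parameters from this model card.
outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    min_p=0.0,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```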
|
|
|
## *Step 1: Merge the Code Model with the Instruct and Thinking Models Separately*
|
- *Use the nuslerp method to improve how thoroughly the code model's capabilities are absorbed.*
|
- *Use a 9:1 merging ratio so that an excessively high proportion of the code model does not degrade general capabilities.*
|
```yaml
models:
  - model: Qwen/Qwen3-30B-A3B-Instruct-2507
    parameters:
      weight: 0.9
  - model: Qwen/Qwen3-Coder-30B-A3B-Instruct
    parameters:
      weight: 0.1
merge_method: nuslerp
tokenizer_source: Qwen/Qwen3-30B-A3B-Instruct-2507
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
name: Qwen3-30B-A3B-Coder-Instruct-nuslerp
```
|
```yaml
models:
  - model: Qwen/Qwen3-30B-A3B-Thinking-2507
    parameters:
      weight: 0.9
  - model: Qwen/Qwen3-Coder-30B-A3B-Instruct
    parameters:
      weight: 0.1
merge_method: nuslerp
tokenizer_source: Qwen/Qwen3-30B-A3B-Thinking-2507
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
name: Qwen3-30B-A3B-Coder-Thinking-nuslerp
```
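
*To run either of the configs above, one option is mergekit's Python entry point; this is a sketch assuming the API shown in mergekit's README (the `mergekit-yaml` CLI is the simpler route), with an illustrative local config filename:*

```python
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Load one of the nuslerp configs above, saved as a local YAML file.
with open("coder-instruct-nuslerp.yaml", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

# Execute the merge and write the result to a local directory.
run_merge(
    merge_config,
    "./Qwen3-30B-A3B-Coder-Instruct-nuslerp",
    options=MergeOptions(cuda=True, copy_tokenizer=True),
)
```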
|
## *Step 2: Merge the Code-Instruct and Code-Thinking Models into the Base Model Together*
|
- *Merge the two Step 1 models into the base model with the della method to make the result more versatile and stable.*
|
- *Since the merged model is closer to the instruct model, we use the chat template of Qwen3-30B-A3B-Instruct-2507.*
|
```yaml
models:
  - model: Qwen3-30B-A3B-Coder-Instruct-nuslerp
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen3-30B-A3B-Coder-Thinking-nuslerp
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen3-30B-A3B-Base
dtype: bfloat16
name: Qwen3-30B-A3B-YOYO-V2
```
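
*For intuition: since this stage sets `density: 1`, della's magnitude-based drop step keeps every delta element, so (setting aside sign election) the merge behaves roughly like `lambda`-scaled task arithmetic over the two Step 1 models. A toy numpy sketch of that view, not mergekit's actual implementation:*

```python
import numpy as np

def dense_della_view(base, tuned, weights, lam=0.9):
    """Toy view of della at density=1: with no dropping, the result is
    roughly base + lambda * sum_i(weight_i * (tuned_i - base))."""
    merged_delta = sum(w * (t - base) for t, w in zip(tuned, weights))
    return base + lam * merged_delta

base = np.array([1.0, 2.0, 3.0])
coder_instruct = np.array([1.2, 2.1, 2.9])  # stands in for the Step 1 Instruct merge
coder_thinking = np.array([0.9, 2.3, 3.2])  # stands in for the Step 1 Thinking merge
print(dense_della_view(base, [coder_instruct, coder_thinking], weights=[1.0, 1.0]))
```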
|
## *Step 3: Further Extend the Context Length*
|
- *Following the config_1m.json of Qwen3-30B-A3B-Instruct-2507, we modified the merged model's config.json and extended the maximum context length to 1M (1,010,000 tokens), as sketched below.*
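
*A hedged sketch of that edit: raise `max_position_embeddings` to `1010000` in the merged model's `config.json`. Any additional 1M-specific fields (e.g. rope or sparse-attention settings) should be copied from Qwen's config_1m.json rather than invented here; the path below is illustrative.*

```python
import json
from pathlib import Path

config_path = Path("Qwen3-30B-A3B-YOYO-V2/config.json")  # illustrative local path
config = json.loads(config_path.read_text())

# Match the 1M context value from Qwen3-30B-A3B-Instruct-2507's config_1m.json.
config["max_position_embeddings"] = 1_010_000

config_path.write_text(json.dumps(config, indent=2))
```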
|
|