This is the initial unified version of the Qwen3-30B-A3B series models. As more fine-tuned models emerge and new merging methods are applied, we will continue to improve it. Stay tuned!

Model Highlights:

  • Merge method: nuslerp + della

  • Precision: bfloat16

  • Context length: 1,010,000 tokens (1M)

Parameter Settings:

Temperature=0.7, TopP=0.8, TopK=20, MinP=0.
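For reference, these settings map directly onto Hugging Face transformers generation arguments. A minimal sketch, assuming a recent transformers release with min_p support (the prompt is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YOYO-AI/Qwen3-30B-A3B-YOYO-V2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Write a quicksort function in Python."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# The recommended sampling parameters from above.
output = model.generate(
    input_ids,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    min_p=0.0,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```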

Step 1: Merge the Code Model with the Instruction and Thinking Models Separately

  • Adopt the nuslerp method so the code model's capabilities are absorbed more effectively.
  • Set a merging ratio of 9:1 to prevent the capability degradation that an excessively high proportion of the code model would cause. The two configs below produce the intermediate Coder-Instruct and Coder-Thinking models; a sketch for running them follows the configs.
```yaml
models:
  - model: Qwen/Qwen3-30B-A3B-Instruct-2507
    parameters:
      weight: 0.9
  - model: Qwen/Qwen3-Coder-30B-A3B-Instruct
    parameters:
      weight: 0.1
merge_method: nuslerp
tokenizer_source: Qwen/Qwen3-30B-A3B-Instruct-2507
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
name: Qwen3-30B-A3B-Coder-Instruct-nuslerp
```

```yaml
models:
  - model: Qwen/Qwen3-30B-A3B-Thinking-2507
    parameters:
      weight: 0.9
  - model: Qwen/Qwen3-Coder-30B-A3B-Instruct
    parameters:
      weight: 0.1
merge_method: nuslerp
tokenizer_source: Qwen/Qwen3-30B-A3B-Thinking-2507
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
name: Qwen3-30B-A3B-Coder-Thinking-nuslerp
```
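Both configs are standard mergekit YAML. A minimal sketch of running the first one through mergekit's Python API, assuming mergekit is installed and the config is saved locally (file and output paths are illustrative); the mergekit-yaml CLI works equally well:

```python
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Illustrative path: the first config above, saved to disk.
with open("coder-instruct-nuslerp.yml", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./Qwen3-30B-A3B-Coder-Instruct-nuslerp",
    options=MergeOptions(cuda=True, copy_tokenizer=True),
)
```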

Step 2: Merge the Code-Instruct and Code-Thinking Models into the Base Model Together

  • Merge the two intermediate models into the base model using the della merging method to make the final model more versatile and stable.
  • Since the merged model is closer to the instruction model, we use the chat template of Qwen3-30B-A3B-Instruct-2507; a sketch for carrying it over follows the config.
```yaml
models:
  - model: Qwen3-30B-A3B-Coder-Instruct-nuslerp
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen3-30B-A3B-Coder-Thinking-nuslerp
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen3-30B-A3B-Base
dtype: bfloat16
name: Qwen3-30B-A3B-YOYO-V2
```
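To apply the Instruct chat template to the merged checkpoint, one option is to copy it over with transformers; a sketch, assuming the della merge wrote its output to a local directory:

```python
from transformers import AutoTokenizer

merged_dir = "./Qwen3-30B-A3B-YOYO-V2"  # illustrative output directory
merged = AutoTokenizer.from_pretrained(merged_dir)
instruct = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B-Instruct-2507")

# Carry over the Instruct chat template, since the merged model
# behaves more like the instruction model.
merged.chat_template = instruct.chat_template
merged.save_pretrained(merged_dir)
```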

Step 3: Further Extend the Context Length

  • Following the config_1m.json of Qwen3-30B-A3B-Instruct-2507, we modified the config.json of the merged model and extended the maximum context length to 1M (1,010,000 tokens), as sketched below.
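A sketch of that edit, assuming a local copy of the merged model; only max_position_embeddings is shown explicitly, and any additional long-context fields should be copied verbatim from config_1m.json rather than from this example:

```python
import json

config_path = "./Qwen3-30B-A3B-YOYO-V2/config.json"  # illustrative path
with open(config_path) as fp:
    cfg = json.load(fp)

cfg["max_position_embeddings"] = 1010000
# ...plus any long-context rope/attention settings from
# Qwen3-30B-A3B-Instruct-2507's config_1m.json, copied verbatim.

with open(config_path, "w") as fp:
    json.dump(cfg, fp, indent=2)
```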