YOYO-AI/Qwen3-30B-A3B-YOYO-V2

YOYO-AI committed 15 days ago

Commit ae585a3 · verified · 1 parent: 64ea5f3

Update README.md

Files changed (1)

README.md +24 -0


@@ -42,6 +42,7 @@ parameters:
   normalize: true
   int8_mask: true
 dtype: bfloat16
+name: Qwen3-30B-A3B-Coder-Instruct-nuslerp
 ```
 ```yaml
 models:
@@ -57,4 +58,27 @@ parameters:
   normalize: true
   int8_mask: true
 dtype: bfloat16
+name: Qwen3-30B-A3B-Coder-Thinking-nuslerp
 ```
+## *Step2: Merge Code Instruction & Code Thinking Models into Base Model Together*
+- *Merge the two models into the base model using the della merging method to make the model more versatile and stable.*
+- *Since the merged model is closer to the instruction model, we use the chat template of Qwen3-30B-A3B-Instruct-2507.*
+```yaml
+models:
+  - model: Qwen3-30B-A3B-Coder-Instruct-nuslerp
+    parameters:
+      density: 1
+      weight: 1
+      lambda: 0.9
+  - model: Qwen3-30B-A3B-Coder-Thinking-nuslerp
+    parameters:
+      density: 1
+      weight: 1
+      lambda: 0.9
+merge_method: della
+base_model: Qwen/Qwen3-30B-A3B-Base
+dtype: bfloat16
+name: Qwen3-30B-A3B-YOYO-V2
+```
+## *Step3: Further Extend Context Length*
+- *Following the config_1m.json of Qwen3-30B-A3B-Instruct-2507, we modified the merged model's config.json to extend the maximum context length to 1M tokens.*
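
Step3 does not list which fields of config.json were changed. The following is a minimal sketch of one way such an edit could be scripted, assuming it amounts to copying selected fields from the reference config_1m.json into the merged model's config.json. The helper names `overlay_config` and `patch_config_file` and the key list are hypothetical, not taken from the commit. `max_position_embeddings` is a standard Transformers config field and is used here only as an example; the real list of keys should come from comparing the two files.

```python
import json
from pathlib import Path


def overlay_config(merged: dict, reference: dict, keys: list[str]) -> dict:
    """Return a copy of `merged` with the listed keys taken from `reference`.

    Hypothetical helper: the commit does not say which keys were copied.
    """
    missing = [k for k in keys if k not in reference]
    if missing:
        raise KeyError(f"reference config lacks keys: {missing}")
    updated = dict(merged)  # shallow copy, so the input dict is not mutated
    for key in keys:
        updated[key] = reference[key]
    return updated


def patch_config_file(merged_path: str, reference_path: str, keys: list[str]) -> None:
    """Rewrite the config at `merged_path` with `keys` copied from `reference_path`."""
    merged = json.loads(Path(merged_path).read_text())
    reference = json.loads(Path(reference_path).read_text())
    patched = overlay_config(merged, reference, keys)
    Path(merged_path).write_text(json.dumps(patched, indent=2) + "\n")


if __name__ == "__main__":
    # Toy values for illustration only; they are not the real model settings.
    merged = {"max_position_embeddings": 1000, "hidden_size": 8}
    reference = {"max_position_embeddings": 5000, "hidden_size": 8}
    print(overlay_config(merged, reference, ["max_position_embeddings"]))
```

Copying an explicit list of keys, rather than replacing the whole file, leaves every other field of the merged model's config untouched and raises an error if the reference file lacks an expected key.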