Update README.md
README.md
@@ -42,6 +42,7 @@ parameters:
   normalize: true
   int8_mask: true
 dtype: bfloat16
+name: Qwen3-30B-A3B-Coder-Instruct-nuslerp
 ```
 ```yaml
 models:
@@ -57,4 +58,27 @@ parameters:
   normalize: true
   int8_mask: true
 dtype: bfloat16
+name: Qwen3-30B-A3B-Coder-Thinking-nuslerp
 ```
+## *Step2: Merge Code Instruction & Code Thinking Models into Base Model Together*
+- *Merge the two models into the base model using the della merging method to make the model more versatile and stable.*
+- *Since the merged model is more similar to the instruction model, we use the chat template of the Qwen3-30B-A3B-Instruct-2507.*
+```yaml
+models:
+  - model: Qwen3-30B-A3B-Coder-Instruct-nuslerp
+    parameters:
+      density: 1
+      weight: 1
+      lambda: 0.9
+  - model: Qwen3-30B-A3B-Coder-Thinking-nuslerp
+    parameters:
+      density: 1
+      weight: 1
+      lambda: 0.9
+merge_method: della
+base_model: Qwen/Qwen3-30B-A3B-Base
+dtype: bfloat16
+name: Qwen3-30B-A3B-YOYO-V2
+```
+## *Step3: Further Extend Context Length*
+- *By referring to the config_1m.json of Qwen3-30B-A3B-Instruct-2507, we modified the config.json of the merged model and extended the maximum context length to 1M.*
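The README change does not say which fields of `config.json` were edited in Step3. As a rough sketch only, one way to port settings from a reference config such as `config_1m.json` is to list the top-level keys that differ and copy the chosen ones across. The helper names, the toy dictionaries, and the `extra_attention_config` key below are hypothetical placeholders for illustration, not values taken from the actual Qwen files.

```python
import json
from pathlib import Path


def differing_keys(target: dict, reference: dict) -> list[str]:
    """List top-level keys that `reference` defines and `target` lacks or sets differently."""
    return sorted(
        key for key, value in reference.items()
        if key not in target or target[key] != value
    )


def overlay_keys(target: dict, reference: dict, keys: list[str]) -> dict:
    """Return a copy of `target` with the listed keys taken from `reference`."""
    patched = dict(target)
    for key in keys:
        patched[key] = reference[key]
    return patched


def patch_config_file(target_path: Path, reference_path: Path, keys: list[str]) -> None:
    """Rewrite the JSON file at `target_path` with `keys` copied from `reference_path`."""
    target = json.loads(target_path.read_text())
    reference = json.loads(reference_path.read_text())
    patched = overlay_keys(target, reference, keys)
    target_path.write_text(json.dumps(patched, indent=2) + "\n")


# Toy stand-ins for the merged model's config.json and a long-context reference.
# All values here are made up for the example.
merged_cfg = {"max_position_embeddings": 4096, "hidden_size": 64}
reference_cfg = {
    "max_position_embeddings": 8192,
    "hidden_size": 64,
    "extra_attention_config": {"chunk_size": 1024},  # hypothetical key
}

to_copy = differing_keys(merged_cfg, reference_cfg)
patched_cfg = overlay_keys(merged_cfg, reference_cfg, to_copy)
```

Reviewing the output of `differing_keys` before copying anything keeps the change deliberate: keys that describe the merged weights themselves should stay untouched, and only the context-length settings should be carried over.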