YOYO-AI committed on
Commit ae585a3 · verified · 1 Parent(s): 64ea5f3

Update README.md

Files changed (1)
  1. README.md +24 -0
README.md CHANGED
@@ -42,6 +42,7 @@ parameters:
   normalize: true
   int8_mask: true
 dtype: bfloat16
+name: Qwen3-30B-A3B-Coder-Instruct-nuslerp
 ```
 ```yaml
 models:
@@ -57,4 +58,27 @@ parameters:
   normalize: true
   int8_mask: true
 dtype: bfloat16
+name: Qwen3-30B-A3B-Coder-Thinking-nuslerp
 ```
+## *Step2: Merge Code Instruction & Code Thinking Models into Base Model Together*
+- *Merge the two models into the base model using the della merging method to make the model more versatile and stable.*
+- *Since the merged model is more similar to the instruction model, we use the chat template of the Qwen3-30B-A3B-Instruct-2507.*
+```yaml
+models:
+  - model: Qwen3-30B-A3B-Coder-Instruct-nuslerp
+    parameters:
+      density: 1
+      weight: 1
+      lambda: 0.9
+  - model: Qwen3-30B-A3B-Coder-Thinking-nuslerp
+    parameters:
+      density: 1
+      weight: 1
+      lambda: 0.9
+merge_method: della
+base_model: Qwen/Qwen3-30B-A3B-Base
+dtype: bfloat16
+name: Qwen3-30B-A3B-YOYO-V2
+```
+## *Step3: Further Extend Context Length*
+- *By referring to the config_1m.json of Qwen3-30B-A3B-Instruct-2507, we modified the config.json of the merged model and extended the maximum context length to 1M.*
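
To make the Step2 parameters more concrete, here is a toy sketch of a task-vector merge on plain Python lists. It is a simplification under stated assumptions, not mergekit's della implementation: with `density: 1` it assumes no delta parameters are dropped, it resolves sign conflicts with a TIES-style election, it averages the agreeing deltas (an assumed normalization), and it scales the result by `lambda` before adding it back to the base weights. The helper name `merge_task_vectors` and the numbers are hypothetical.

```python
from typing import List, Sequence


def merge_task_vectors(
    base: Sequence[float],
    finetuned: Sequence[Sequence[float]],
    weights: Sequence[float],
    lam: float,
) -> List[float]:
    """Toy task-vector merge (hypothetical helper, not mergekit code).

    Assumes density == 1 (nothing is dropped) and positive weights.
    """
    merged = []
    for i, b in enumerate(base):
        # Weighted task vectors: how far each model moved away from the base.
        deltas = [w * (m[i] - b) for m, w in zip(finetuned, weights)]
        # Elect the dominant sign from the summed deltas.
        sign = 1.0 if sum(deltas) >= 0 else -1.0
        # Keep only the deltas (and their weights) that agree with that sign.
        kept = [(d, w) for d, w in zip(deltas, weights) if d * sign > 0]
        # Weighted average of the agreeing deltas (assumed normalization).
        fused = sum(d for d, _ in kept) / sum(w for _, w in kept) if kept else 0.0
        # Scale by lambda and add the fused delta back onto the base weight.
        merged.append(b + lam * fused)
    return merged


# Toy weights standing in for one tensor of each model.
base = [0.0, 1.0, -1.0]
instruct = [0.2, 1.4, -1.3]
thinking = [0.4, 0.8, -1.1]

merged = merge_task_vectors(base, [instruct, thinking], weights=[1, 1], lam=0.9)
print([round(x, 2) for x in merged])  # roughly [0.27, 1.36, -1.18]
```

In the second position the two models disagree on direction, so only the delta matching the elected sign contributes. With `lambda: 0.9` the fused delta is applied at 90% strength.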
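
The config edit described in Step3 can be sketched as copying the long-context settings from one JSON config into another. This is a minimal illustration, not the authors' script: the helper name `apply_long_context_settings` is hypothetical, and the key list and toy values below are assumptions, so check the real config_1m.json to see which fields actually differ.

```python
import copy
import json
from typing import Any, Dict, Iterable

# Assumed key names; verify them against the actual config_1m.json.
LONG_CONTEXT_KEYS = ("max_position_embeddings", "dual_chunk_attention_config")


def apply_long_context_settings(
    config: Dict[str, Any],
    long_config: Dict[str, Any],
    keys: Iterable[str] = LONG_CONTEXT_KEYS,
) -> Dict[str, Any]:
    """Return a copy of `config` with the selected keys taken from `long_config`."""
    updated = copy.deepcopy(config)
    for key in keys:
        # Only copy keys that the long-context config actually defines.
        if key in long_config:
            updated[key] = copy.deepcopy(long_config[key])
    return updated


# Toy stand-ins for config.json and config_1m.json (values are illustrative).
merged_config = {"model_type": "qwen3_moe", "max_position_embeddings": 262144}
config_1m = {
    "model_type": "qwen3_moe",
    "max_position_embeddings": 1010000,
    "dual_chunk_attention_config": {"chunk_size": 262144},
}

new_config = apply_long_context_settings(merged_config, config_1m)
print(json.dumps(new_config, indent=2))
```

Copying only a named set of keys leaves the rest of the merged model's config.json untouched, and the original dictionary is not modified.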