Tags: Text Generation, Transformers, Safetensors, qwen3_moe, programming, code generation, code, codeqwen, Mixture of Experts, coding, coder, qwen2, chat, qwen, qwen-coder, Qwen3-Coder-30B-A3B-Instruct, Qwen3-30B-A3B, mixture of experts, 128 experts, 8 active experts, 512k context, qwen3, finetune, brainstorm 20x, brainstorm, optional thinking, conversational
Update README.md
README.md
CHANGED
@@ -43,7 +43,7 @@ base_model:
 pipeline_tag: text-generation
 ---
 
-<h2>Qwen3-Coder-42B-A3B-Instruct-TOTAL-RECALL-MASTER-CODER-M [
+<h2>Qwen3-Coder-42B-A3B-Instruct-TOTAL-RECALL-MASTER-CODER-M-512k-ctx [512k context]</h2>
 
 <img src="qwen3-total-recall.gif" style="float:right; width:300px; height:300px; padding:10px;">
 
@@ -59,7 +59,7 @@ The Brainstorm adapter will improve general performance and "out of the box" thi
 
 This creates a model of 42B parameters, 67 layers and 807 tensors.
 
-This version has the NATIVE context
+This version has the NATIVE context (up from 256k) set via yarn/rope to 512k as per Qwen tech notes.
 
 This is a non-reasoning/non-thinking block model.
 