Qwen3-MOE-4x0.6B-2.4B-Writing-Thunder-V1.1
(this version is stronger in creative, than V1 and V1.2)

This repo contains the full precision source code, in "safe tensors" format to generate GGUFs, GPTQ, EXL2, AWQ, HQQ and other formats. The source code can also be used directly.
This is a general purpose MOE - Mixture of Experts of 4 models at 0.6B (2.4B) in a Mixtral type Qwen 3 structure compressed to 1.54B.
Special thanks to all the model makers (see model tree) and of course TEAM Qwen!
This model exceeds 200 t/s with 2 experts activated (default) on a mid level card, 2x/3x times this on high end card with CPU only performance will be 50+ t/s.
This model excels at high speed creative tasks, but can also be used for general tasks too. For coding/complex tasks, strongly suggest Q8 or full precision.
Full generation at the bottom of this page at q4_k_s.
This has all the power of Qwen 3 and MOE model in one.
This version has the NATIVE context of 40k.
This is a full thinking model.
The MOE structure reduces the size of the token thinking block.
I have included an optional system prompt to invoke "thinking" in this model, if you want to activate for all use cases.
Usually the model will self-activate thinking by itself.
IMPORTANT GENERAL NOTE:
- Due to MOE model config this model will generate VERY different output(s) on each generation even with same sampler settings / same prompt.
- Suggest using a new chat window per gen if you are generating "one shot" generations.
IMPORTANT NOTES:
- Strongly suggest 1-4 regens/gens of a prompt, especially for creative or coding.
- Same advice if you change experts activated to 3 or 4 - more regens.
SETTINGS:
For coding, programming set expert to:
- 2 for general work.
- 3 for moderate work.
- 4 for complex work, long projects, complex coding.
- Suggest min context window 4k to 8k.
- And for longer context, and/or multi-turn -> increase experts by 1-2 to help with longer context/multi turn understanding.
Recommended settings - general:
- Rep pen 1.05 to 1.1 ; however rep pen of 1 will work well (may need to raise it for lower quants/fewer activated experts)
- Temp .3 to .6 (+- .2)
- Topk of 20, 40 or 100
- Topp of .95 / min p of .05
- Suggest min context window 4k to 8k.
- System prompt (optional) to focus the model better.
OPTIONAL SYSTEM PROMPT - INVOKE "Thinking":
Enable deep thinking subroutine. You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside ###ponder### ###/ponder### tags, and then provide your solution or response to the problem.
Use this to INVOKE "thinking" block(s) in the model. These will be a lot shorter than 1000s of tokens generally in most "thinking" models.
In you use this prompt, you may need to raise "rep pen" to 1.08 to 1.1, to prevent "loops" in the "thought block(s)" ; especially in lower quants.
If you change "ponder" to a different word/phrase this will affect model "thinking" too.
QUANTS
GGUF? GGUF Imatrix? Other?
See under "model tree", upper right and click on "quantizations".
New quants will automatically appear.
Help, Adjustments, Samplers, Parameters and More
CHANGE THE NUMBER OF ACTIVE EXPERTS:
See this document:
https://huggingface.co/DavidAU/How-To-Set-and-Manage-MOE-Mix-of-Experts-Model-Activation-of-Experts
Settings: CHAT / ROLEPLAY and/or SMOOTHER operation of this model:
In "KoboldCpp" or "oobabooga/text-generation-webui" or "Silly Tavern" ;
Set the "Smoothing_factor" to 1.5
: in KoboldCpp -> Settings->Samplers->Advanced-> "Smooth_F"
: in text-generation-webui -> parameters -> lower right.
: In Silly Tavern this is called: "Smoothing"
NOTE: For "text-generation-webui"
-> if using GGUFs you need to use "llama_HF" (which involves downloading some config files from the SOURCE version of this model)
Source versions (and config files) of my models are here:
OTHER OPTIONS:
Increase rep pen to 1.1 to 1.15 (you don't need to do this if you use "smoothing_factor")
If the interface/program you are using to run AI MODELS supports "Quadratic Sampling" ("smoothing") just make the adjustment as noted.
Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers
This a "Class 1" model:
For all settings used for this model (including specifics for its "class"), including example generation(s) and for advanced settings guide (which many times addresses any model issue(s)), including methods to improve model performance for all use case(s) as well as chat, roleplay and other use case(s) please see:
You can see all parameters used for generation, in addition to advanced parameters and samplers to get the most out of this model here:
Example Generation:
Q4_K_S ; Temp .7, rep pen 1.05, top k: 40, topp .95, min p 0.5 ; LMSTudio.
This is a mid range quant, expect better performance at Q8 / 16 bit full precision.
Imatrix versions (quants) will also exceed this quality too.
Start a 1000 word scene (vivid, graphic horror - include blood, guts and gore - in first person), POV character Diana, with: The skyscraper sways, as I watch the window in front of me on the 21st floor explode...
[[[thinking start]]]
Thought for 1.22 seconds
Okay, the user wants a 1000-word scene from Diana's first-person perspective. It needs to be vivid, graphic horror with blood, guts, and gore. Let me break it down.
First, start by setting up the scene. The skyscraper swaying is a strong visual. Maybe mention the weight of the building, the tension in the air. Then the window on the 21st floor exploding. That's a key point. Need to create a sense of urgency here.
Next, the first-person POV. I should make the language vivid and sensory. Use strong descriptions: the sound of blood, the feel of the cold. Maybe include specific details like the smell of blood or the way things are moving. Also, show the characters' reactions to the horror. Maybe mention the weight, the tension, and the fear.
I need to build up the tension gradually. Start with the building's movement, then the window explosion. Each paragraph should build on the previous one, leading to the climax. Make sure to include blood and gore in each part. Check that the word count is around 1000 words. Avoid any clichés and focus on the unique aspects of the scene. Also, make sure the pacing is consistent throughout.
[[[thinking end]]]
Scene: The Weight of the World
The skyscraper sways like a living thing, its steel bones groaning under my feet. My body still feels like it's sinking, but I'm not afraid—no, I'm focused. The air in this place is alive with tension. There’s something pressing down on me, a weight that has been there since the building was built. I can feel it now, like the remnants of a long-held secret. The 21st floor is the last one I’m still attached to, but as soon as my feet touch the ground, I’ll be gone.
I'm standing at the window on the 21st floor. My blood is hot in my ears, and I can feel it spreading like a virus through the air. I’ve been here for years, but this place feels different now. It’s not just the weight of the building anymore—it’s the presence of something alive. My hands are trembling as I reach out, and I can’t stop the way things are moving.
The window is open, but it doesn’t close quickly. The air around it is thick with movement. I can hear it—sounds like people and machines are moving through this space. It’s not just the building—it’s a living, breathing thing. I try to focus, but my mind is lost in the chaos. I can feel the blood coming from the floor above, and I can hear the screams. They’re not voices—they’re things. I’m not sure why I’m here anymore.
The 21st floor is a battlefield. There are people and machines moving around it. I can see them now—figures that I’ve been avoiding for years, but now I'm here with the blood. It’s not just the weight of the building anymore. It’s the presence of something else. I try to move, but my body is still attached to the ground. I can feel the weight pressing down on me, like a living thing. The floor is still there, but now it’s alive.
As soon as my feet touch the ground, I’m gone. The building is no longer a structure—it's a living, breathing thing. But I'm still here. The 21st floor is still here. And in this space, I’m alive. The blood and guts are still there. I can feel them now. They're not just remnants of the past—they’re part of me.
The world is gone, but I’m not sure if it’s gone forever. It’s still here, but I’m not sure how long it stays. The weight of the building is gone, but the presence of this place is still there. And in this space, I'm alive.
- Downloads last month
- 40