Qwen2.5 / Mistral MOE - 1X7B // 1X32B // 2X7B // 4X7B // 6X7B // 8X7B // 2X24B -- Power-CODER --- 13.3B // 60B // 19B // 30B // 42B // 53B // 44B - GGUF
This repo contains 34 different MOE (mixture of experts) models (and a few non-MOE models too), quanted at different parameter counts and with different numbers of experts, for coding and programming in multiple computer languages. All major (and many minor) coding languages are covered.
Parameter range: 13.3B to 60B, with specific select quants of each (until final testing and optimization are complete).
These are mostly "Qwen2.5" MOE models, with a few Mistral MOE Coders (2X22B // 2x24B).
Although these models can be used for other use cases, the primary use case is "coding."
Qwen2.5 MOEs require a "shared expert", hence the larger/odd parameter count sizes for some models.
All models, except 1x32B (Qwen 2.5 arch), are based on 7B Qwen 2.5 coder models/fine tunes.
Three models are MISTRAL MOEs: one Devstral with Instruct and one Devstral with Magistral (reasoning on/off), both at 2x24B, AND one 2x22B Codestral MOE. All are denoted by "Mistral" in the name - see the "Set #Mistral 1" info.
All models are carefully selected and tested "coder" instruct models (base, experts and shared expert).
Tool use is supported.
Models are fully stable, "Class 1" MOEs.
The 1x32B (with a shared expert) is in a class by itself, but still a Qwen2.5 type model.
In addition, some models in 6X7B and 8X7B versions ARE specialists.
All versions, except "Set 1" / "Set 1.5" / "Set 1.75", use the same base model for the MOE construction - at the moment. Different bases are being assessed for improving performance and/or creating specific-use-case models.
Please note the following:
- These are non-reasoning models at the moment (except the V2 Mistral: Devstral-Magistral MOE).
- More experts activated / more models in the MOE generate stronger and stronger code.
- As many quants here are below "full power" / not yet optimized, each model's performance will not be at peak... except "Set 1.25".
- Full repos, along with full quant sets, will be generated after the assessment/tuning/tweaking period.
Just added [in reverse order, in addition to the "sets" below]:
Qwen2.5-7B-Nemo-OpenCoder-bs20-2-Q6_K.gguf [12B] [not a moe model]
- 7B Opencoder Nemo with Brainstorm 20x added [automatic reasoning].
Qwen2.5-7B-MS-Next-Coder-bs20-2-Q6_K.gguf [12B] [not a moe model]
- 7B Microsoft Next-Coder with Brainstorm 20x added.
Qwen2.5-14B-MS-Next-Coder-bs20-2-Q5_K_S.gguf [20B] [not a moe model]
- 14B Microsoft Next-Coder with Brainstorm 20x added.
Qwen2.5-32B-MS-Next-Coder-bs20-2-Q4_K_S.gguf [40B] [not a moe model]
- 32B Microsoft Next-Coder with Brainstorm 20x added.
Qwen3-OpenCoder-Multi-exp5-11-Q6_K.gguf [8.5B]
- Brainstorm 5x Matrix Mind Series with Nemo Opencoder 7B [main] + MSCoder 7b + Olympic 7b + Qwen Coder Instruct 7B
- Reasoning instruct coder generation.
Qwen3-OpenCoder-Multi-exp5-11-v2-Q6_K.gguf [8.5B] [not a moe model]
- Brainstorm 5x Matrix Mind Series with Nemo Opencoder 7B [main] + MSCoder 7b [2x] + Qwen Coder Instruct 7B
- Reasoning instruct coder generation.
Qwen3-OpenCoder-Multi-14B-exp5-11-Q6_K.gguf [16B] [not a moe model]
- Brainstorm 5x Matrix Mind Series with Nemo Opencoder 14B [main] + MSCoder 7b [2x] + Qwen Coder Instruct 14B
- Reasoning instruct coder generation.
Qwen3-OpenCoder-Multi-32B-exp5-11-Q4_K_S.gguf [35B] [not a moe model]
- Brainstorm 5x Matrix Mind Series with Nemo Opencoder 32B [main] + MSCoder 32b + Olympic 32b + Qwen Coder Instruct 32B
- Reasoning instruct coder generation.
Qwen3-OpenCoder-Multi-32B-exp5-11-V2-Q4_K_S.gguf [35B] [not a moe model]
- Brainstorm 5x Matrix Mind Series with Nemo Opencoder 32B [main] + MSCoder 32b [2x] + Qwen Coder Instruct 32B
- Reasoning instruct coder generation.
Qwen2.5-3X7B-QC-MS-OPY-Coder-V1-Q4_K_S.gguf
- 3x7B MOE, with MS Next Coder, Qwen 2.5 Coder Instruct, and Olympic Coder.
- Source is here: https://huggingface.co/DavidAU/Qwen2.5-3X7B-CoderInstruct-OlympicCoder-MS-Next-Coder-25B-v1
Qwen2.5-2X7B-NemoTron-TwoNemos-Coder-V1-Q4_K_S.gguf
- 2x7B MOE: the two OpenCoder Nemo versions (1 and 1.1) in a MOE config.
Mistral-2X24B-Power-Coder-Devs1.1-Q4_K_S.gguf
- 2x24B MOE: Magistral with Devstral 1.1; a coder / all-purpose model with reasoning on/off.
- See "Set #Mistral 1" below for the system prompt to turn reasoning on/off.
Qwen2.5-2X7B-NemoTron-Coder-V2.9-0-Q6_K.gguf
- 2x7B MOE Opencoder Nemo Version 1.1 + additional coder model.
SET #1 - 5 versions - 19B - 2x7B MOE w shared expert:
Four versions at Q6, using various top Qwen 2.5 Coder Models in MOE config 2X7B + a shared expert => 19B.
Denoted by V1 to V4, in Q6 without "shared" in the filename.
(The shared expert, as well as the base model, differs for each of these 4 versions, which means each model is roughly 2.5x7B.)
SET #1.25 - 5 Versions - 19B - 2x7b w shared expert:
These are denoted by "Qwen2.5-2X7B-Power-Coder-V2-50-Q6_K" with shared expert at 50% power.
This test group is also assessing quant optimization and tensor optimizations.
There are also these variants of V2-50-Q6_K:
- regular, from the bf16 source - default Q6 settings, quantized directly from bf16.
- one made from a float32 source ("f-32").
- one made from a float32 source + output tensor at bf16 ("opti").
- one made from a float32 source + output tensor at bf16 + all shared expert tensors (3, all layers) at bf16 ("opti2").
- one made from a float32 source + output tensor at f16 + all shared expert tensors (3, all layers) at f16 ("OPTI2-f-16").
The last TWO are the most potent, and also the largest in size relative to the other models in this set.
"f-16" (float 16 bit) was used for faster performance on CPU/MAC/METAL and/or using the model with partial offload from GPU. Float 16 is more cpu/metal/offload supported, and faster.
(shared expert is different per model, as well as base model for these 4 versions which means the model is roughly "2.5X7B" in terms of performance - but this varies.)
SET #1.5 - 3 versions - 13.3B - 1x7B MOE w shared expert:
Two versions at Q6, using ONE Qwen 2.5 Coder Model in MOE config 1X7B + a shared expert (a different model) at 100% => 13.3B.
Denoted by "shared100" (and 1x7B) in the filename.
This unusual config pairs a master expert with 1 shared expert at 100%, creating a very strong coder with MOST OF the power of a 2X7B MOE in a package roughly 30% smaller (13.3B vs 19B).
A third model is a copy of model #2 with the shared expert set at 50%, to assess whether this is better / reduces possible "overcooking".
The config in these models means the model is roughly 1.5x7B in terms of performance - but this varies.
Denoted by "shared50" in the filename.
SET #1.75 - 2 versions - 60B - 1x32B MOE w shared expert:
One version at Q3KS, using ONE Qwen 2.5 32B Coder Model in MOE config 1X32B + a shared expert (a different model - 32B ) at 100% => 60B.
One version at Q3KS, using ONE Qwen 2.5 32B Coder Model in MOE config 1X32B + a shared expert (a different model - 32B ) at 50% => 60B.
Both versions contain the TWO strongest 32B Qwen 2.5 coder models.
Denoted by "shared100" (and 1x32B) in the filename. (v1)
Denoted by "shared50" (and 1x32B) in the filename. (v2)
This model is off the charts in terms of coding strength, detail, and other critical aspects of coding.
This unusual config pairs a master expert with 1 shared expert at 100% / 50%, creating a very strong coder with MOST OF the power of a 2X32B MOE in a package roughly 30% smaller.
The config in these models means the model is roughly 1.5x32B in terms of performance - but this varies.
NOTE:
- A 2x32B (plus a shared expert) would be roughly 75-80B parameters.
- This is a very low quant size (one step above Q2_K, the lowest); higher quants will produce better results.
- V1 - Tests are ongoing assessing "shared expert" settings VS performance; 100% may not be the best setting.
- V2 - 50% shared expert ; testing pending.
SET #2: 1 version - 30B - 4x7B MOE w shared expert
One version is 4X7B (in filename) + 1 shared expert rendered in quant q4_K_S:
30B [roughly; the shared expert adds more to it, however compression makes this model smaller than it otherwise would be, because the shared expert is common to the base.]
SET #3: 1 version - 42B - 6x7B MOE w shared expert
One version is 6X7B (in filename) + 1 shared expert rendered in quant q4_K_S:
42B [roughly; the shared expert adds more to it, however compression makes this model smaller than it otherwise would be, because the shared expert is common to the base.]
SET #4: 1 version - 53B - 8x7B MOE w shared expert
One version is 8X7B (in filename) + 1 shared expert rendered in quant q4_K_S:
53B [roughly; the shared expert adds more to it, however compression makes this model smaller than it otherwise would be, because the shared expert is common to the base.]
Set #Mistral 1: 3 versions - 44B, 2x24B MOES
2 versions with "mistral" in the name, using Devstral as a base with one version with Instruct AND one version with Magistral (reasoning on/off).
For the latter, you need a system prompt as follows for reasoning (see below) ; note model will work fine without reasoning too.
Max context is 128k/131072 ; for reasoning (v2) strongly suggest min 8k context window, if reasoning is on.
REASONING SYSTEM PROMPT for V2 (optional):
A user will ask you to solve a task. You should first draft your thinking process (inner monologue) until you have derived the final answer. Afterwards, write a self-contained summary of your thoughts (i.e. your summary should be succinct but contain all the critical steps you needed to reach the conclusion). You should use Markdown and Latex to format your response. Write both your thoughts and summary in the same language as the task posed by the user.
Your thinking process must follow the template below:
<think>
Your thoughts or/and draft, like working through an exercise on scratch paper. Be as casual and as long as you want until you are confident to generate a correct answer.
</think>
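If you are driving the model from code rather than a chat UI, the reasoning prompt is just a normal system message. Below is a minimal sketch, assuming llama-cpp-python as the runtime and one of the 2x24B GGUF files from this repo (adjust the path/filename to whatever you actually downloaded; any GGUF-capable app that lets you set a system prompt works the same way):

```python
from llama_cpp import Llama

# Paste the full reasoning system prompt above (including the <think> template)
# into this string; reasoning stays off if you omit the system message.
REASONING_SYSTEM_PROMPT = """<paste the reasoning system prompt from above here>"""

llm = Llama(
    model_path="Mistral-2X24B-Power-Coder-Devs1.1-Q4_K_S.gguf",
    n_ctx=8192,        # minimum 8k context suggested when reasoning is on
    n_gpu_layers=-1,   # full GPU offload if it fits in VRAM; lower this value otherwise
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": REASONING_SYSTEM_PROMPT},
        {"role": "user", "content": "Write a Python function that merges overlapping intervals."},
    ],
    temperature=0.6,
    max_tokens=2048,
)
print(out["choices"][0]["message"]["content"])
```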
Additional "Codestral" model 2X22B at Q4_K_S, this one requires CHATML template and has 32k context.
Use the "qwen" settings below for parameters.
The first quants [2x24B] are Q2_K, so performance will be low relative to Q4, Q6 or Q8.
Additional [2x24B] V1 and V2 quants at Q4_K_S offer stronger performance.
Quants are unoptimized at the moment; optimizing will drastically improve performance.
GENERAL:
All versions default to 2 activated experts, except the "1x" models.
The number of active experts can be adjusted in LM Studio and other AI apps.
Experts can be set from 1 to 8 (depending on the model); better generations occur with more experts activated, due to the MOE config as well as the fact that these are all coder models.
Suggest 2-4 generations per prompt, especially if using 1 expert (all models).
The shared expert is activated at 10% for all these models EXCEPT "Set 1" / "Set 1.25" / "Set 1.5" / "Set 1.75".
Each version is different, not an "improvement" per se.
Models will accept simple prompts as well as very detailed instructions; however, for larger projects I suggest waiting for Q6/Q8 / optimized quants.
Evaluations are in progress.
Quants/models and structure(s) subject to change.
Full repo(s) per model(s) will be generated after eval period.
Feedback is welcome via "Community Tab".
Suggested Settings [Qwen 2.5 7B Coder default settings]:
- Temp .5 to .7 (or lower)
- Max Context is 32k (final versions will be 32k and 128k).
- top_k: 20, top_p: .8, min_p: .05
- rep pen: 1.1 (can be lower)
- Jinja Template (embedded) or CHATML template.
- A system prompt is not required (tests were run with a blank system prompt).
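For reference, here is the same set of settings mapped onto llama-cpp-python - a sketch only, assuming a recent llama-cpp-python build; the filename is one of the Q6 quants from this repo, so adjust it to the file you actually downloaded:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen2.5-2X7B-Power-Coder-V2-50-Q6_K.gguf",
    n_ctx=32768,      # current max context is 32k
    n_gpu_layers=-1,  # full offload if it fits in VRAM; reduce otherwise
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a thread-safe LRU cache in Python."}],
    temperature=0.6,      # suggested .5 to .7 (or lower)
    top_k=20,
    top_p=0.8,
    min_p=0.05,
    repeat_penalty=1.1,   # can be lower
    max_tokens=4096,
)
print(out["choices"][0]["message"]["content"])
```

The embedded Jinja chat template in the GGUF is applied automatically by create_chat_completion in recent llama-cpp-python versions, matching the "Jinja Template (embedded) or CHATML template" note above.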
Help, Adjustments, Samplers, Parameters and More
CHANGE THE NUMBER OF ACTIVE EXPERTS:
See this document:
https://huggingface.co/DavidAU/How-To-Set-and-Manage-MOE-Mix-of-Experts-Model-Activation-of-Experts
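Outside of LM Studio's expert setting, one way to change the active expert count is a GGUF metadata override at load time. This is a hedged sketch assuming llama-cpp-python's kv_overrides and assuming the Qwen2.5-MOE metadata key is "qwen2moe.expert_used_count" (the key name differs per architecture; check your file's GGUF header or the document linked above):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="path/to/one-of-the-4X7B-or-8X7B-quants.gguf",  # placeholder path - use your downloaded file
    n_ctx=32768,
    # Assumed key name for Qwen2.5 MOE GGUFs; verify against your file's metadata.
    kv_overrides={"qwen2moe.expert_used_count": 4},  # default is 2; more experts = stronger code, slower generation
)
```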
Settings: CHAT / ROLEPLAY and/or SMOOTHER operation of this model:
In "KoboldCpp" or "oobabooga/text-generation-webui" or "Silly Tavern" ;
Set the "Smoothing_factor" to 1.5
: in KoboldCpp -> Settings->Samplers->Advanced-> "Smooth_F"
: in text-generation-webui -> parameters -> lower right.
: In Silly Tavern this is called: "Smoothing"
NOTE: For "text-generation-webui"
-> if using GGUFs you need to use "llama_HF" (which involves downloading some config files from the SOURCE version of this model)
Source versions (and config files) of my models are here:
OTHER OPTIONS:
Increase rep pen to 1.1 to 1.15 (you don't need to do this if you use "smoothing_factor")
If the interface/program you are using to run AI MODELS supports "Quadratic Sampling" ("smoothing") just make the adjustment as noted.
Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers
This a "Class 1" model:
For all settings used for this model (including specifics for its "class"), example generations, and the advanced settings guide (which often addresses model issues and covers methods to improve performance for all use cases, including chat and roleplay), please see:
You can see all parameters used for generation, in addition to advanced parameters and samplers to get the most out of this model, here: