Qwen2.5-OpenCodeReasoning-Nemotron-1.1-NEO-imatix-7B-gguf

Specialized IQ3_M (for speed) and Q6 (for overall quality) NEO Imatrix quants, with augments, of this model (see benchmarks and more info at the source page):

https://huggingface.co/nvidia/OpenCodeReasoning-Nemotron-1.1-7B

These quants were designed to improve reasoning, reduce "loops", help the model finalize its "thoughts", and generate usable code.

Q6_K is suggested for best quality; IQ3_M for raw speed // simpler coding tasks.

The NEO Imatrix dataset was also applied to improve general performance.

The output tensor is set at BF16 (full precision; this tensor accounts for 10-20% of model performance) to further improve the model.

Settings Suggested (applied in the sketch below):

  • Temp .5 to .7
  • Rep pen 1.02 to 1.1
  • Top_k 20, top_p .8, min_p .05
  • Context size of 8k to 16k for "thinking".
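
For reference, here is a minimal sketch of these settings applied via llama-cpp-python; the GGUF filename and prompt are placeholders, and the values are mid-range picks from the list above:

```python
# A minimal sketch of the suggested settings using llama-cpp-python.
# Filename and prompt are placeholders; values are mid-range picks from above.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen2.5-OpenCodeReasoning-Nemotron-1.1-NEO-Q6_K.gguf",  # placeholder
    n_ctx=16384,  # 8k to 16k context for "thinking"
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
    temperature=0.6,      # .5 to .7
    repeat_penalty=1.05,  # 1.02 to 1.1
    top_k=20,
    top_p=0.8,
    min_p=0.05,
    max_tokens=4096,      # leave room for the thinking block
)
print(out["choices"][0]["message"]["content"])
```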

Model Notes:

  • Jinja template (embedded) OR use "CHATML".
  • Model will SELF GENERATE thinking tags/blocks.
  • Max context 32k (as per the original model config; can be raised using YaRN, sketched below).
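
A hedged sketch of raising the context past 32k with YaRN via llama-cpp-python; the scale factor and kwargs below are illustrative assumptions (for recent versions of the library), not tested settings:

```python
# A hedged sketch: extending the 32k native context to 64k with YaRN.
# All values are illustrative assumptions, not tested settings.
import llama_cpp

llm = llama_cpp.Llama(
    model_path="Qwen2.5-OpenCodeReasoning-Nemotron-1.1-NEO-Q6_K.gguf",  # placeholder
    n_ctx=65536,                                               # 2x the native window
    rope_scaling_type=llama_cpp.LLAMA_ROPE_SCALING_TYPE_YARN,  # assumes a recent llama-cpp-python
    rope_freq_scale=0.5,                                       # 1 / scale factor of 2
    yarn_orig_ctx=32768,                                       # the model's native context
)
```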

For better performance:

Use a system prompt or prompt instructions to set thinking parameters.

As this is a coder model, include which language(s) to use, dependencies (or not), speed vs. size trade-offs and so on, to help control this model's thinking block size(s).

This will FOCUS the model and reduce thinking block sizes. (An illustrative example follows.)
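
For instance, a minimal sketch of such instructions passed as a system prompt (the wording here is illustrative, not a prescribed prompt):

```python
# An illustrative system prompt (wording is an assumption) showing the kind
# of constraints that focus the model and shrink its thinking blocks.
messages = [
    {
        "role": "system",
        "content": (
            "You are a coding assistant. Answer in Python 3 only, using the "
            "standard library (no external dependencies). Favor simple, fast "
            "solutions. Keep reasoning brief and stop thinking once you have "
            "a working approach."
        ),
    },
    {"role": "user", "content": "Write a function that deduplicates a list while preserving order."},
]
```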


Help, Adjustments, Samplers, Parameters and More


Settings: CHAT / ROLEPLAY and/or SMOOTHER operation of this model:

In "KoboldCpp" or "oobabooga/text-generation-webui" or "Silly Tavern" ;

Set the "Smoothing_factor" to 1.5

: in KoboldCpp -> Settings->Samplers->Advanced-> "Smooth_F"

: in text-generation-webui -> parameters -> lower right.

: In Silly Tavern this is called: "Smoothing" (an API example follows).
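
If you drive text-generation-webui through its local OpenAI-compatible API, the parameter can also be passed per request; a hedged sketch (the endpoint, port, and the extra "smoothing_factor" field are assumptions based on the webui's API extension):

```python
# A hedged sketch: passing smoothing_factor via text-generation-webui's
# OpenAI-compatible API. Endpoint/port and the extra field are assumptions.
import requests

resp = requests.post(
    "http://127.0.0.1:5000/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Write a quicksort in Python."}],
        "temperature": 0.6,
        "smoothing_factor": 1.5,  # quadratic sampling strength
        "max_tokens": 1024,
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])
```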

NOTE: For "text-generation-webui"

-> if using GGUFs you need to use "llama_HF" (which involves downloading some config files from the SOURCE version of this model)

Source versions (and config files) of my models are here:

https://huggingface.co/collections/DavidAU/d-au-source-files-for-gguf-exl2-awq-gptq-hqq-etc-etc-66b55cb8ba25f914cbf210be

OTHER OPTIONS:

  • Increase rep pen to 1.1-1.15 (you don't need to do this if you use "smoothing_factor").

  • If the interface/program you are using to run AI models supports "Quadratic Sampling" ("smoothing"), just make the adjustment as noted; a sketch of the underlying transform follows.
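
For reference, a minimal NumPy sketch of the quadratic ("smoothing") transform as commonly implemented in these front ends; the exact formula in each program may differ, so treat this as an illustration:

```python
# A minimal sketch of quadratic sampling ("smoothing"), assuming the commonly
# used transform: each logit is penalized by its squared distance from the
# top logit, scaled by smoothing_factor.
import numpy as np

def quadratic_smoothing(logits: np.ndarray, smoothing_factor: float = 1.5) -> np.ndarray:
    max_logit = logits.max()
    return -smoothing_factor * (logits - max_logit) ** 2 + max_logit

# Near-tied tokens stay competitive; long-tail tokens are punished hard.
logits = np.array([4.0, 3.5, 2.0, -1.0])
print(quadratic_smoothing(logits, smoothing_factor=1.5))
```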

Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers

This a "Class 1" model:

For all settings used for this model (including specifics for its "class"), example generation(s), and an advanced settings guide (which often addresses model issue(s) and covers methods to improve performance for all use cases, including chat and roleplay), please see:

[ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]



Special Thanks:


Special thanks to all the following, and many more...

All the model makers, fine tuners, mergers, and tweakers:

  • They provide the raw "DNA" for almost all my models.
  • Sources of model(s) can be found on the repo pages, especially the "source" repos with link(s) to the model creator(s).

Huggingface [ https://huggingface.co ] :

  • The place to store, merge, and tune models endlessly.
  • THE reason we have an open source community.

LlamaCPP [ https://github.com/ggml-org/llama.cpp ] :

  • The ability to compress and run models on GPU(s), CPU(s) and almost all devices.
  • Imatrix, Quantization, and other tools to tune the quants and the models.
  • Llama-Server: a CLI-based direct interface to run GGUF models.
  • The only tool I use to quant models.

Quant-Masters: Team Mradermacher, Bartowski, and many others:

  • Quant models day and night for us all to use.
  • They are the lifeblood of open source access.

MergeKit [ https://github.com/arcee-ai/mergekit ] :

  • The universal online/offline tool to merge models together and forge something new.
  • Over 20 methods to almost instantly merge models, pull them apart, and put them together again.
  • The tool I have used to create over 1500 models.

Lmstudio [ https://lmstudio.ai/ ] :

  • The go-to tool to test and run models in GGUF format.
  • The tool I use to test/refine and evaluate new models.
  • The LM Studio forum on Discord: endless info and community for open source.

Text Generation Webui // KoboldCPP // SillyTavern:

  • The interfaces referenced above for running, chatting with, and adjusting GGUF models.
