Qwen2.5-OpenCodeReasoning-Nemotron-1.1-NEO-imatix-7B-gguf

Specialized IQ3_M (for speed) and Q6 (for overall quality) NEO Imatrix quants, with augments, of this model (see benchmarks and more info at the source page):

https://huggingface.co/nvidia/OpenCodeReasoning-Nemotron-1.1-7B

These quants were designed to improve reasoning, reduce "loops", help the model finalize its "thoughts", and generate usable code.

Q6_K is suggested for best quality; IQ3_M for raw speed // simpler coding tasks.

The NEO Imatrix dataset was also applied to improve general performance.

The output tensor is set at BF16 (full precision; this tensor accounts for 10-20% of model performance) to further improve the model.

Settings Suggested (applied in the sketch below):

  • Temp .5 to .7
  • Rep pen 1.02 to 1.1
  • Top_k 20, top_p .8, min_p .05
  • Context size of 8k to 16k for "thinking".
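
For reference, here is a minimal sketch of these settings applied via llama-cpp-python; the GGUF filename and prompt are placeholders, and the values are mid-range picks from the list above:

```python
# A minimal sketch of the suggested settings using llama-cpp-python.
# Filename and prompt are placeholders; values are mid-range picks from above.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen2.5-OpenCodeReasoning-Nemotron-1.1-NEO-Q6_K.gguf",  # placeholder
    n_ctx=16384,  # 8k to 16k context for "thinking"
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
    temperature=0.6,      # .5 to .7
    repeat_penalty=1.05,  # 1.02 to 1.1
    top_k=20,
    top_p=0.8,
    min_p=0.05,
    max_tokens=4096,      # leave room for the thinking block
)
print(out["choices"][0]["message"]["content"])
```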

Model Notes:

  • Jinja template (embedded) OR use "CHATML".
  • Model will SELF GENERATE thinking tags/blocks.
  • Max context 32k (as per the original model config; can be raised using YaRN, sketched below).
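
A hedged sketch of raising the context past 32k with YaRN via llama-cpp-python; the scale factor and kwargs below are illustrative assumptions (for recent versions of the library), not tested settings:

```python
# A hedged sketch: extending the 32k native context to 64k with YaRN.
# All values are illustrative assumptions, not tested settings.
import llama_cpp

llm = llama_cpp.Llama(
    model_path="Qwen2.5-OpenCodeReasoning-Nemotron-1.1-NEO-Q6_K.gguf",  # placeholder
    n_ctx=65536,                                               # 2x the native window
    rope_scaling_type=llama_cpp.LLAMA_ROPE_SCALING_TYPE_YARN,  # assumes a recent llama-cpp-python
    rope_freq_scale=0.5,                                       # 1 / scale factor of 2
    yarn_orig_ctx=32768,                                       # the model's native context
)
```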

For better performance:

Use a system prompt or prompt instructions to set thinking parameters.

As this is a coder model, include which language(s) to use, dependencies (or not), speed vs. size trade-offs and so on, to help control this model's thinking block size(s).

This will FOCUS the model and reduce thinking block sizes. (An illustrative example follows.)
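
For instance, a minimal sketch of such instructions passed as a system prompt (the wording here is illustrative, not a prescribed prompt):

```python
# An illustrative system prompt (wording is an assumption) showing the kind
# of constraints that focus the model and shrink its thinking blocks.
messages = [
    {
        "role": "system",
        "content": (
            "You are a coding assistant. Answer in Python 3 only, using the "
            "standard library (no external dependencies). Favor simple, fast "
            "solutions. Keep reasoning brief and stop thinking once you have "
            "a working approach."
        ),
    },
    {"role": "user", "content": "Write a function that deduplicates a list while preserving order."},
]
```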


Help, Adjustments, Samplers, Parameters and More


Settings: CHAT / ROLEPLAY and/or SMOOTHER operation of this model:

In "KoboldCpp" or "oobabooga/text-generation-webui" or "Silly Tavern" ;

Set the "Smoothing_factor" to 1.5

: in KoboldCpp -> Settings->Samplers->Advanced-> "Smooth_F"

: in text-generation-webui -> parameters -> lower right.

: In Silly Tavern this is called: "Smoothing" (an API example follows).
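
If you drive text-generation-webui through its local OpenAI-compatible API, the parameter can also be passed per request; a hedged sketch (the endpoint, port, and the extra "smoothing_factor" field are assumptions based on the webui's API extension):

```python
# A hedged sketch: passing smoothing_factor via text-generation-webui's
# OpenAI-compatible API. Endpoint/port and the extra field are assumptions.
import requests

resp = requests.post(
    "http://127.0.0.1:5000/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Write a quicksort in Python."}],
        "temperature": 0.6,
        "smoothing_factor": 1.5,  # quadratic sampling strength
        "max_tokens": 1024,
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])
```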

NOTE: For "text-generation-webui"

-> if using GGUFs you need to use "llama_HF" (which involves downloading some config files from the SOURCE version of this model)

Source versions (and config files) of my models are here:

https://huggingface.co/collections/DavidAU/d-au-source-files-for-gguf-exl2-awq-gptq-hqq-etc-etc-66b55cb8ba25f914cbf210be

OTHER OPTIONS:

  • Increase rep pen to 1.1-1.15 (you don't need to do this if you use "smoothing_factor").

  • If the interface/program you are using to run AI models supports "Quadratic Sampling" ("smoothing"), just make the adjustment as noted; a sketch of the underlying transform follows.
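
For reference, a minimal NumPy sketch of the quadratic ("smoothing") transform as commonly implemented in these front ends; the exact formula in each program may differ, so treat this as an illustration:

```python
# A minimal sketch of quadratic sampling ("smoothing"), assuming the commonly
# used transform: each logit is penalized by its squared distance from the
# top logit, scaled by smoothing_factor.
import numpy as np

def quadratic_smoothing(logits: np.ndarray, smoothing_factor: float = 1.5) -> np.ndarray:
    max_logit = logits.max()
    return -smoothing_factor * (logits - max_logit) ** 2 + max_logit

# Near-tied tokens stay competitive; long-tail tokens are punished hard.
logits = np.array([4.0, 3.5, 2.0, -1.0])
print(quadratic_smoothing(logits, smoothing_factor=1.5))
```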

Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers

This a "Class 1" model:

For all settings used for this model (including specifics for its "class"), example generation(s), and an advanced settings guide (which often addresses model issue(s) and covers methods to improve performance for all use cases, including chat and roleplay), please see:

[ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]



Special Thanks:


Special thanks to all the following, and many more...

All the model makers, fine tuners, mergers, and tweakers:

  • They provide the raw "DNA" for almost all my models.
  • Sources of model(s) can be found on the repo pages, especially the "source" repos with link(s) to the model creator(s).

Huggingface [ https://huggingface.co ] :

  • The place to store, merge, and tune models endlessly.
  • THE reason we have an open source community.

LlamaCPP [ https://github.com/ggml-org/llama.cpp ] :

  • The ability to compress and run models on GPU(s), CPU(s) and almost all devices.
  • Imatrix, Quantization, and other tools to tune the quants and the models.
  • Llama-Server: a CLI-based direct interface to run GGUF models.
  • The only tool I use to quant models.

Quant-Masters: Team Mradermacher, Bartowski, and many others:

  • Quant models day and night for us all to use.
  • They are the lifeblood of open source access.

MergeKit [ https://github.com/arcee-ai/mergekit ] :

  • The universal online/offline tool to merge models together and forge something new.
  • Over 20 methods to almost instantly merge models, pull them apart, and put them together again.
  • The tool I have used to create over 1500 models.

Lmstudio [ https://lmstudio.ai/ ] :

  • The go-to tool to test and run models in GGUF format.
  • The tool I use to test/refine and evaluate new models.
  • The LM Studio forum on Discord: endless info and community for open source.

Text Generation Webui // KoboldCPP // SillyTavern:

  • The interfaces referenced above for running, chatting with, and adjusting GGUF models.
