(Additional quants to be uploaded shortly, including new DI-Matrix/TRI-Matrix versions; examples to be added.)

Specialized "light" uncensored quants for OpenAI's new 20B MOE (Mixture of Experts) model, running at 80+ T/S. See the settings and special instructions for using this model below.

OpenAi-GPT-oss-20b-LIGHT-uncensored-NEO-Imatrix-gguf

These are NEO Imatrix GGUFs; the NEO dataset is by DavidAU.

The NEO dataset improves overall performance and is suitable for all use cases.

This model uses "huizimao/gpt-oss-20b-uncensored-mxfp4" (light: 22% refusal rate vs. 77% for the original OpenAI 20B on the same content/prompts) as a base, which DE-CENSORS the model and removes refusals.

This model runs better than the full abliterated/uncensored and "moderate" uncensored versions, and accepts MOST content generation requests.

The goal is to temper the "nanny" during normal generation / general use cases.

It strikes the best balance between light refusal "repairs" and model performance.

NOTE: Tool use is re-enabled in this version, which differs from the source model from "huizimao".

Example output is shown below (creative; IQ4_NL), using the settings below.

Looking for 100% uncensored/abliterated?

https://huggingface.co/DavidAU/OpenAi-GPT-oss-20b-abliterated-uncensored-NEO-Imatrix-gguf

Moderate uncensored?

https://huggingface.co/DavidAU/OpenAi-GPT-oss-20b-MODERATE-uncensored-NEO-Imatrix-gguf

If you do not need an "uncensored" / "abliterated" model (like the ones at this repo), please go here:

https://huggingface.co/DavidAU/Openai_gpt-oss-20b-NEO-GGUF

or for the "big boy":

https://huggingface.co/DavidAU/Openai_gpt-oss-120b-NEO-Imatrix-GGUF

QUANTS:

Due to quanting issues with this model (which result in oddball quant sizes / mixtures), only TESTED quants will be uploaded (at the moment).

Currently that means IQ4_NL, Q5_1, MXFP4_MOE4 (a special OpenAI Quant) and Q8_0 are available.

NEO dataset performance improvements will show the most in the IQ4_NL, followed by Q5_1.

I find Q5_1 quants work better (and are more stable) than IQ4_NL for some use cases; however, IQ4_NL can be wilder and more off the cuff.

IQ4_NL quant(s):

  • OpenAI-20B-MAO-uncensored-NEO-IQ4_NL.gguf (Neo Imatrix)
  • OpenAI-20B-MAO-uncensored-NEOCODE-IQ4_NL.gguf (NeoCODE Imatrix)

Q5_1 quant(s):

  • OpenAI-20B-MAO-uncensored-NEO-Q5_1.gguf (Neo Imatrix)
  • OpenAI-20B-MAO-uncensored-NEOCODE-Q5_1.gguf (NeoCODE Imatrix)

MXFP4_MOE4 quant(s):

  • OpenAI-20B-UncensoredPlus-MAO-MXFP4_MOE4.gguf (output tensor at BF16, non-imatrix -> has fixed tool functions)

Q8_0 quant(s):

  • pending.

NOTE: The output tensor accounts for 10-20% of the output.

IQ4_NL, Q5_1 and Q8_0 quants are compatible (less/minimal damage when quanting) with OpenAI's tensor structure.

MXFP4_MOE4 is an exact match to OpenAI's tensor structure, but has limited "imatrix" applied to it.

IMPORTANT: Using an "abliterated" model VS an "uncensored" model

Usually, when you tell a model to generate horror, swearing, or x-rated content, that is all you have to do to get said content type.

This model will not refuse your request; however, in SOME CASES it needs to be "pushed" / directed a bit more.

Although this model will generate x-rated content too, you likewise need to tell it to use "slang" (and include the terms you want) to get it to generate the content at the "expected" level.

Without these added directive(s), the content can be "bland" compared to an "uncensored" model or a model trained on uncensored content.

Roughly, the model tries to generate the content, but the "default" settings are so "tame" it needs a push to generate at the expected graphic, cursing, or explicit levels.

Even with minimal direction (i.e., "use these words to swear: x, y, z"), this will be enough to push the model to generate the requested content in the, ahh... expected format. A minimal sketch of this is below.
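As a concrete illustration, here is a minimal Python sketch of that "push": appending explicit directives to an otherwise plain request. All prompt text and the placeholder terms are purely illustrative, not from the model card.

```python
# A minimal sketch of the "push" described above: append explicit directives
# (graphic level, slang/swear terms) to an otherwise plain request.
# All prompt text here is illustrative; substitute your own terms.
base_request = "Write a 600-word horror scene in first person, present tense."
directives = (
    " Make it graphic and visceral."
    " The narrator swears freely; use these words to swear: <your terms here>."
)
prompt = base_request + directives
print(prompt)
```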

ABLITERATED / UNCENSORED Notes / Settings:

  • Suggested experts: 4, 5, or 6.
  • 2-4 regens suggested.
  • Some regens will be strange, while others will be "bang on".
  • LOWER temps (.4 to .8), especially if you get repeats/issues.
  • However, sometimes temps of 1, 1.1, or 1.2 are best, depending on your use case(s).
  • Temps of 2 or higher can be ah... very interesting.
  • LONGER prompts (with more details, directives) tend to work better as long as they are clear enough.
  • REP PEN setting is CRITICAL.

Suggested Settings (tested in LM Studio, Beta Branch 0.3.21, build 4):

  • Context: 8k min.
  • Temp 1 to 1.2+ for creative. Temp .6 (or so) for coding/general.
  • Rep pen 1.1, topk 40, topp .95, min p 0.05
  • Experts 4-8 depending on use case. (higher than 8 MAY lower quality AND/OR cause repeat issues)
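For reference, here is a minimal sketch of these settings applied via llama-cpp-python (one of many back ends that can run these GGUFs). The filename and prompt are examples only; adjust paths and values for your setup.

```python
# A minimal llama-cpp-python sketch using the suggested settings above.
# The GGUF filename and prompt are examples, not requirements.
from llama_cpp import Llama

llm = Llama(
    model_path="OpenAI-20B-MAO-uncensored-NEO-IQ4_NL.gguf",
    n_ctx=8192,  # 8k context minimum, per the settings above
)

out = llm(
    "Write a short scene set in an abandoned lighthouse.",
    max_tokens=512,
    temperature=1.0,     # 1 to 1.2+ for creative; ~.6 for coding/general
    repeat_penalty=1.1,  # REP PEN is critical for this model
    top_k=40,
    top_p=0.95,
    min_p=0.05,
)
print(out["choices"][0]["text"])
```

If you get repeats or strange output, lower the temperature first (.4 to .8) before touching the other samplers.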

Model Supports:

  • 128k context
  • up to 24 experts
  • Tools use, browsing, etc

For my help docs, SETTING THE NUMBER OF EXPERTS, and other guidance, see below.

See more about this model here:

https://huggingface.co/openai/gpt-oss-20b

[ Please refer to their model card, especially to control "thinking" levels. ]
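For example, per OpenAI's model card the "thinking" level is controlled from the system prompt (Reasoning: low / medium / high). A hedged llama-cpp-python sketch, assuming the chat template baked into the GGUF is used (filename and prompts are examples only):

```python
# Setting the reasoning ("thinking") level via the system prompt, per
# OpenAI's gpt-oss model card. Filename and prompts are examples only.
from llama_cpp import Llama

llm = Llama(model_path="OpenAI-20B-MAO-uncensored-NEO-IQ4_NL.gguf", n_ctx=8192)

resp = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Reasoning: low"},  # low | medium | high
        {"role": "user", "content": "Summarize the plot of Dracula in three sentences."},
    ],
    temperature=0.6,
)
print(resp["choices"][0]["message"]["content"])
```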

AND the "light" uncensored version:

https://huggingface.co/huizimao/gpt-oss-20b-uncensored-mxfp4


Help, Adjustments, Samplers, Parameters and More


CHANGE THE NUMBER OF ACTIVE EXPERTS:

See this document:

https://huggingface.co/DavidAU/How-To-Set-and-Manage-MOE-Mix-of-Experts-Model-Activation-of-Experts
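If your back end exposes llama.cpp's --override-kv mechanism, the expert count can also be set programmatically. Below is a hedged sketch using llama-cpp-python's kv_overrides; the exact metadata key for this architecture is an assumption here, so verify it against the document above or your GGUF's own metadata.

```python
# Overriding the active-expert count via llama-cpp-python's kv_overrides
# (the library's equivalent of llama.cpp's --override-kv flag).
# "gpt-oss.expert_used_count" is an ASSUMED key name -- check your GGUF's
# metadata before relying on it.
from llama_cpp import Llama

llm = Llama(
    model_path="OpenAI-20B-MAO-uncensored-NEO-IQ4_NL.gguf",  # example file
    n_ctx=8192,
    kv_overrides={"gpt-oss.expert_used_count": 6},  # 4-8 suggested above
)
```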

Settings: CHAT / ROLEPLAY and/or SMOOTHER operation of this model:

In "KoboldCpp" or "oobabooga/text-generation-webui" or "Silly Tavern" ;

Set the "Smoothing_factor" to 1.5

: in KoboldCpp -> Settings->Samplers->Advanced-> "Smooth_F"

: in text-generation-webui -> parameters -> lower right.

: In Silly Tavern this is called: "Smoothing"
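Under the hood, "smoothing" is quadratic sampling: each logit is pulled toward the top logit by a quadratic penalty. A rough Python sketch of the transform as commonly implemented in these front ends; exact implementations may differ slightly.

```python
# Quadratic sampling ("smoothing") sketch: logits far from the peak are
# penalized quadratically. Factors > 1 sharpen the distribution; < 1 flattens.
import numpy as np

def quadratic_smoothing(logits: np.ndarray, smoothing_factor: float = 1.5) -> np.ndarray:
    max_logit = logits.max()
    return -smoothing_factor * (logits - max_logit) ** 2 + max_logit

print(quadratic_smoothing(np.array([2.0, 1.0, -1.0]), smoothing_factor=1.5))
# -> [  2.    0.5 -11.5]  (gaps widen, so sampling concentrates on top tokens)
```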

NOTE: For "text-generation-webui"

-> if using GGUFs you need to use "llama_HF" (which involves downloading some config files from the SOURCE version of this model)

Source versions (and config files) of my models are here:

https://huggingface.co/collections/DavidAU/d-au-source-files-for-gguf-exl2-awq-gptq-hqq-etc-etc-66b55cb8ba25f914cbf210be

OTHER OPTIONS:

  • Increase rep pen to 1.1 to 1.15 (you don't need to do this if you use "smoothing_factor")

  • If the interface/program you are using to run AI MODELS supports "Quadratic Sampling" ("smoothing") just make the adjustment as noted.

Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers

This is a "Class 1" model:

For all settings used for this model (including specifics for its "class"), example generation(s), and an advanced settings guide (which often addresses model issues and covers methods to improve performance for all use cases, including chat and roleplay), please see:

[ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]



EXAMPLE - IQ4_NL; temp .8, using above settings (creative)

NO System prompt. (default thinking level)


PROMPT:

OUTPUT:

[[[thinking]]]
