This repository contains an experimental re-quantized version of the original openai/gpt-oss-20b model.

Loading it requires development versions of the transformers and bitsandbytes libraries.

Quantization

The MLP expert parameters were dequantized from MXFP4 to BF16, then requantized to NF4 with double quantization using an experimental bnb_4bit_target_parameters configuration option. The self-attention, routing, and embedding parameters remain in BF16.
