This repository contains an experimental re-quantized version of the original openai/gpt-oss-20b model.

Loading it requires development versions of the transformers and bitsandbytes libraries.

Quantization

The MLP expert parameters were dequantized from MXFP4 to BF16, then requantized to NF4 with double quantization using an experimental bnb_4bit_target_parameters configuration option. The self-attention, routing, and embedding parameters remain in BF16.
