This repository is an experimental re-quantized version of the original `openai/gpt-oss-20b` model. It requires development versions of `transformers` and `bitsandbytes`.
Quantization
The MLP expert parameters have been dequantized from MXFP4 to BF16 and then re-quantized to NF4 with double quantization, using an experimental `bnb_4bit_target_parameters` configuration option to restrict quantization to those parameters. The self-attention, routing, and embedding parameters remain in BF16.