Unable to GGUF quant: errors out.
Convert to gguf ... (llama.cpp):
line 8019, in generate_extra_tensors
raise ValueError("No MXFP4 tensors found in the model. Please make sure you are using MXFP4 model.")
ValueError: No MXFP4 tensors found in the model. Please make sure you are using MXFP4 model.
Tried older versions of llama.cpp too -> unrecognized arch.
When I tried to convert BF16 to MXFP4, I got a message saying: "MXFP4 quantization is not serializable using safetensors for now."
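For reference, here's roughly the check that trips — a sketch, not the actual convert_hf_to_gguf.py code, and it assumes MXFP4 weights ship as paired *_blocks / *_scales tensors the way OpenAI's gpt-oss checkpoints do:

```python
# Sketch: detect whether a safetensors shard carries MXFP4-packed weights.
# Assumption: MXFP4 layers are stored as "<name>_blocks" (packed 4-bit
# mantissas) plus "<name>_scales" (per-block exponents), as in the original
# gpt-oss checkpoints. A dequantized BF16 model has neither, which is why
# the converter raises "No MXFP4 tensors found".
from safetensors import safe_open

def has_mxfp4_tensors(shard_path: str) -> bool:
    with safe_open(shard_path, framework="pt") as f:
        names = set(f.keys())
    return any(
        n.endswith("_blocks") and n[: -len("_blocks")] + "_scales" in names
        for n in names
    )

if __name__ == "__main__":
    # Hypothetical shard filename for illustration.
    print(has_mxfp4_tensors("model-00001-of-00002.safetensors"))
```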
I'd LOVE to check out this model abliterated; hoping it will be usable afterwards.
Also put an "issue" in at llama.cpp too; looks like this will affect most (all?) OpenAI fine-tunes.
The 15111 "pull" does not address this issue. (?)
@huihui-ai Will your version of GPT be as good as the original without losing any coherence? If some coherence is lost, in which parts is it most noticeable?
@VizorZ0042 Thank you for your support and feedback. This time, most of the weights in each layer were modified for gpt-oss-20b, so it's hard to predict the outcome. From testing with a few simple examples, the performance seems quite good.
Looks like it worked after ignoring an error
https://huggingface.co/gabriellarson/Huihui-gpt-oss-20b-BF16-abliterated-GGUF
Nice! Are you doing smaller sizes as well or just F16?
I tried to quantize to MXFP4 and got errors
Going down to Q4_1 worked, but the output was very, very bad.
I'm gonna try again after making an imatrix.
Making the imatrix resulted in a bunch of NaNs, so no smaller sizes are coming yet.
I think MXFP4 is the smallest we can go. The FFN tensor shape is not divisible by the Qx_K block size, so it can't be quantized to anything other than MXFP4 and Q8_0.
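To make the block-size math concrete (assuming gpt-oss-20b's published FFN width of 2880 — verify against your copy of the config): ggml K-quants tile rows in 256-element superblocks, while MXFP4 and Q8_0 use 32-element blocks.

```python
# Row widths must be a multiple of the quant type's block size for ggml
# to tile them. K-quants (Q4_K, Q5_K, Q6_K, ...) use 256-element
# superblocks; MXFP4, Q8_0, Q4_0/Q4_1 use 32-element blocks.
FFN_WIDTH = 2880  # gpt-oss-20b intermediate (FFN) size, assumed from the HF config

for name, block in [("MXFP4/Q8_0 (32)", 32), ("K-quants (256)", 256)]:
    rem = FFN_WIDTH % block
    verdict = "OK" if rem == 0 else f"remainder {rem} -> needs fallback"
    print(f"{name}: {verdict}")
```

2880 divides evenly by 32 but leaves a remainder of 64 against 256, so K-quants can't tile those rows.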
Patch added (#15153) to llama.cpp an hour ago; this model, as well as other OpenAI fine-tunes, can now be quanted.
I have tested it and will be uploading quants shortly.
Also: successfully imatrix'ed this model too.
Built IQ4_NL quants in both "reg" and "imatrix" versions - working correctly.
Some "ablit" damage; adjusting to address it.
NOTE: Same issues with specific quants (IQ4_NL, Q5_1, etc.) due to odd-size tensors -> tensor fallbacks.
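To illustrate what "tensor fallbacks" means: when a tensor's row width doesn't divide by the requested type's block size, the quantizer substitutes a compatible type for that tensor instead of aborting. A toy sketch of the idea (not llama.cpp's actual selection logic, which lives in its quantizer and weighs more factors):

```python
# Toy illustration of per-tensor quant fallback.
BLOCK_SIZE = {"Q5_K": 256, "IQ4_NL": 32, "Q5_1": 32, "Q8_0": 32, "F16": 1}

def pick_type(row_width: int, requested: str) -> str:
    # Use the requested type when the row tiles evenly; otherwise fall
    # back to a type whose block size still fits, ending at F16.
    for candidate in (requested, "Q8_0", "F16"):
        if row_width % BLOCK_SIZE[candidate] == 0:
            return candidate
    return "F16"

print(pick_type(2880, "Q5_K"))  # 2880 % 256 != 0 -> falls back to Q8_0
```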
@DavidAU Thanks, another masterpiece from you.
According to your message, the abliterated version does have some cons. Have you noticed more issues, or are you still testing?
There are some "ablit" issues, but they are not too bad as long as the quants match up with OpenAI's tensor structure.
2 quants are up now:
https://huggingface.co/DavidAU/OpenAi-GPT-oss-20b-abliterated-uncensored-NEO-Imatrix-gguf
More coming as they are tested.
This version can be tested with llama-cli: https://github.com/ggml-org/llama.cpp/releases/tag/b6115
The GGUF file has been uploaded: https://huggingface.co/huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated/tree/main/GGUF
Could you upload it to Ollama? Pretty please ^_^
Ollama likely needs modifications; it may be using an outdated version of llama.cpp. Replace it with the latest version and recompile.
I don't think it's compatible with Ollama, given their own implementation of gpt-oss (which is, btw, less efficient than llama.cpp's).
Thanks. I'll take a look at llama-server to host it.