Unable to GGUF quant: errors out.
Convert to gguf ... (llama.cpp):
line 8019, in generate_extra_tensors
raise ValueError("No MXFP4 tensors found in the model. Please make sure you are using MXFP4 model.")
ValueError: No MXFP4 tensors found in the model. Please make sure you are using MXFP4 model.
Tried older versions of llama.cpp too -> unrecognized arch.
When I tried to convert BF16 to MXFP4, I got a message saying: "MXFP4 quantization is not serializable using safetensors for now."
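For reference, here's roughly the check that trips — a sketch, not the actual convert_hf_to_gguf.py code, and it assumes MXFP4 weights ship as paired *_blocks / *_scales tensors the way OpenAI's gpt-oss checkpoints do:

```python
# Sketch: detect whether a safetensors shard carries MXFP4-packed weights.
# Assumption: MXFP4 layers are stored as "<name>_blocks" (packed 4-bit
# mantissas) plus "<name>_scales" (per-block exponents), as in the original
# gpt-oss checkpoints. A dequantized BF16 model has neither, which is why
# the converter raises "No MXFP4 tensors found".
from safetensors import safe_open

def has_mxfp4_tensors(shard_path: str) -> bool:
    with safe_open(shard_path, framework="pt") as f:
        names = set(f.keys())
    return any(
        n.endswith("_blocks") and n[: -len("_blocks")] + "_scales" in names
        for n in names
    )

if __name__ == "__main__":
    # Hypothetical shard filename for illustration.
    print(has_mxfp4_tensors("model-00001-of-00002.safetensors"))
```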
I'd LOVE to check out this model abliterated; hoping it will be usable afterwards.
Also put an "issue" in at llama.cpp too; looks like this will affect most (all?) OpenAI fine-tunes.
The 15111 "pull" does not address this issue. (?)
@huihui-ai Will your version of GPT be as good as the original without losing any coherence? If some coherence is lost, in which parts is it most noticeable?
@VizorZ0042 Thank you for your support and feedback. This time, most of the weights in each layer were modified for gpt-oss-20b, so it's hard to predict the outcome. From testing with a few simple examples, the performance seems quite good.
Looks like it worked after ignoring an error
https://huggingface.co/gabriellarson/Huihui-gpt-oss-20b-BF16-abliterated-GGUF
Nice! Are you doing smaller sizes as well or just F16?
I tried to quantize to MXFP4 and got errors
Going down to Q4_1 worked, but the output was very, very bad.
I'm gonna try again after making an imatrix.
Making the imatrix resulted in a bunch of NaNs, so no smaller sizes are coming yet.
I think MXFP4 is the smallest we can go. The FFN tensor shape is not divisible by the Qx_K block size, so it can't be quantized to anything other than MXFP4 and Q8_0.
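To make the block-size math concrete (assuming gpt-oss-20b's published FFN width of 2880 — verify against your copy of the config): ggml K-quants tile rows in 256-element superblocks, while MXFP4 and Q8_0 use 32-element blocks.

```python
# Row widths must be a multiple of the quant type's block size for ggml
# to tile them. K-quants (Q4_K, Q5_K, Q6_K, ...) use 256-element
# superblocks; MXFP4, Q8_0, Q4_0/Q4_1 use 32-element blocks.
FFN_WIDTH = 2880  # gpt-oss-20b intermediate (FFN) size, assumed from the HF config

for name, block in [("MXFP4/Q8_0 (32)", 32), ("K-quants (256)", 256)]:
    rem = FFN_WIDTH % block
    verdict = "OK" if rem == 0 else f"remainder {rem} -> needs fallback"
    print(f"{name}: {verdict}")
```

2880 divides evenly by 32 but leaves a remainder of 64 against 256, so K-quants can't tile those rows.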
Patch added (#15153) to llama.cpp an hour ago; this model, as well as other OpenAI fine-tunes, can now be quanted.
I have tested it and will be uploading quants shortly.
Also: successfully imatrix'ed this model too.
Built IQ4_NL quants in both "reg" and "imatrix" versions - working correctly.
Some "ablit" damage; adjusting to address it.
NOTE: Same issues with specific quants (IQ4_NL, Q5_1, etc.) due to odd-size tensors -> tensor fallbacks.
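To illustrate what "tensor fallbacks" means: when a tensor's row width doesn't divide by the requested type's block size, the quantizer substitutes a compatible type for that tensor instead of aborting. A toy sketch of the idea (not llama.cpp's actual selection logic, which lives in its quantizer and weighs more factors):

```python
# Toy illustration of per-tensor quant fallback.
BLOCK_SIZE = {"Q5_K": 256, "IQ4_NL": 32, "Q5_1": 32, "Q8_0": 32, "F16": 1}

def pick_type(row_width: int, requested: str) -> str:
    # Use the requested type when the row tiles evenly; otherwise fall
    # back to a type whose block size still fits, ending at F16.
    for candidate in (requested, "Q8_0", "F16"):
        if row_width % BLOCK_SIZE[candidate] == 0:
            return candidate
    return "F16"

print(pick_type(2880, "Q5_K"))  # 2880 % 256 != 0 -> falls back to Q8_0
```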
@DavidAU Thanks, another masterpiece from you.
According to your message, the abliterated version does have some cons. Have you noticed more issues, or are you still testing?
There are some "ablit" issues, but they are not too bad as long as the quants match up with OpenAI's tensor structure.
2 quants are up now:
https://huggingface.co/DavidAU/OpenAi-GPT-oss-20b-abliterated-uncensored-NEO-Imatrix-gguf
More coming as they are tested.
This version can be tested with llama-cli: https://github.com/ggml-org/llama.cpp/releases/tag/b6115
The GGUF file has been uploaded: https://huggingface.co/huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated/tree/main/GGUF
Could you upload it to Ollama? Pretty please ^_^
Ollama likely needs modifications; it may be using an outdated version of llama.cpp. Replace it with the latest version and recompile.
I don't think it's compatible with Ollama, given their own implementation of gpt-oss (which is, btw, less efficient than llama.cpp's).
Thanks. I'll take a look at llama-server to host it.