gpt-oss-120b-uncensored-bf16

#1262
by jacek2024 - opened

I really don't like that the author named it uncensored when I can easily tell just from the model card that it clearly is not. All he did was train on 800 rows of Amazon's FalseReject, a 14.6K-row dataset meant to reduce false censorship. This is not an uncensored finetune but a finetune to reduce the model's tendency to overcensor. Even at that, I can't imagine it doing particularly well, given that he didn't even train across the entire, relatively small dataset.

The problem is that if we quantize this, he will be the one who gets the probably very desirable gpt-oss-120b-uncensored-bf16 name, and we confuse our users into thinking there already is an uncensored model of gpt-oss-120b when there is not. Maybe you could ask huizimao to rename the model to something that better represents what it is. Alternatively, we could clone and rename it ourselves. If you really feel this model deserves the name, we can also go ahead and quantize it.

I just queued it (going through the daily list a bit late), but have now removed it from the queue. I understand your (nico's) reasoning, and sure, you are right, but generally I strongly prefer first come, first served, mainly because I don't want mradermacher to be a badge of honor for the original model; it should just be the fallback resource for quants, regardless of how great or crappy the model is. That is, mradermacher shouldn't be the place to look for an uncensored gpt-oss-120b in the first place. Clearly not everybody sees it that way... Well, we need to stay flexible.

But yes, we'll happily queue it when you (jacek) think it is really worthy. Best would probably be to ask huizimao to consider renaming the model to something clearer - that would, I think, be the best outcome, as the name is clearly an issue for folks.

I have now queued it, as there is high demand for this model and bartowski's quants are somewhat dumb due to him using Q8 for the FFN, so none of his quants are really small enough for many users to run: https://huggingface.co/bartowski/huizimao_gpt-oss-120b-uncensored-bf16-GGUF/discussions/1

Take the insanity of his IQ2_M quant: https://huggingface.co/bartowski/huizimao_gpt-oss-120b-uncensored-bf16-GGUF/tree/main/huizimao_gpt-oss-120b-uncensored-bf16-IQ2_M - it is 62.7 GiB, so it doesn't even fit into 64 GiB of RAM without offloading to the GPU. We can do better than this.
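For context, a quick back-of-the-envelope check in Python; the 62.7 GiB figure comes from the repo listing above, while the OS and KV-cache overheads are rough assumptions:

```python
# Can a 62.7 GiB quant run fully from 64 GiB of RAM? (rough sanity check)
GIB = 1024**3

model_bytes = 62.7 * GIB   # bartowski's IQ2_M file size (from the repo listing)
ram_bytes   = 64   * GIB   # a typical "64 GB" machine
os_overhead = 2.0  * GIB   # assumed: OS, desktop, background processes
kv_cache    = 1.0  * GIB   # assumed: modest context; grows with context length

free_for_weights = ram_bytes - os_overhead - kv_cache
print(f"available for weights: {free_for_weights / GIB:.1f} GiB")
print(f"fits without GPU offload: {model_bytes <= free_for_weights}")
# -> available for weights: 61.0 GiB
# -> fits without GPU offload: False
```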

bartowski's quants already have over 10K downloads, and users don't seem to mind that the model is not really uncensored; they seem happy with it simply no longer being overly censored. At least nobody seems to have cared enough to complain about it. But maybe that's also simply because almost nobody can run them. In any case, in the future let's fulfill user requests even for models labeled in a clearly misleading/overpromising way, so users can form their own opinion about them.

The reason all sizes are the same is because of this:

https://github.com/ggml-org/llama.cpp/pull/15091#issuecomment-3155962803

I legit shouldn't have bothered with any other sizes; I didn't even think about it when I clicked the buttons, but figured at this point I'd just leave them up so people can see for themselves instead of asking "where Q2_K?"

If this changes in the future, I'd obviously happily quantize them to other sizes, but at this time it seems that using anything else for the FFN is a bad idea, will break things fundamentally, and is probably not worth providing.
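For anyone who wants to see this on a downloaded file, here is a minimal sketch using the gguf-py package (pip install gguf) to check which quant type each tensor actually received; the file path is hypothetical and field names may differ slightly between gguf versions:

```python
# Group tensor parameter counts by (ffn/other, quant type) to show that the
# FFN expert tensors - the bulk of a MoE model - keep the same type in every
# "size", which is why all the files end up roughly the same size.
from collections import Counter
from gguf import GGUFReader

reader = GGUFReader("gpt-oss-120b-IQ2_M.gguf")  # hypothetical local path

params_by_group = Counter()
for t in reader.tensors:
    group = "ffn" if ".ffn_" in t.name else "other"
    params_by_group[(group, t.tensor_type.name)] += int(t.n_elements)

for (group, qtype), n in sorted(params_by_group.items()):
    print(f"{group:5s} {qtype:8s} {n / 1e9:6.2f}B params")
```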

(as for whether the model is worth making at all, that's an entirely different discussion, I went based purely on someone I trusted asking for it and did not vet it myself, you probably were right to look into it and find that it's not necessarily good)

(it's also why I felt the need to start adding the author name to the model name, because you're right: they now have the gpt-oss-120b-uncensored-bf16 name, and when one comes out that genuinely is uncensored, people who don't put the author in the model name will struggle - and putting the author name in the model name is ugly, and I hate that I have to do it)

We had a discussion about naming here already with huihui :)

The reason all sizes are the same is because of this:
https://github.com/ggml-org/llama.cpp/pull/15091#issuecomment-3155962803

Thanks a lot for pointing that out. I completely missed that this is something forced on us by llama.cpp and not something caused by your mix. I can confirm that we are experiencing the same issue with the default mix and simply hadn't noticed before: https://huggingface.co/mradermacher/gpt-oss-120b-i1-GGUF shows the exact same stupid behavior of basically all quants below Q4 being useless; i1-Q2_K_S is even larger than i1-IQ4_XS. This is messed up and quite sad, as it means normal users will simply not be able to run any gpt-oss-120b based model.

@mradermacher Can you please configure things so that all quants smaller than i1-IQ4_XS are skipped for any future GptOssForCausalLM based models?
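Something along these lines would do it - a minimal sketch of the requested behaviour; the names (plan_quants, TOO_SMALL_FOR_GPT_OSS) and the quant list are illustrative assumptions, not mradermacher's actual pipeline:

```python
import json

# Quant types below i1-IQ4_XS that are pointless for gpt-oss (assumed list).
TOO_SMALL_FOR_GPT_OSS = {
    "IQ1_S", "IQ1_M", "IQ2_XXS", "IQ2_XS", "IQ2_S", "IQ2_M",
    "Q2_K", "Q2_K_S", "IQ3_XXS", "IQ3_XS", "IQ3_S", "IQ3_M",
    "Q3_K_S", "Q3_K_M", "Q3_K_L",
}

def plan_quants(model_dir: str, requested: list[str]) -> list[str]:
    """Drop quant types that are useless for GptOssForCausalLM based models."""
    with open(f"{model_dir}/config.json") as f:
        archs = json.load(f).get("architectures", [])
    if "GptOssForCausalLM" in archs:
        return [q for q in requested if q not in TOO_SMALL_FOR_GPT_OSS]
    return requested
```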

as for whether the model is worth making at all, that's an entirely different discussion, I went based purely on someone I trusted asking for it and did not vet it myself, you probably were right to look into it and find that it's not necessarily good

You made the right choice by providing quants. Over 10K users downloaded them and didn't complain, so I assume they enjoy the model. I regret that we did not provide them earlier. Not everyone needs a fully uncensored model, and making it slightly less censored might have been all they wanted. I still really dislike authors not being honest when naming their models.

it's also why I felt the need to start adding the author name to the model name, because you're right: they now have the gpt-oss-120b-uncensored-bf16 name, and when one comes out that genuinely is uncensored, people who don't put the author in the model name will struggle - and putting the author name in the model name is ugly, and I hate that I have to do it

Simpler model names are more important to us than uniqueness. Naming conflicts are extremely rare, and when they happen we usually find a solution. In the end it really is all just personal preference, and both naming conventions have their advantages and disadvantages.

We had a discussion about naming here already with huihui :)

Which worked out amazingly, as they now use unique names for their models.

I think (don't quote me on this) it's not strictly llama.cpp forcing it, so much as gpt-oss just fundamentally working poorly with other quant types, but I would need someone more educated in the scene to clarify that. So I think it's less llama.cpp and more OpenAI's release format that causes the issues.

Simpler model names are more important to us than uniqueness. Naming conflicts are extremely rare, and when they happen we usually find a solution. In the end it really is all just personal preference, and both naming conventions have their advantages and disadvantages.

Yeah, I hummed and hawed about it; in the end I let people vote and went with the vocal minority's opinion. I haven't received any negative feedback since, so that's good? I still don't like it - I agree simpler names are waaay better - but I also do like the idea of clearly marking who released the original model (for example, kalomaze's kalomaze/Qwen3-16B-A3B: if I had just released it as Qwen3-16B-A3B, that would look a LOT like a legit Qwen release, when in reality it's an experiment kalomaze put up). Plus I'm lazy and want to avoid manual interventions like renaming models.

The REAL solution would be some better UI from Hugging Face, where the model could be named one thing but very clearly and visibly show where it originated (I'm super thankful for the model trees for that, but the number of people who still don't know they exist is crazy) - but I don't even know what that would look like ideally 🤷‍♂️
