Abliteration not working

#1
by SubtleOne - opened

It is still heavily self-censoring.

It's not completely broken: the acceptance level for some content is much higher. But that seems limited to accepting my instruction (generate text). When its turn comes, it generates some text but steers the topic in the 'correct' direction, as if it had not seen those harmful words...

Ah sorry it's still W.I.P. I'm trying different approaches so I don't lobotomize the model either.

Thanks for the effort Maxime, it's more than needed!

If you ever need crowdfunding to cover the costs for the 32B variant, I'm in!
I'm saying that because it felt to me like the 32B suffered even more.

Edit: I meant the 32B, sorry for the confusion.

@mlabonne

Just a heads up, might help:

I found the number of experts activated needs adjustment (up... WAY UP... 24/32/64) when Imatrixing this model.
This might be a factor too when abliterating it. (?)
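
For anyone unsure what "number of experts activated" refers to, here is a minimal sketch of top-k expert routing in plain Python. It is only an illustration, not this model's actual router: the function name `top_k_experts`, the 8 router scores, and renormalizing the weights over the selected experts are all assumptions made for the example.

```python
import math

def top_k_experts(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    # Rank experts by router score, highest first.
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    # Softmax over the chosen experts only, so their weights sum to 1
    # (an assumed convention; real routers may normalize differently).
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

# Hypothetical router scores for one token over 8 experts.
logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.9]

few = top_k_experts(logits, k=2)   # only 2 experts process this token
many = top_k_experts(logits, k=6)  # 6 experts process the same token

print("k=2 experts:", [i for i, _ in few])
print("k=6 experts:", [i for i, _ in many])
```

With a larger k, each calibration token passes through more experts, so more of them get exercised during the run. Whether that carries over to abliteration is still just a guess.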

Thanks @DavidAU ! There's a big gap between <=8B and >=14B Qwen3 models for abliteration. I still need to run a few more experiments to crack this one but pretty happy with the other sizes.

Excellent; looking forward to it.
Team Qwen really did a great job... except the "censoring" -;
