Hi,
Thanks for taking a crack at this! I was wondering if anyone would attempt it, heh. Glad to see that you did; it was beyond me, lol. Anyway, I was wondering if you'd seen:
dnotitia/Smoothie-Qwen3-235B-A22B
"Smoothie Qwen is a post-processing tool designed to subtly refine the token distribution in Qwen3 & Qwen2.5 models. By analyzing and adjusting token weights particularly those associated with specific Unicode ranges it helps mitigate unintended biases toward certain languages while preserving the model’s core capabilities.
This approach is especially useful for applications requiring balanced multilingual outputs, where overrepresentation of one language might skew results. The tool identifies target tokens through Unicode ranges, including subword tokenization (e.g., partial characters from BPE tokenization), and applies probabilistic smoothing to encourage diversity.
Key Features
Token identification based on Unicode ranges of the target language
Detection of broken or malformed tokens (e.g. �) caused by subword tokenization
Identification of token combinations that may probabilistically form the target language
Flexible analysis strategies (e.g., N-gram analysis) to detect high-risk token patterns
Configurable analysis methods with future support for additional techniques
Adjustment of token weights in the lm_head layer to reduce generation likelihood
Saving of modified models for reuse or deployment
Automation of model generation across parameter variations (min_scale, smoothness)"
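If you're curious what that lm_head adjustment looks like in practice, here's a rough sketch of the general idea. To be clear, this is my own illustration, not Smoothie-Qwen's actual code: the model name, the Unicode range, and the flat `scale` factor are placeholder assumptions, and the real tool applies graduated smoothing via its min_scale/smoothness parameters plus the broken-token and n-gram analysis listed above.

```python
# Sketch of the general idea only (NOT Smoothie-Qwen's implementation):
# find tokens in a target Unicode range, then down-weight their lm_head
# rows so the model is less likely to emit them.
import re

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen3-0.6B"  # hypothetical small stand-in for the 235B model
CJK = re.compile(r"[\u4e00-\u9fff]")  # CJK Unified Ideographs, as an example range

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)

# Collect vocab ids whose decoded form contains a character in the target
# range. (The real tool also catches broken/partial BPE pieces and token
# combinations that only form the target language probabilistically.)
target_ids = [
    i for i in range(len(tok))
    if CJK.search(tok.decode([i], skip_special_tokens=True) or "")
]

scale = 0.5  # placeholder; analogous in spirit to min_scale/smoothness
with torch.no_grad():
    head = model.get_output_embeddings()  # the lm_head projection
    # Caveat: on models with tied embeddings this also scales the
    # input embeddings, which you probably don't want.
    head.weight[target_ids, :] *= scale

model.save_pretrained("qwen-smoothed-sketch")
tok.save_pretrained("qwen-smoothed-sketch")
```

The nice part is that it's a one-time, static edit to the weights: the modified model can be saved and deployed anywhere, with no runtime hooks or extra inference cost.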
I find the "smoothed" model just produces better English-language output overall.
Someone has already done the work on this (note the example in the image, lol).
Note that these are NOT control vectors like in llama.cpp. Those have to be trained.
Style Vectors for Steering Generative Large Language Models (Konen et al., Findings 2024)
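For contrast, a control/style vector is added to the hidden states at inference time, which is exactly why it has to be trained (or at least derived from contrastive examples) first. A minimal sketch of that kind of activation steering in transformers follows; again, this is my own illustration, and the layer index, `alpha`, and the random vector are stand-ins for a trained direction:

```python
# Contrast sketch (my illustration): steer by adding a vector to hidden
# states at inference, rather than permanently rescaling lm_head rows.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen3-0.6B"  # hypothetical stand-in
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)

LAYER = 12   # made-up choice of decoder layer
alpha = 4.0  # made-up steering strength
# In practice this comes from training (e.g. difference of means over
# contrastive prompts); random here purely for illustration.
style_vec = torch.randn(model.config.hidden_size, dtype=torch.bfloat16)

def add_vector(module, inputs, output):
    # Decoder layers may return a tuple with hidden states first,
    # or a bare tensor; handle both and add the steering vector.
    hs = output[0] if isinstance(output, tuple) else output
    hs = hs + alpha * style_vec.to(hs.device)
    return (hs,) + output[1:] if isinstance(output, tuple) else hs

handle = model.model.layers[LAYER].register_forward_hook(add_vector)
ids = tok("Write a short poem.", return_tensors="pt").input_ids
print(tok.decode(model.generate(ids, max_new_tokens=40)[0]))
handle.remove()  # steering is gone once the hook is removed
```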
Also, if you're training, you might be interested in this. It does require patching transformers, tho. I still think it's interesting.