Important: possible deviation from the originally uploaded model
Thank you for releasing these models.
When we tried the model yesterday, it felt very capable and lightweight, but after downloading it again this morning, it can no longer give meaningful answers to most questions.
Inference also feels much slower.
Upon checking, I found that a new quantization method, “bitnet,” was added to the config file yesterday, and every assistant response now comes out as “the, the, the.”
Was something changed? Or were the wrong files uploaded when the model was released yesterday?
Does that mean that, as of yesterday, the model was either not quantized or was using an earlier (non‑latest) BitNet definition?
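For reference, here is a minimal sketch of how we are checking what the repository's config currently declares (assuming the standard Hugging Face config.json layout; the repo id below is a placeholder):

```python
# Minimal sketch: fetch config.json from the Hub and print its
# quantization_config block, if any. The repo id is a placeholder;
# substitute the actual model repository.
import json
from huggingface_hub import hf_hub_download

repo_id = "org/model-name"  # placeholder

config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
with open(config_path) as f:
    config = json.load(f)

# If the checkpoint was marked as quantized after upload, this should
# show something like {"quant_method": "bitnet", ...}; if the key is
# absent, the config does not declare the model as quantized.
print(config.get("quantization_config", "no quantization_config key"))
```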
Best, Axcxept Inc.
I am getting this error when trying to launch with vLLM; whether I leave quantization unspecified or specify one from the list, it's the same error.
vllm | ValueError: Unknown quantization method: bitnet. Must be one of ['aqlm', 'awq', 'deepspeedfp', 'tpu_int8', 'fp8', 'ptpc_fp8', 'fbgemm_fp8', 'modelopt', 'nvfp4', 'marlin', 'gguf', 'gptq_marlin_24', 'gptq_marlin', 'awq_marlin', 'gptq', 'compressed-tensors', 'bitsandbytes', 'qqq', 'hqq', 'experts_int8', 'neuron_quant', 'ipex', 'quark', 'moe_wna16'].
vllm exited with code 0
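If the breakage was introduced by the config update, one possible workaround (a sketch, assuming you know a commit hash from before the change; the repo id and hash below are placeholders) is to pin the download to an earlier revision and point vLLM at the local snapshot:

```python
# Sketch: pin the model download to a pre-change revision so vLLM
# never sees the new "bitnet" quantization_config it doesn't recognize.
# The repo id and revision are placeholders; substitute real values.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="org/model-name",                      # placeholder repo id
    revision="<commit-before-the-config-change>",  # placeholder commit hash
)
print(local_path)  # then launch vLLM against this local directory
```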
Interesting that you're getting “the, the, the.” I downloaded, built, and installed it exactly according to the instructions on the GitHub, and all I get is "ggggggg" repeating. Also, if I exit bitnet.cpp, something happens and my terminal becomes unresponsive, requiring a reboot.
Something is definitely wrong here.
When the model was first released, the SFTTrainer worked without issues.
This indicates that the model was not quantized at that time; as a result, we were able to train it without specifying QLoRA.
However, when we run the same code now, we get an error stating that quantized models cannot be trained.
This likely means that the model has now been properly quantized due to a configuration change.
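As a sanity check, here is a minimal sketch of how we are inspecting what the freshly downloaded checkpoint reports (the repo id is a placeholder, and whether these attributes are populated depends on the transformers version):

```python
# Sketch: check whether the checkpoint now loads as a quantized model,
# which would explain the SFTTrainer error. The repo id is a placeholder.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("org/model-name")  # placeholder

# Recent transformers versions flag quantized models; both should be
# falsy/None for an unquantized checkpoint.
print(getattr(model, "is_quantized", False))
print(getattr(model.config, "quantization_config", None))
```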
That said, inference clearly exhibits signs of catastrophic forgetting and repetition loops.
Additionally, there is a noticeable delay both before and during inference, which deviates significantly from the reported performance.
We are currently verifying the truth behind these changes.
Thank you.
Hello,
Thank you for bringing these issues to our attention.
We have recently updated the bitnet branch in our transformers repository fork (https://github.com/shumingma/transformers/tree/bitnet). This update occurred after the initial release, primarily to align with Hugging Face integration requirements.
It's possible these updates address the issues you've reported. Please ensure your local repository is synchronized with the latest version of this branch and test again. Let us know if the problems continue after updating.
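For example, after syncing you can quickly verify that generation is no longer degenerate (a sketch; the install command reinstalls from the branch above, and the repo id is a placeholder):

```python
# Sketch: after reinstalling transformers from the updated branch, e.g.
#   pip install --upgrade git+https://github.com/shumingma/transformers.git@bitnet
# reload the model and confirm generation is no longer degenerate.
# The repo id is a placeholder; substitute the released model.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "org/model-name"  # placeholder
tok = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tok("Hello, how are you?", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```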
Thanks.