This model was generated using DWQ quantization to bring the quality of the 4-bit quantization closer to 8-bit without increasing the model size. It was produced with mlx-lm version 0.26.3, using `--bits 4 --learning-rate 1e-7 --batch-size 1 --group-size 16`.
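
For reference, a minimal sketch of the kind of command that could reproduce this quantization, assuming the `mlx_lm.dwq` entry point from mlx-lm 0.26.3 and a hypothetical source model path (`<source-model>` and `<output-path>` are placeholders, not the actual repos used):

```bash
# Sketch only: DWQ quantization with the parameters listed above.
# <source-model> and <output-path> are hypothetical placeholders.
mlx_lm.dwq \
    --model <source-model> \
    --mlx-path <output-path> \
    --bits 4 \
    --group-size 16 \
    --learning-rate 1e-7 \
    --batch-size 1
```

The resulting model can then be loaded and run with the standard mlx-lm tooling (e.g. `mlx_lm.load` / `mlx_lm.generate`).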