This model is a quantized version of the original model vinhnx90/vt-qwen-3b-GRPO-merged-16bit.
It was quantized to 4-bit with the BitsAndBytes library, using the bnb-my-repo space.
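As a rough illustration of what going from 16-bit to 4-bit means for storage, the sketch below estimates the size of the weights alone. It assumes that "3b" in the model name means roughly 3 billion parameters; that figure, the `NUM_PARAMS` constant, and the helper `weight_footprint_gib` are illustrative and not taken from the model card.

```python
def weight_footprint_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate storage for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


# Assumption: "3b" in the model name means roughly 3 billion parameters.
NUM_PARAMS = 3e9

fp16_gib = weight_footprint_gib(NUM_PARAMS, 16)  # original merged 16-bit model
int4_gib = weight_footprint_gib(NUM_PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: ~{fp16_gib:.1f} GiB")
print(f"4-bit weights:  ~{int4_gib:.1f} GiB")
print(f"reduction:      {fp16_gib / int4_gib:.0f}x")
```

This is an idealized lower bound. An actual 4-bit checkpoint will likely be somewhat larger, since quantization stores extra scaling metadata and not every tensor is necessarily quantized.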
This Qwen2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
Base model: vinhnx90/vt-qwen-3b-GRPO-merged-16bit