Qwen2.5 7b GRPO RM Train (Writing Demo)
This is a base model that has undergone experimental reward-model RL (GRPO) training on a subset of the Erebus creative-writing dataset.
Model Output Example (from a 768-token prefix)
Other
Reward function files can be found here: verifiers
This model was trained using my chunked preference reward-model baseline: pretrain-rm-baseline-7b