Qwen2.5 7b GRPO RM Train (Writing Demo)
This is a base model that has undergone experimental reward-model RL (GRPO) training on a subset of the Erebus creative-writing dataset.
Model Output Example (from a 768-token prefix)
Other
Reward function files can be found here: verifiers
This model was trained using my chunked preference reward-model baseline: pretrain-rm-baseline-7b