Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models Paper • 2505.16265 • Published 3 days ago • 6
ilgee/Llama-3.1-8B-Instruct-grpo-ep2-lr2e-6-kl1e-4-rollout512-0.03-generated Updated 15 days ago • 48
ilgee/Llama-3.1-8B-Instruct-grpo-ep2-lr2e-6-kl1e-4-rollout512-0.03-groundtruth Updated 15 days ago • 37
ilgee/Llama-3.1-8B-Instruct-grpo-ep2-lr2e-6-kl1e-4-rollout512-0.03-multiclass Updated 16 days ago • 42
ilgee/hs2-naive-multiclass-min-ep5-lr5e-6-grpo-ep2-lr2e-6-kl1e-4-rollout512-half-v0 Updated 16 days ago • 1