chitanda/deepseek-math.7b.ins.meta_math_cot.math55k.n5.critic_correct.dpo.H100.w4.v3.1.s42 Updated about 23 hours ago
Magpie-Align/Magpie-Reasoning-V1-150K-CoT-Deepseek-R1-Llama-70B Viewer • Updated 18 days ago • 150k • 947 • 13
Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B Viewer • Updated 18 days ago • 250k • 3.6k • 71
Preference Optimization for Reasoning with Pseudo Feedback Paper • 2411.16345 • Published Nov 25, 2024 • 1
PFPO Collection Resources for the paper Preference Optimization for Reasoning with Pseudo Feedback (ICLR 2025) • 4 items • Updated 8 days ago • 1