Learning, Fast and Slow: Towards LLMs That Adapt Continually Paper • 2605.12484 • Published 4 days ago • 15
Chenlu123/shampoo_npg_tr_scale_delta20_lam1e-12_warmup_1_graftTrue_qwen2_5_math_1_5b Updated 22 days ago
Chenlu123/grpo_warmup_graftTrue_qwen2_5_math_1_5b_guru_n16_bz64_mini_bz64_global_step_80 Updated Apr 8
Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL Paper • 2603.19470 • Published Mar 19 • 3
Chenlu123/teacher_Qwen3-4B_dapo-math-17k_n8_prompt_bsz_128_mini_bsz_32_step460 2B • Updated Mar 20 • 4
Chenlu123/teacher_Qwen3-4B_dapo-math-17k_n8_prompt_bsz_128_mini_bsz_32_step440 2B • Updated Mar 20 • 1