Gaotang Li

gaotang

https://gaotangli.github.io/

GaotangLi

AI & ML interests

None yet

Recent Activity

liked a dataset 6 days ago

xinlai/Math-Step-DPO-10K

upvoted a paper 6 days ago

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

updated a model 6 days ago

gaotang/RM-R1-Qwen2.5-Instruct-14B

Organizations

None yet

Collections 2

Papers 2

arxiv:2505.02387

arxiv:2503.10996

models 10

datasets 28

gaotang/RM-R1-Entire-RLVR-Train

Viewer • Updated 6 days ago • 73k • 115 • 1

gaotang/RM-R1-after-Distill-RLVR

Viewer • Updated 6 days ago • 64.2k • 89 • 1

gaotang/RM-R1-Distill-SFT

Viewer • Updated 6 days ago • 8.75k • 96 • 1

gaotang/RM-R1-Reasoning-RLVR

Viewer • Updated 8 days ago • 73k • 73

gaotang/ParaConfilct

Viewer • Updated 12 days ago • 3 • 21

gaotang/filtered_sky_code_8k_math_10k_rubric_evidence_classify_weight_rest_0417

Viewer • Updated 27 days ago • 64.2k • 74

gaotang/filtered_sky_code_8k_math_10k_rubric_evidence_classify_weight

Viewer • Updated 27 days ago • 73k • 86

gaotang/filtered_sky_code_8k_math_10k_rubric_reasoning

Viewer • Updated about 1 month ago • 73k • 100

gaotang/filtered_sky_code_8k_math_10k_rubric_sft

Viewer • Updated Apr 11 • 73k • 26

gaotang/filtered_sky_code_8k_math_10k_rubric_evidence_classify

Viewer • Updated Apr 11 • 73k • 62

Collections 2

gaotang/ParaConfilct

Taming Knowledge Conflicts in Language Models

RM-R1: Reward Modeling as Reasoning

gaotang/RM-R1-Distill-SFT

gaotang/RM-R1-after-Distill-RLVR

models 10

gaotang/RM-R1-Qwen2.5-Instruct-14B

gaotang/RM-R1-DeepSeek-Distilled-Qwen-14B

gaotang/RM-R1-DeepSeek-Distilled-Qwen-32B

gaotang/RM-R1-Qwen2.5-Instruct-7B

gaotang/RM-R1-Qwen2.5-Instruct-32B

gaotang/RM-R1-DeepSeek-Distilled-Qwen-7B

gaotang/qwen_7b_sky_filtered_code8k_math_10k_distilled_Claude_o3_0419

gaotang/qwen_7b_sky_filtered_code8k_math_10k_distilled_OpenAI

gaotang/qwen_14b_sky_filtered_code8k_math_10k_distilled_OpenAI

gaotang/qwen2.5_14B_LR1.0e-6_evidence_rubric_4k2k_separate_reward_function
