Models in Adaptive Length Penalty Paper
models (15)
RLAIF/dpo_thinking_base_openorca_0.02_1.7B-4B
RLAIF/grpo_thinking_ultrafeedback-original_32_64_4_3e-3_2e-7_step-120_1.7B • 2B • 8
RLAIF/grpo_step270_1.7B • 2B • 8
RLAIF/grpo_step30_1.7B • 2B • 8
RLAIF/grpo_5e-7_4_1.7B-best • 2B • 8
RLAIF/Qwen3-1.7B_grpo_lr2e-7_n4_step30 • 2B • 11
RLAIF/reward-model-grpo • 0.8B • 9
RLAIF/llama-3b-open-r1-50k-sft • 4B • 7
RLAIF/sft-external • Text Generation • 8B
RLAIF/sft-llama-3.1-8b-external • Text Generation • 8B
datasets (80)
RLAIF/dpo_uf_rejudged_mixed_openorca_with_gold_labels_kl_estimation • 65.6k • 11
RLAIF/dpo_uf_rejudged_mixed_openorca_kl_estimation • 65.6k • 10
RLAIF/dpo_uf_rejudged_mixed_openorca_kl_est • 65.6k • 10
RLAIF/dpo_answer_offtheshelf_openorca_1e-6_0.02_0.6B_0.6B_with_gold_labels_kl_estimation • 49.4k • 15
RLAIF/dpo_answer_ultrafeedback_filtered_openorca_1e-6_0.02_0.6B_0.6B_with_gold_labels_kl_estimation • 49.4k • 19
RLAIF/dpo_answer_ultrafeedback_openorca_1e-6_0.02_0.6B_0.6B_with_gold_labels_kl_estimation • 49.4k • 18
RLAIF/dpo_thinking_base_openorca_0.02_1.7B-4B_with_gold_labels_kl_estimation • 152k • 18
RLAIF/dpo_thinking_ultrafeedback_rejudged_openorca_0.02_with_gold_labels_kl_estimation • 152k • 26
RLAIF/dpo_answer_ultrafeedback_rejudged_openorca_0.02_with_gold_labels_kl_estimation • 152k • 32
RLAIF/dpo_answer_base_openorca_0.02_with_gold_labels_kl_estimation • 150k • 35