RLAIF/dpo_answer_openorca_base_nathan_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• 65.3k • 3
RLAIF/dpo_answer_openorca_angel_base_nathan_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• 45.9k • 3
RLAIF/dpo_answer_openorca_angel_nathan_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• 65.3k • 3
RLAIF/dpo_answer_openorca_angel_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• 42.4k • 5
RLAIF/dpo_answer_openorca_openorca_argilla_rejudged_filtered_1e-6_0.02_1.7B_4B_with_gold_labels_kl_est
• 44.1k • 3
RLAIF/dpo_answer_openorca_skywork_rejudged_filtered_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• 38.8k • 3
RLAIF/dpo_answer_openorca_baseline_mix_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• 65.3k • 3
RLAIF/dpo_answer_openorca_openorca_argilla_improved_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• 60k • 3
RLAIF/dpo_answer_openorca_skywork_improved_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• 51.2k • 3
RLAIF/dpo_answer_openorca_helpsteer3_improved_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• 40.6k • 2
RLAIF/dpo_answer_openorca_ppe_improved_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• 37.1k • 5
RLAIF/dpo_thinking_openorca_offtheshelf_improved_1e-6_0.02_1.7B_0.6B_with_gold_labels_kl_estimation
• 141k • 3
RLAIF/dpo_answer_openorca_ultrafeedback_s3_lr1e6_improved_1e-6_0.02_1.7B_4B_with_gold_labels_kl_est
• 65.3k • 2
RLAIF/dpo_answer_openorca_ultrafeedback_s100_lr1e6_improved_1e-6_0.02_1.7B_4B_with_gold_labels_kl_est
• 65.3k • 3
RLAIF/dpo_answer_openorca_ultrafeedback_s336_lr1e5_improved_1e-6_0.02_1.7B_4B_with_gold_labels_kl_est
• 58.2k • 3
RLAIF/dpo_thinking_openorca_offtheshelf_improved_1e-6_0.02_1.7B_8B_with_gold_labels_kl_estimation
• 152k • 3
RLAIF/dpo_answer_openorca_offtheshelf_improved_1e-6_0.02_1.7B_14B_with_gold_labels_kl_estimation
• 152k • 3
RLAIF/dpo_thinking_openorca_offtheshelf_improved_1e-6_0.02_1.7B_1.7B_with_gold_labels_kl_estimation
• 152k • 3
RLAIF/dpo_thinking_openorca_offtheshelf_improved_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• 152k • 3
RLAIF/dpo_answer_openorca_offtheshelf_improved_1e-6_0.02_1.7B_8B_with_gold_labels_kl_estimation
• 152k • 3
RLAIF/dpo_answer_openorca_offtheshelf_improved_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation
• 152k • 3
RLAIF/dpo_answer_openorca_offtheshelf_improved_1e-6_0.02_1.7B_1.7B_with_gold_labels_kl_estimation
• 152k • 3
RLAIF/dpo_answer_openorca_offtheshelf_improved_1e-6_0.02_1.7B_0.6B_with_gold_labels_kl_estimation
• 152k • 2
RLAIF/dpo_uf_rejudged_mixed_openorca_with_gold_labels_kl_estimation
• 152k • 3
RLAIF/dpo_answer_2e-6_openorca_prompts_responses_1e-6_0.02_0.6B_0.6B_with_gold_labels_kl_estimation
• 86.5k • 5
RLAIF/dpo_uf_rejudged_mixed_openorca_kl_estimation
• 65.6k • 2
RLAIF/dpo_uf_rejudged_mixed_openorca_kl_est
• 65.6k • 3
RLAIF/dpo_answer_offtheshelf_openorca_1e-6_0.02_0.6B_0.6B_with_gold_labels_kl_estimation
• 49.4k • 3
RLAIF/dpo_answer_ultrafeedback_filtered_openorca_1e-6_0.02_0.6B_0.6B_with_gold_labels_kl_estimation
• 49.4k • 2
RLAIF/dpo_answer_ultrafeedback_openorca_1e-6_0.02_0.6B_0.6B_with_gold_labels_kl_estimation
• 49.4k • 2