wangclnlp committed on
Commit a9a7dc9 · verified · 1 Parent(s): 5be04c0

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -64,7 +64,7 @@ We evaluate our model on two challenging reward benchmarks, [RM-Bench](https://g
  |**LLM-as-a-Judge**||||||
  |GPT-4o |- |50.6 | 54.1 | 75.0 | 59.5 | 59.8 |
  |Claude-3.5-Sonnet|- |62.3 | 66.3 | 66.1 | 64.3 | 64.8|
- |DeepSeek-R1-0528 |671B|59.1 | 82.7 | 80.4 | 92.9 | 78.8|
+ |DeepSeek-R1-0528 |671B|59.1 | 82.7 | 80.4 | **92.9** | 78.8|
  |**Open-Source Reward Models**||||||
  |Llama-3.1-Nemotron-70B-Reward | 70B | 62.3 | 72.5 | 76.8 | 57.1 | 67.2|
  |Skywork-Reward-Gemma-2-27B | 27B | 59.7 | 66.3 | 83.9 | 50.0 | 65.0|
@@ -73,7 +73,7 @@ We evaluate our model on two challenging reward benchmarks, [RM-Bench](https://g
  |Nemotron-Super-Multilingual | 49B | 64.9 | 74.5 | 87.5 | 73.8 | 75.2|
  |**Reasoning Reward Models**||||||
  |RM-R1-Distilled-Qwen-32B | 32B | 76.0 | 80.6 | 88.1 | 70.5 | 78.8 |
- |RM-R1-Distilled-Qwen-14B | 14B | 68.1 | 72.4 | 87.8 | **84.2** | 78.1 |
+ |RM-R1-Distilled-Qwen-14B | 14B | 68.1 | 72.4 | 87.8 | 84.2 | 78.1 |
  |RRM-32B | 32B | 79.9 | 70.4 | 87.5 | 65.0 | 75.7 |
  |**Training with Unlabeled Preference Data**||||||
  |GRAM-Qwen3-14B | 14B | 63.0 | 64.3 | **89.3** | 69.1 | 71.4 |