Upload folder using huggingface_hub
README.md CHANGED
@@ -64,7 +64,7 @@ We evaluate our model on two challenging reward benchmarks, [RM-Bench](https://g
 |**LLM-as-a-Judge**||||||
 |GPT-4o |- |50.6 | 54.1 | 75.0 | 59.5 | 59.8 |
 |Claude-3.5-Sonnet|- |62.3 | 66.3 | 66.1 | 64.3 | 64.8|
-|DeepSeek-R1-0528 |671B|59.1 | 82.7 | 80.4 | 92.9 | 78.8|
+|DeepSeek-R1-0528 |671B|59.1 | 82.7 | 80.4 | **92.9** | 78.8|
 |**Open-Source Reward Models**||||||
 |Llama-3.1-Nemotron-70B-Reward | 70B | 62.3 | 72.5 | 76.8 | 57.1 | 67.2|
 |Skywork-Reward-Gemma-2-27B | 27B | 59.7 | 66.3 | 83.9 | 50.0 | 65.0|
@@ -73,7 +73,7 @@ We evaluate our model on two challenging reward benchmarks, [RM-Bench](https://g
 |Nemotron-Super-Multilingual | 49B | 64.9 | 74.5 | 87.5 | 73.8 | 75.2|
 |**Reasoning Reward Models**||||||
 |RM-R1-Distilled-Qwen-32B | 32B | 76.0 | 80.6 | 88.1 | 70.5 | 78.8 |
-|RM-R1-Distilled-Qwen-14B | 14B | 68.1 | 72.4 | 87.8 |
+|RM-R1-Distilled-Qwen-14B | 14B | 68.1 | 72.4 | 87.8 | 84.2 | 78.1 |
 |RRM-32B | 32B | 79.9 | 70.4 | 87.5 | 65.0 | 75.7 |
 |**Training with Unlabeled Preference Data**||||||
 |GRAM-Qwen3-14B | 14B | 63.0 | 64.3 | **89.3** | 69.1 | 71.4 |