Upload folder using huggingface_hub
README.md CHANGED
@@ -64,7 +64,7 @@ We evaluate our model on two challenging reward benchmarks, [RM-Bench](https://g
 |**LLM-as-a-Judge**||||||
 |GPT-4o |- |50.6 | 54.1 | 75.0 | 59.5 | 59.8 |
 |Claude-3.5-Sonnet|- |62.3 | 66.3 | 66.1 | 64.3 | 64.8|
-|DeepSeek-R1-0528 |671B|59.1 | 82.7 | 80.4 | 92.9 | 78.8|
+|DeepSeek-R1-0528 |671B|59.1 | 82.7 | 80.4 | **92.9** | 78.8|
 |**Open-Source Reward Models**||||||
 |Llama-3.1-Nemotron-70B-Reward | 70B | 62.3 | 72.5 | 76.8 | 57.1 | 67.2|
 |Skywork-Reward-Gemma-2-27B | 27B | 59.7 | 66.3 | 83.9 | 50.0 | 65.0|
@@ -73,7 +73,7 @@ We evaluate our model on two challenging reward benchmarks, [RM-Bench](https://g
 |Nemotron-Super-Multilingual | 49B | 64.9 | 74.5 | 87.5 | 73.8 | 75.2|
 |**Reasoning Reward Models**||||||
 |RM-R1-Distilled-Qwen-32B | 32B | 76.0 | 80.6 | 88.1 | 70.5 | 78.8 |
-|RM-R1-Distilled-Qwen-14B | 14B | 68.1 | 72.4 | 87.8 |
+|RM-R1-Distilled-Qwen-14B | 14B | 68.1 | 72.4 | 87.8 | 84.2 | 78.1 |
 |RRM-32B | 32B | 79.9 | 70.4 | 87.5 | 65.0 | 75.7 |
 |**Training with Unlabeled Preference Data**||||||
 |GRAM-Qwen3-14B | 14B | 63.0 | 64.3 | **89.3** | 69.1 | 71.4 |