nielsr (HF Staff) committed
Commit c0932f1 · verified · 1 Parent(s): 0fae468

Update pipeline tag and add library name


This PR updates the `pipeline_tag` from `text-classification` to `text-ranking`, which is more accurate for a reward model and ensures proper discoverability on the Hugging Face Hub (https://huggingface.co/models?pipeline_tag=text-ranking). It also adds the `library_name: transformers` metadata, indicating compatibility with the Hugging Face Transformers library and enabling the "how to use" button on the model page.
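As context, the same kind of metadata change can also be made programmatically. Below is a minimal sketch using `huggingface_hub.metadata_update`; the repo id is a placeholder, and this is an illustrative alternative to editing the README front matter by hand, not how this particular PR was generated:

```python
from huggingface_hub import metadata_update

# Illustrative only: update the model card's YAML front matter on the Hub.
# "username/my-reward-model" is a placeholder repo id.
metadata_update(
    "username/my-reward-model",
    {"pipeline_tag": "text-ranking", "library_name": "transformers"},
    overwrite=True,   # required when changing an existing key such as pipeline_tag
    create_pr=True,   # open a pull request rather than pushing to main directly
)
```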

Files changed (1)
  1. README.md +54 -44
README.md CHANGED
@@ -1,9 +1,11 @@
 ---
- license: llama3.1
 base_model:
 - meta-llama/Llama-3.1-8B-Instruct
- pipeline_tag: text-classification
 ---

 # Skywork-Reward-V2

 <div align="center">
@@ -35,16 +37,16 @@ pipeline_tag: text-classification

 <div align="center">

- | Model | Base Model | Link |
- |:-----------------------------------|:---------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------:|
- | Skywork-Reward-V2-Llama-3.1-8B | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.1-8B) |
 | Skywork-Reward-V2-Llama-3.1-8B-40M | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.1-8B-40M) |
- | Skywork-Reward-V2-Llama-3.2-1B | [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.2-1B) |
- | Skywork-Reward-V2-Llama-3.2-3B | [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.2-3B) |
- | Skywork-Reward-V2-Qwen3-0.6B | [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Qwen3-0.6B) |
- | Skywork-Reward-V2-Qwen3-1.7B | [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Qwen3-1.7B) |
- | Skywork-Reward-V2-Qwen3-4B | [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Qwen3-4B) |
- | Skywork-Reward-V2-Qwen3-8B | [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Qwen3-8B) |

 </div>
@@ -54,35 +56,35 @@ For the complete collection of models, please refer to the [Skywork-Reward-V2](h

 In the following table, we categorize the models into two types: Bradley-Terry (BT) reward models and Generative reward models. The Skywork-Reward-V2 series outperforms models in both categories with much smaller model sizes.

- | Category | Model | RewardBench v1 | RewardBench v2 | PPE Preference | PPE Correctness | RMB | RM-Bench | JudgeBench | Avg. |
- |:-----------------:|:---------------------------------------|:--------------:|:--------------:|:--------------:|:---------------:|:--------:|:--------:|:----------:|:--------:|
- | **Bradley-Terry** | Llama-3-OffsetBias-RM-8B | 89.0 | 64.8 | 59.2 | 64.1 | 57.8 | 71.3 | 63.5 | 67.1 |
- | | ArmoRM-Llama3-8B-v0.1 | 90.4 | 66.5 | 60.6 | 60.6 | 64.6 | 69.3 | 59.7 | 67.4 |
- | | Internlm2-20b-reward | 90.2 | 56.3 | 61.0 | 63.0 | 62.9 | 68.3 | 64.3 | 66.6 |
- | | Skywork-Reward-Llama-3.1-8B-v0.2 | 93.1 | 71.8 | 62.2 | 62.5 | 66.6 | 72.1 | 62.9 | 70.2 |
- | | LDL-Reward-Gemma-2-27B-v0.1 | 95.0 | 72.5 | 62.4 | 63.9 | 67.9 | 71.1 | 64.2 | 71.0 |
- | | Skywork-Reward-Gemma-2-27B-v0.2 | 94.3 | 75.3 | 63.6 | 61.9 | 69.4 | 70.0 | 66.5 | 71.6 |
- | | INF-ORM-Llama3.1-70B | 95.1 | 76.5 | 64.2 | 64.4 | 70.5 | 73.8 | 70.2 | 73.5 |
- | **Generative** | GPT-4o | 86.7 | 64.9 | 67.7 | - | 73.8 | - | 59.8 | - |
- | | Claude-3.5-Sonnet | 84.2 | 64.7 | 67.3 | - | 70.6 | - | 64.8 | - |
- | | DeepSeek-GRM-27B | 88.5 | - | 65.3 | 60.4 | 69.0 | - | - | - |
- | | DeepSeek-GRM-27B (w/ MetaRM) | 90.4 | - | 67.2 | 63.2 | 70.3 | - | - | - |
- | | RM-R1-Qwen-Instruct-32B | 92.9 | - | - | - | 73.0 | 79.1 | - | - |
- | | RM-R1-DeepSeek-Distill-Qwen-32B | 90.9 | - | - | - | 69.8 | 83.9 | - | - |
- | | EvalPlanner (Llama-3.1-70B) | 93.9 | - | - | - | - | 80.0 | 50.9 | - |
- | | EvalPlanner (Llama-3.3-70B) | 93.8 | - | - | - | - | 82.1 | 56.6 | - |
- | | J1-Llama-8B | 85.7 | - | 60.3 | 59.2 | - | 73.4 | 42.0 | - |
- | | J1-Llama-8B (Maj@32) | - | - | 60.6 | 61.9 | - | - | - | - |
- | | J1-Llama-70B | 93.3 | - | 66.3 | 72.9 | - | 82.7 | 60.0 | - |
- | | J1-Llama-70B (Maj@32) | - | - | 67.0 | 73.7 | - | - | - | - |
- | **Bradley-Terry** | **Skywork-Reward-V2-Qwen3-0.6B** | 85.2 | 61.3 | 65.3 | 68.3 | 74.5 | 74.4 | 67.6 | 70.9 |
- | | **Skywork-Reward-V2-Qwen3-1.7B** | 90.3 | 68.3 | 67.6 | 70.5 | 78.1 | 78.7 | 72.9 | 75.2 |
- | | **Skywork-Reward-V2-Qwen3-4B** | 93.4 | 75.5 | 69.5 | 74.7 | 80.6 | 81.6 | 69.3 | 77.8 |
- | | **Skywork-Reward-V2-Qwen3-8B** | 93.7 | 78.2 | 70.6 | 75.1 | 81.2 | 82.6 | 73.4 | 79.3 |
- | | **Skywork-Reward-V2-Llama-3.2-1B** | 89.9 | 64.3 | 66.6 | 67.4 | 76.7 | 76.4 | 65.0 | 72.3 |
- | | **Skywork-Reward-V2-Llama-3.2-3B** | 93.0 | 74.7 | 69.1 | 72.1 | 80.5 | 81.1 | 69.2 | 77.1 |
- | | **Skywork-Reward-V2-Llama-3.1-8B** | 96.4 | 84.1 | 77.3 | 83.4 | 86.4 | 92.8 | 80.0 | 85.8 |
- | | **Skywork-Reward-V2-Llama-3.1-8B-40M** | **97.8** | **86.5** | **79.8** | **87.2** | **89.3** | **96.0** | **83.4** | **88.6** |

 ## 💡 Recommended Usage
@@ -121,8 +123,12 @@ rm = AutoModelForSequenceClassification.from_pretrained(
 tokenizer = AutoTokenizer.from_pretrained(model_name)

 prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
- response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."
- response2 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."

 conv1 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response1}]
 conv2 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response2}]
@@ -215,8 +221,12 @@ def process_convs(convs, base_url, tokenizer, model_name_or_path):

 prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
- response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."
- response2 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."

 conv1 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response1}]
 conv2 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response2}]
 
 ---
 base_model:
 - meta-llama/Llama-3.1-8B-Instruct
+ license: llama3.1
+ pipeline_tag: text-ranking
+ library_name: transformers
 ---
+
 # Skywork-Reward-V2

 <div align="center">
 
 <div align="center">

+ | Model | Base Model | Link |
+ |:---|:---|:---:|
+ | Skywork-Reward-V2-Llama-3.1-8B | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.1-8B) |
 | Skywork-Reward-V2-Llama-3.1-8B-40M | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.1-8B-40M) |
+ | Skywork-Reward-V2-Llama-3.2-1B | [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.2-1B) |
+ | Skywork-Reward-V2-Llama-3.2-3B | [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.2-3B) |
+ | Skywork-Reward-V2-Qwen3-0.6B | [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Qwen3-0.6B) |
+ | Skywork-Reward-V2-Qwen3-1.7B | [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Qwen3-1.7B) |
+ | Skywork-Reward-V2-Qwen3-4B | [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Qwen3-4B) |
+ | Skywork-Reward-V2-Qwen3-8B | [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Qwen3-8B) |

 </div>
 
 In the following table, we categorize the models into two types: Bradley-Terry (BT) reward models and Generative reward models. The Skywork-Reward-V2 series outperforms models in both categories with much smaller model sizes.

+ | Category | Model | RewardBench v1 | RewardBench v2 | PPE Preference | PPE Correctness | RMB | RM-Bench | JudgeBench | Avg. |
+ |:---|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+ | **Bradley-Terry** | Llama-3-OffsetBias-RM-8B | 89.0 | 64.8 | 59.2 | 64.1 | 57.8 | 71.3 | 63.5 | 67.1 |
+ | | ArmoRM-Llama3-8B-v0.1 | 90.4 | 66.5 | 60.6 | 60.6 | 64.6 | 69.3 | 59.7 | 67.4 |
+ | | Internlm2-20b-reward | 90.2 | 56.3 | 61.0 | 63.0 | 62.9 | 68.3 | 64.3 | 66.6 |
+ | | Skywork-Reward-Llama-3.1-8B-v0.2 | 93.1 | 71.8 | 62.2 | 62.5 | 66.6 | 72.1 | 62.9 | 70.2 |
+ | | LDL-Reward-Gemma-2-27B-v0.1 | 95.0 | 72.5 | 62.4 | 63.9 | 67.9 | 71.1 | 64.2 | 71.0 |
+ | | Skywork-Reward-Gemma-2-27B-v0.2 | 94.3 | 75.3 | 63.6 | 61.9 | 69.4 | 70.0 | 66.5 | 71.6 |
+ | | INF-ORM-Llama3.1-70B | 95.1 | 76.5 | 64.2 | 64.4 | 70.5 | 73.8 | 70.2 | 73.5 |
+ | **Generative** | GPT-4o | 86.7 | 64.9 | 67.7 | - | 73.8 | - | 59.8 | - |
+ | | Claude-3.5-Sonnet | 84.2 | 64.7 | 67.3 | - | 70.6 | - | 64.8 | - |
+ | | DeepSeek-GRM-27B | 88.5 | - | 65.3 | 60.4 | 69.0 | - | - | - |
+ | | DeepSeek-GRM-27B (w/ MetaRM) | 90.4 | - | 67.2 | 63.2 | 70.3 | - | - | - |
+ | | RM-R1-Qwen-Instruct-32B | 92.9 | - | - | - | 73.0 | 79.1 | - | - |
+ | | RM-R1-DeepSeek-Distill-Qwen-32B | 90.9 | - | - | - | 69.8 | 83.9 | - | - |
+ | | EvalPlanner (Llama-3.1-70B) | 93.9 | - | - | - | - | 80.0 | 50.9 | - |
+ | | EvalPlanner (Llama-3.3-70B) | 93.8 | - | - | - | - | 82.1 | 56.6 | - |
+ | | J1-Llama-8B | 85.7 | - | 60.3 | 59.2 | - | 73.4 | 42.0 | - |
+ | | J1-Llama-8B (Maj@32) | - | - | 60.6 | 61.9 | - | - | - | - |
+ | | J1-Llama-70B | 93.3 | - | 66.3 | 72.9 | - | 82.7 | 60.0 | - |
+ | | J1-Llama-70B (Maj@32) | - | - | 67.0 | 73.7 | - | - | - | - |
+ | **Bradley-Terry** | **Skywork-Reward-V2-Qwen3-0.6B** | 85.2 | 61.3 | 65.3 | 68.3 | 74.5 | 74.4 | 67.6 | 70.9 |
+ | | **Skywork-Reward-V2-Qwen3-1.7B** | 90.3 | 68.3 | 67.6 | 70.5 | 78.1 | 78.7 | 72.9 | 75.2 |
+ | | **Skywork-Reward-V2-Qwen3-4B** | 93.4 | 75.5 | 69.5 | 74.7 | 80.6 | 81.6 | 69.3 | 77.8 |
+ | | **Skywork-Reward-V2-Qwen3-8B** | 93.7 | 78.2 | 70.6 | 75.1 | 81.2 | 82.6 | 73.4 | 79.3 |
+ | | **Skywork-Reward-V2-Llama-3.2-1B** | 89.9 | 64.3 | 66.6 | 67.4 | 76.7 | 76.4 | 65.0 | 72.3 |
+ | | **Skywork-Reward-V2-Llama-3.2-3B** | 93.0 | 74.7 | 69.1 | 72.1 | 80.5 | 81.1 | 69.2 | 77.1 |
+ | | **Skywork-Reward-V2-Llama-3.1-8B** | 96.4 | 84.1 | 77.3 | 83.4 | 86.4 | 92.8 | 80.0 | 85.8 |
+ | | **Skywork-Reward-V2-Llama-3.1-8B-40M** | **97.8** | **86.5** | **79.8** | **87.2** | **89.3** | **96.0** | **83.4** | **88.6** |

 ## 💡 Recommended Usage
 
 tokenizer = AutoTokenizer.from_pretrained(model_name)

 prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
+ response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."
+ response2 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."

 conv1 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response1}]
 conv2 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response2}]
 
 prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
+ response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."
+ response2 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."

 conv1 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response1}]
 conv2 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response2}]
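The hunks above stop at the two conversation objects. For context, here is a minimal sketch of how a Bradley-Terry reward model of this kind is typically scored with Transformers; it reuses the `rm`, `tokenizer`, `conv1`, and `conv2` names from the snippet above, and the README's full usage example continues beyond the lines shown in this diff:

```python
import torch

# Minimal sketch (not part of this diff): obtain a scalar reward for each conversation.
conv1_ids = tokenizer.apply_chat_template(conv1, tokenize=True, return_tensors="pt").to(rm.device)
conv2_ids = tokenizer.apply_chat_template(conv2, tokenize=True, return_tensors="pt").to(rm.device)

with torch.no_grad():
    # The sequence-classification head has a single output; its logit is the reward score.
    score1 = rm(conv1_ids).logits[0][0].item()
    score2 = rm(conv2_ids).logits[0][0].item()

# The correct response (response1) should receive the higher score.
print(f"Score for response 1: {score1:.2f}")
print(f"Score for response 2: {score2:.2f}")
```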