Update pipeline tag and add library name
This PR updates the `pipeline_tag` from `text-classification` to `text-ranking`, which is more accurate for a reward model and ensures proper discoverability on the Hugging Face Hub (https://huggingface.co/models?pipeline_tag=text-ranking). It also adds the `library_name: transformers` metadata, indicating compatibility with the Hugging Face Transformers library and enabling the "how to use" button on the model page.
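Setting `library_name: transformers` is what lets the Hub surface a Transformers snippet behind that button. As a rough sketch of the resulting usage (the dtype and device settings below are illustrative assumptions, not Hub-generated output):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the reward model as a sequence classifier with a single scalar head.
model_name = "Skywork/Skywork-Reward-V2-Llama-3.1-8B"
rm = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # illustrative; pick what your hardware supports
    device_map="auto",
    num_labels=1,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```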
README.md (CHANGED)
@@ -1,9 +1,11 @@
 ---
-license: llama3.1
 base_model:
 - meta-llama/Llama-3.1-8B-Instruct
-pipeline_tag: text-classification
+license: llama3.1
+pipeline_tag: text-ranking
+library_name: transformers
 ---
+
 # Skywork-Reward-V2

 <div align="center">
@@ -35,16 +37,16 @@ pipeline_tag: text-classification

 <div align="center">

+| Model | Base Model | Link |
+|:---|:---|:---:|
+| Skywork-Reward-V2-Llama-3.1-8B | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.1-8B) |
 | Skywork-Reward-V2-Llama-3.1-8B-40M | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.1-8B-40M) |
+| Skywork-Reward-V2-Llama-3.2-1B | [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.2-1B) |
+| Skywork-Reward-V2-Llama-3.2-3B | [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.2-3B) |
+| Skywork-Reward-V2-Qwen3-0.6B | [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Qwen3-0.6B) |
+| Skywork-Reward-V2-Qwen3-1.7B | [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Qwen3-1.7B) |
+| Skywork-Reward-V2-Qwen3-4B | [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Qwen3-4B) |
+| Skywork-Reward-V2-Qwen3-8B | [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Qwen3-8B) |

 </div>

@@ -54,35 +56,35 @@ For the complete collection of models, please refer to the [Skywork-Reward-V2](h

 In the following table, we categorize the models into two types: Bradley-Terry (BT) reward models and Generative reward models. The Skywork-Reward-V2 series outperforms models in both categories with much smaller model sizes.

+| Category | Model | RewardBench v1 | RewardBench v2 | PPE Preference | PPE Correctness | RMB | RM-Bench | JudgeBench | Avg. |
+|:---|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+| **Bradley-Terry** | Llama-3-OffsetBias-RM-8B | 89.0 | 64.8 | 59.2 | 64.1 | 57.8 | 71.3 | 63.5 | 67.1 |
+| | ArmoRM-Llama3-8B-v0.1 | 90.4 | 66.5 | 60.6 | 60.6 | 64.6 | 69.3 | 59.7 | 67.4 |
+| | Internlm2-20b-reward | 90.2 | 56.3 | 61.0 | 63.0 | 62.9 | 68.3 | 64.3 | 66.6 |
+| | Skywork-Reward-Llama-3.1-8B-v0.2 | 93.1 | 71.8 | 62.2 | 62.5 | 66.6 | 72.1 | 62.9 | 70.2 |
+| | LDL-Reward-Gemma-2-27B-v0.1 | 95.0 | 72.5 | 62.4 | 63.9 | 67.9 | 71.1 | 64.2 | 71.0 |
+| | Skywork-Reward-Gemma-2-27B-v0.2 | 94.3 | 75.3 | 63.6 | 61.9 | 69.4 | 70.0 | 66.5 | 71.6 |
+| | INF-ORM-Llama3.1-70B | 95.1 | 76.5 | 64.2 | 64.4 | 70.5 | 73.8 | 70.2 | 73.5 |
+| **Generative** | GPT-4o | 86.7 | 64.9 | 67.7 | - | 73.8 | - | 59.8 | - |
+| | Claude-3.5-Sonnet | 84.2 | 64.7 | 67.3 | - | 70.6 | - | 64.8 | - |
+| | DeepSeek-GRM-27B | 88.5 | - | 65.3 | 60.4 | 69.0 | - | - | - |
+| | DeepSeek-GRM-27B (w/ MetaRM) | 90.4 | - | 67.2 | 63.2 | 70.3 | - | - | - |
+| | RM-R1-Qwen-Instruct-32B | 92.9 | - | - | - | 73.0 | 79.1 | - | - |
+| | RM-R1-DeepSeek-Distill-Qwen-32B | 90.9 | - | - | - | 69.8 | 83.9 | - | - |
+| | EvalPlanner (Llama-3.1-70B) | 93.9 | - | - | - | - | 80.0 | 50.9 | - |
+| | EvalPlanner (Llama-3.3-70B) | 93.8 | - | - | - | - | 82.1 | 56.6 | - |
+| | J1-Llama-8B | 85.7 | - | 60.3 | 59.2 | - | 73.4 | 42.0 | - |
+| | J1-Llama-8B (Maj@32) | - | - | 60.6 | 61.9 | - | - | - | - |
+| | J1-Llama-70B | 93.3 | - | 66.3 | 72.9 | - | 82.7 | 60.0 | - |
+| | J1-Llama-70B (Maj@32) | - | - | 67.0 | 73.7 | - | - | - | - |
+| **Bradley-Terry** | **Skywork-Reward-V2-Qwen3-0.6B** | 85.2 | 61.3 | 65.3 | 68.3 | 74.5 | 74.4 | 67.6 | 70.9 |
+| | **Skywork-Reward-V2-Qwen3-1.7B** | 90.3 | 68.3 | 67.6 | 70.5 | 78.1 | 78.7 | 72.9 | 75.2 |
+| | **Skywork-Reward-V2-Qwen3-4B** | 93.4 | 75.5 | 69.5 | 74.7 | 80.6 | 81.6 | 69.3 | 77.8 |
+| | **Skywork-Reward-V2-Qwen3-8B** | 93.7 | 78.2 | 70.6 | 75.1 | 81.2 | 82.6 | 73.4 | 79.3 |
+| | **Skywork-Reward-V2-Llama-3.2-1B** | 89.9 | 64.3 | 66.6 | 67.4 | 76.7 | 76.4 | 65.0 | 72.3 |
+| | **Skywork-Reward-V2-Llama-3.2-3B** | 93.0 | 74.7 | 69.1 | 72.1 | 80.5 | 81.1 | 69.2 | 77.1 |
+| | **Skywork-Reward-V2-Llama-3.1-8B** | 96.4 | 84.1 | 77.3 | 83.4 | 86.4 | 92.8 | 80.0 | 85.8 |
+| | **Skywork-Reward-V2-Llama-3.1-8B-40M** | **97.8** | **86.5** | **79.8** | **87.2** | **89.3** | **96.0** | **83.4** | **88.6** |

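For readers unfamiliar with the two categories: a Bradley-Terry (BT) reward model emits a scalar reward per response and models the probability that one response beats another as a sigmoid of the reward difference, whereas a generative reward model writes its judgment as text. A toy sketch of the BT preference probability (the function name is ours, for illustration only):

```python
import math

def bt_preference_probability(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry: P(chosen beats rejected) = sigmoid(r_chosen - r_rejected)."""
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))

# A 2-point reward gap translates to roughly an 88% preference probability.
print(bt_preference_probability(2.5, 0.5))  # ≈ 0.881
```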
 ## 💡 Recommended Usage

@@ -121,8 +123,12 @@ rm = AutoModelForSequenceClassification.from_pretrained(
 tokenizer = AutoTokenizer.from_pretrained(model_name)

 prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
+response1 = """1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.
+2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.
+3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."""
+response2 = """1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.
+2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.
+3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."""

 conv1 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response1}]
 conv2 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response2}]
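The hunk ends at the two conversation objects. A minimal sketch of how such a sequence-classification reward model is then scored with the `rm` and `tokenizer` defined earlier in the snippet (this continuation lies outside the hunk, so treat it as an assumption rather than the card's verbatim code):

```python
import torch

# Apply the chat template, run the classifier head, and read the scalar logit.
# Assumes `rm`, `tokenizer`, `conv1`, and `conv2` from the snippet above.
with torch.no_grad():
    for conv in (conv1, conv2):
        input_ids = tokenizer.apply_chat_template(conv, tokenize=True, return_tensors="pt").to(rm.device)
        score = rm(input_ids).logits[0][0].item()
        print(score)
# The fully correct response1 should receive a higher reward than response2.
```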
@@ -215,8 +221,12 @@ def process_convs(convs, base_url, tokenizer, model_name_or_path):


 prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
+response1 = """1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.
+2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.
+3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."""
+response2 = """1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.
+2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.
+3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."""

 conv1 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response1}]
 conv2 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response2}]
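In this server-based variant, `process_convs` (whose signature appears in the hunk header) presumably posts the conversations to an inference endpoint at `base_url`. Its body and return value sit outside the hunk, so the call below is only an assumed usage of that signature, with a placeholder endpoint URL:

```python
# Hypothetical invocation; the return value is assumed to be one reward score
# per conversation, and the URL is a placeholder for your own server.
scores = process_convs(
    convs=[conv1, conv2],
    base_url="http://localhost:8000",
    tokenizer=tokenizer,
    model_name_or_path="Skywork/Skywork-Reward-V2-Llama-3.1-8B",
)
print(scores)  # expect the first (correct) response to score higher
```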