nielsr (HF Staff) committed
Commit c0932f1 · verified · 1 Parent(s): 0fae468

Update pipeline tag and add library name


This PR updates the `pipeline_tag` from `text-classification` to `text-ranking`, which is more accurate for a reward model and ensures proper discoverability on the Hugging Face Hub (https://huggingface.co/models?pipeline_tag=text-ranking). It also adds the `library_name: transformers` metadata, indicating compatibility with the Hugging Face Transformers library and enabling the "how to use" button on the model page.
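As context, the same kind of metadata change can also be made programmatically. Below is a minimal sketch using `huggingface_hub.metadata_update`; the repo id is a placeholder, and this is an illustrative alternative to editing the README front matter by hand, not how this particular PR was generated:

```python
from huggingface_hub import metadata_update

# Illustrative only: update the model card's YAML front matter on the Hub.
# "username/my-reward-model" is a placeholder repo id.
metadata_update(
    "username/my-reward-model",
    {"pipeline_tag": "text-ranking", "library_name": "transformers"},
    overwrite=True,   # required when changing an existing key such as pipeline_tag
    create_pr=True,   # open a pull request rather than pushing to main directly
)
```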

Files changed (1)
  1. README.md +54 -44
README.md CHANGED
@@ -1,9 +1,11 @@
 ---
- license: llama3.1
 base_model:
 - meta-llama/Llama-3.1-8B-Instruct
- pipeline_tag: text-classification
 ---

 # Skywork-Reward-V2

 <div align="center">
@@ -35,16 +37,16 @@ pipeline_tag: text-classification

 <div align="center">

- | Model | Base Model | Link |
- |:-----------------------------------|:---------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------:|
- | Skywork-Reward-V2-Llama-3.1-8B | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.1-8B) |
 | Skywork-Reward-V2-Llama-3.1-8B-40M | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.1-8B-40M) |
- | Skywork-Reward-V2-Llama-3.2-1B | [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.2-1B) |
- | Skywork-Reward-V2-Llama-3.2-3B | [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.2-3B) |
- | Skywork-Reward-V2-Qwen3-0.6B | [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Qwen3-0.6B) |
- | Skywork-Reward-V2-Qwen3-1.7B | [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Qwen3-1.7B) |
- | Skywork-Reward-V2-Qwen3-4B | [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Qwen3-4B) |
- | Skywork-Reward-V2-Qwen3-8B | [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Qwen3-8B) |

 </div>
@@ -54,35 +56,35 @@ For the complete collection of models, please refer to the [Skywork-Reward-V2](h

 In the following table, we categorize the models into two types: Bradley-Terry (BT) reward models and Generative reward models. The Skywork-Reward-V2 series outperforms models in both categories with much smaller model sizes.

- | Category | Model | RewardBench v1 | RewardBench v2 | PPE Preference | PPE Correctness | RMB | RM-Bench | JudgeBench | Avg. |
- |:-----------------:|:---------------------------------------|:--------------:|:--------------:|:--------------:|:---------------:|:--------:|:--------:|:----------:|:--------:|
- | **Bradley-Terry** | Llama-3-OffsetBias-RM-8B | 89.0 | 64.8 | 59.2 | 64.1 | 57.8 | 71.3 | 63.5 | 67.1 |
- | | ArmoRM-Llama3-8B-v0.1 | 90.4 | 66.5 | 60.6 | 60.6 | 64.6 | 69.3 | 59.7 | 67.4 |
- | | Internlm2-20b-reward | 90.2 | 56.3 | 61.0 | 63.0 | 62.9 | 68.3 | 64.3 | 66.6 |
- | | Skywork-Reward-Llama-3.1-8B-v0.2 | 93.1 | 71.8 | 62.2 | 62.5 | 66.6 | 72.1 | 62.9 | 70.2 |
- | | LDL-Reward-Gemma-2-27B-v0.1 | 95.0 | 72.5 | 62.4 | 63.9 | 67.9 | 71.1 | 64.2 | 71.0 |
- | | Skywork-Reward-Gemma-2-27B-v0.2 | 94.3 | 75.3 | 63.6 | 61.9 | 69.4 | 70.0 | 66.5 | 71.6 |
- | | INF-ORM-Llama3.1-70B | 95.1 | 76.5 | 64.2 | 64.4 | 70.5 | 73.8 | 70.2 | 73.5 |
- | **Generative** | GPT-4o | 86.7 | 64.9 | 67.7 | - | 73.8 | - | 59.8 | - |
- | | Claude-3.5-Sonnet | 84.2 | 64.7 | 67.3 | - | 70.6 | - | 64.8 | - |
- | | DeepSeek-GRM-27B | 88.5 | - | 65.3 | 60.4 | 69.0 | - | - | - |
- | | DeepSeek-GRM-27B (w/ MetaRM) | 90.4 | - | 67.2 | 63.2 | 70.3 | - | - | - |
- | | RM-R1-Qwen-Instruct-32B | 92.9 | - | - | - | 73.0 | 79.1 | - | - |
- | | RM-R1-DeepSeek-Distill-Qwen-32B | 90.9 | - | - | - | 69.8 | 83.9 | - | - |
- | | EvalPlanner (Llama-3.1-70B) | 93.9 | - | - | - | - | 80.0 | 50.9 | - |
- | | EvalPlanner (Llama-3.3-70B) | 93.8 | - | - | - | - | 82.1 | 56.6 | - |
- | | J1-Llama-8B | 85.7 | - | 60.3 | 59.2 | - | 73.4 | 42.0 | - |
- | | J1-Llama-8B (Maj@32) | - | - | 60.6 | 61.9 | - | - | - | - |
- | | J1-Llama-70B | 93.3 | - | 66.3 | 72.9 | - | 82.7 | 60.0 | - |
- | | J1-Llama-70B (Maj@32) | - | - | 67.0 | 73.7 | - | - | - | - |
- | **Bradley-Terry** | **Skywork-Reward-V2-Qwen3-0.6B** | 85.2 | 61.3 | 65.3 | 68.3 | 74.5 | 74.4 | 67.6 | 70.9 |
- | | **Skywork-Reward-V2-Qwen3-1.7B** | 90.3 | 68.3 | 67.6 | 70.5 | 78.1 | 78.7 | 72.9 | 75.2 |
- | | **Skywork-Reward-V2-Qwen3-4B** | 93.4 | 75.5 | 69.5 | 74.7 | 80.6 | 81.6 | 69.3 | 77.8 |
- | | **Skywork-Reward-V2-Qwen3-8B** | 93.7 | 78.2 | 70.6 | 75.1 | 81.2 | 82.6 | 73.4 | 79.3 |
- | | **Skywork-Reward-V2-Llama-3.2-1B** | 89.9 | 64.3 | 66.6 | 67.4 | 76.7 | 76.4 | 65.0 | 72.3 |
- | | **Skywork-Reward-V2-Llama-3.2-3B** | 93.0 | 74.7 | 69.1 | 72.1 | 80.5 | 81.1 | 69.2 | 77.1 |
- | | **Skywork-Reward-V2-Llama-3.1-8B** | 96.4 | 84.1 | 77.3 | 83.4 | 86.4 | 92.8 | 80.0 | 85.8 |
- | | **Skywork-Reward-V2-Llama-3.1-8B-40M** | **97.8** | **86.5** | **79.8** | **87.2** | **89.3** | **96.0** | **83.4** | **88.6** |

 ## 💡 Recommended Usage
@@ -121,8 +123,12 @@ rm = AutoModelForSequenceClassification.from_pretrained(
 tokenizer = AutoTokenizer.from_pretrained(model_name)

 prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
- response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."
- response2 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."

 conv1 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response1}]
 conv2 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response2}]
@@ -215,8 +221,12 @@ def process_convs(convs, base_url, tokenizer, model_name_or_path):

 prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
- response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."
- response2 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."

 conv1 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response1}]
 conv2 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response2}]
 
 ---
 base_model:
 - meta-llama/Llama-3.1-8B-Instruct
+ license: llama3.1
+ pipeline_tag: text-ranking
+ library_name: transformers
 ---
+
 # Skywork-Reward-V2

 <div align="center">
 
 <div align="center">

+ | Model | Base Model | Link |
+ |:---|:---|:---:|
+ | Skywork-Reward-V2-Llama-3.1-8B | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.1-8B) |
 | Skywork-Reward-V2-Llama-3.1-8B-40M | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.1-8B-40M) |
+ | Skywork-Reward-V2-Llama-3.2-1B | [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.2-1B) |
+ | Skywork-Reward-V2-Llama-3.2-3B | [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.2-3B) |
+ | Skywork-Reward-V2-Qwen3-0.6B | [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Qwen3-0.6B) |
+ | Skywork-Reward-V2-Qwen3-1.7B | [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Qwen3-1.7B) |
+ | Skywork-Reward-V2-Qwen3-4B | [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Qwen3-4B) |
+ | Skywork-Reward-V2-Qwen3-8B | [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Qwen3-8B) |

 </div>
 
 In the following table, we categorize the models into two types: Bradley-Terry (BT) reward models and Generative reward models. The Skywork-Reward-V2 series outperforms models in both categories with much smaller model sizes.

+ | Category | Model | RewardBench v1 | RewardBench v2 | PPE Preference | PPE Correctness | RMB | RM-Bench | JudgeBench | Avg. |
+ |:---|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+ | **Bradley-Terry** | Llama-3-OffsetBias-RM-8B | 89.0 | 64.8 | 59.2 | 64.1 | 57.8 | 71.3 | 63.5 | 67.1 |
+ | | ArmoRM-Llama3-8B-v0.1 | 90.4 | 66.5 | 60.6 | 60.6 | 64.6 | 69.3 | 59.7 | 67.4 |
+ | | Internlm2-20b-reward | 90.2 | 56.3 | 61.0 | 63.0 | 62.9 | 68.3 | 64.3 | 66.6 |
+ | | Skywork-Reward-Llama-3.1-8B-v0.2 | 93.1 | 71.8 | 62.2 | 62.5 | 66.6 | 72.1 | 62.9 | 70.2 |
+ | | LDL-Reward-Gemma-2-27B-v0.1 | 95.0 | 72.5 | 62.4 | 63.9 | 67.9 | 71.1 | 64.2 | 71.0 |
+ | | Skywork-Reward-Gemma-2-27B-v0.2 | 94.3 | 75.3 | 63.6 | 61.9 | 69.4 | 70.0 | 66.5 | 71.6 |
+ | | INF-ORM-Llama3.1-70B | 95.1 | 76.5 | 64.2 | 64.4 | 70.5 | 73.8 | 70.2 | 73.5 |
+ | **Generative** | GPT-4o | 86.7 | 64.9 | 67.7 | - | 73.8 | - | 59.8 | - |
+ | | Claude-3.5-Sonnet | 84.2 | 64.7 | 67.3 | - | 70.6 | - | 64.8 | - |
+ | | DeepSeek-GRM-27B | 88.5 | - | 65.3 | 60.4 | 69.0 | - | - | - |
+ | | DeepSeek-GRM-27B (w/ MetaRM) | 90.4 | - | 67.2 | 63.2 | 70.3 | - | - | - |
+ | | RM-R1-Qwen-Instruct-32B | 92.9 | - | - | - | 73.0 | 79.1 | - | - |
+ | | RM-R1-DeepSeek-Distill-Qwen-32B | 90.9 | - | - | - | 69.8 | 83.9 | - | - |
+ | | EvalPlanner (Llama-3.1-70B) | 93.9 | - | - | - | - | 80.0 | 50.9 | - |
+ | | EvalPlanner (Llama-3.3-70B) | 93.8 | - | - | - | - | 82.1 | 56.6 | - |
+ | | J1-Llama-8B | 85.7 | - | 60.3 | 59.2 | - | 73.4 | 42.0 | - |
+ | | J1-Llama-8B (Maj@32) | - | - | 60.6 | 61.9 | - | - | - | - |
+ | | J1-Llama-70B | 93.3 | - | 66.3 | 72.9 | - | 82.7 | 60.0 | - |
+ | | J1-Llama-70B (Maj@32) | - | - | 67.0 | 73.7 | - | - | - | - |
+ | **Bradley-Terry** | **Skywork-Reward-V2-Qwen3-0.6B** | 85.2 | 61.3 | 65.3 | 68.3 | 74.5 | 74.4 | 67.6 | 70.9 |
+ | | **Skywork-Reward-V2-Qwen3-1.7B** | 90.3 | 68.3 | 67.6 | 70.5 | 78.1 | 78.7 | 72.9 | 75.2 |
+ | | **Skywork-Reward-V2-Qwen3-4B** | 93.4 | 75.5 | 69.5 | 74.7 | 80.6 | 81.6 | 69.3 | 77.8 |
+ | | **Skywork-Reward-V2-Qwen3-8B** | 93.7 | 78.2 | 70.6 | 75.1 | 81.2 | 82.6 | 73.4 | 79.3 |
+ | | **Skywork-Reward-V2-Llama-3.2-1B** | 89.9 | 64.3 | 66.6 | 67.4 | 76.7 | 76.4 | 65.0 | 72.3 |
+ | | **Skywork-Reward-V2-Llama-3.2-3B** | 93.0 | 74.7 | 69.1 | 72.1 | 80.5 | 81.1 | 69.2 | 77.1 |
+ | | **Skywork-Reward-V2-Llama-3.1-8B** | 96.4 | 84.1 | 77.3 | 83.4 | 86.4 | 92.8 | 80.0 | 85.8 |
+ | | **Skywork-Reward-V2-Llama-3.1-8B-40M** | **97.8** | **86.5** | **79.8** | **87.2** | **89.3** | **96.0** | **83.4** | **88.6** |

 ## 💡 Recommended Usage
 
 tokenizer = AutoTokenizer.from_pretrained(model_name)

 prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
+ response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."
+ response2 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."

 conv1 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response1}]
 conv2 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response2}]
 
 prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
+ response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."
+ response2 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."

 conv1 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response1}]
 conv2 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response2}]
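The hunks above stop at the two conversation objects. For context, here is a minimal sketch of how a Bradley-Terry reward model of this kind is typically scored with Transformers; it reuses the `rm`, `tokenizer`, `conv1`, and `conv2` names from the snippet above, and the README's full usage example continues beyond the lines shown in this diff:

```python
import torch

# Minimal sketch (not part of this diff): obtain a scalar reward for each conversation.
conv1_ids = tokenizer.apply_chat_template(conv1, tokenize=True, return_tensors="pt").to(rm.device)
conv2_ids = tokenizer.apply_chat_template(conv2, tokenize=True, return_tensors="pt").to(rm.device)

with torch.no_grad():
    # The sequence-classification head has a single output; its logit is the reward score.
    score1 = rm(conv1_ids).logits[0][0].item()
    score2 = rm(conv2_ids).logits[0][0].item()

# The correct response (response1) should receive the higher score.
print(f"Score for response 1: {score1:.2f}")
print(f"Score for response 2: {score2:.2f}")
```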