---
license: cc-by-nc-4.0
datasets:
- jinzhuoran/OmniRewardData
base_model:
- openbmb/MiniCPM-o-2_6
---



# Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences


<p align="center">
  <a href="https://huggingface.co/datasets/HongbangYuan/OmniRewardBench"> πŸ€— Benchmark</a></a> |
  <a href="https://hf.co/datasets/jinzhuoran/OmniRewardData"> πŸ€— Dataset</a> | 
  <a href="https://hf.co/jinzhuoran/OmniRewardModel"> πŸ€— Model</a> | 
  <a href="https://omnireward.github.io/"> 🏠 Homepage</a>
</p>



## 🧩 Overview

**OmniRewardModel** is our pretrained **discriminative reward model** designed to handle *omni-modal* tasks (e.g., text, image, video) and *free-form human preferences*.

It is built upon the open-source base model [MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6), with an additional **value head** appended to produce scalar reward scores.

The model supports fine-grained scoring across a wide range of tasks and modalities and can be loaded directly from the Hugging Face Hub.
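
As a rough illustration, the snippet below sketches how the checkpoint might be loaded from the Hub and used to score a (prompt, response) pair. The exact scoring interface depends on the custom modeling code shipped with the checkpoint, so the `get_reward` call here is a hypothetical placeholder, not the official API.

```python
# Minimal sketch (not the official API): loading OmniRewardModel from the Hub.
# The scoring call below is hypothetical and may differ from the released code.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "jinzhuoran/OmniRewardModel"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,        # custom MiniCPM-o code with the value head
    torch_dtype=torch.bfloat16,
).eval()

prompt = "Summarize the following article ..."
response = "The article argues that ..."

# Hypothetical helper: a value-head reward model maps a (prompt, response)
# pair to a single scalar score.
with torch.no_grad():
    score = model.get_reward(prompt=prompt, response=response)  # assumed method name
print(score)
```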

---


## πŸ› οΈ Environment Setup


To reproduce the training process described in our paper, please set up the environment as follows.
Our training code is built on the [llama-factory](https://github.com/hiyouga/llama-factory) framework.

```bash
git clone https://github.com/HongbangYuan/OmniReward.git
conda create -n omnireward python=3.10
conda activate omnireward
```

We recommend using **`torch==2.2.0`** for best compatibility.

Install PyTorch (choose one based on your CUDA version):

```bash
# For CUDA 11.8:
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 \
    --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1:
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 \
    --index-url https://download.pytorch.org/whl/cu121
```
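
After installation, you can quickly verify that the expected PyTorch build and CUDA runtime are visible:

```python
# Quick sanity check for the recommended PyTorch build.
import torch

print(torch.__version__)           # should report 2.2.0
print(torch.version.cuda)          # 11.8 or 12.1, depending on the wheel you installed
print(torch.cuda.is_available())   # True if a compatible GPU driver is present
```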

Then install the remaining dependencies:

```bash
cd OmniReward/OmniReward-Factory
pip install -r requirements.txt
```

## πŸ“¦ Data Preparation

Download all required training and evaluation datasets from [OmniRewardData](https://huggingface.co/datasets/jinzhuoran/OmniRewardData) and [OmniRewardBench](https://huggingface.co/datasets/HongbangYuan/OmniRewardBench):

```bash
cd OmniReward-Factory
bash scripts/download.sh
```
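
If you prefer not to use the download script, the same repositories can also be fetched programmatically with `huggingface_hub`; the local paths below are only examples and should point wherever your configs expect the data to live.

```python
# Alternative to scripts/download.sh: pull both dataset repos from the Hub.
# The local_dir values are illustrative, not required paths.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="jinzhuoran/OmniRewardData",
    repo_type="dataset",
    local_dir="data/OmniRewardData",
)
snapshot_download(
    repo_id="HongbangYuan/OmniRewardBench",
    repo_type="dataset",
    local_dir="data/OmniRewardBench",
)
```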

## πŸ‹οΈβ€β™€οΈ  Training Omni-Reward

To reproduce the training results described in our paper, please navigate to the OmniReward-Factory directory and run the following scripts:

```bash
cd OmniReward-Factory
bash scripts/train.sh
bash scripts/train_t2t.sh
bash scripts/train_ti2t.sh
bash scripts/train_t2iv.sh
```
## πŸ“ˆ  Loading and Evaluating Omni-Reward

You can also directly use our pretrained Omni-Reward for evaluation without retraining.

The models are publicly available at:

πŸ‘‰ https://huggingface.co/jinzhuoran/OmniRewardModel

```bash
cd OmniReward-Factory
bash scripts/eval_t2t.sh
bash scripts/eval_t2t_tie.sh
bash scripts/eval_ti2t.sh
bash scripts/eval_ti2t_tie.sh
```

These evaluation scripts accept the following options:

- `--eval_dataset`: Specifies the evaluation dataset (e.g., `omni_t2t`, `omni_t2i`, `omni_t2v`, etc.).

- `--eval_tie`: Enables the w/ Ties evaluation setting (see the sketch below).
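
For intuition, the sketch below shows one common way pairwise accuracy can be computed from a reward model's scores, optionally treating near-equal scores as ties. This is only an illustration of the idea, not the exact protocol implemented in the evaluation scripts.

```python
# Illustrative only: pairwise preference accuracy with an optional tie margin.
# The actual metric used by scripts/eval_*_tie.sh may differ.
from typing import List, Tuple

def pairwise_accuracy(
    scored_pairs: List[Tuple[float, float, str]],  # (score_a, score_b, label in {"a", "b", "tie"})
    tie_margin: float = 0.0,                       # > 0 enables "w/ Ties"-style scoring
) -> float:
    correct = 0
    for score_a, score_b, label in scored_pairs:
        if abs(score_a - score_b) <= tie_margin:
            pred = "tie"
        elif score_a > score_b:
            pred = "a"
        else:
            pred = "b"
        correct += int(pred == label)
    return correct / len(scored_pairs)

# Example: two pairs, one of which is annotated as a tie.
print(pairwise_accuracy([(1.2, 0.4, "a"), (0.50, 0.52, "tie")], tie_margin=0.05))
```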