Update README.md (#1)
Commit: edaa81c4db93276ed3a81737045e74f4b6b153f7
Co-authored-by: HongbangYuan <[email protected]>

README.md (changed):

---
datasets:
- jinzhuoran/OmniRewardData
base_model:
- openbmb/MiniCPM-o-2_6
---

# Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences

<p align="center">
  <a href="https://huggingface.co/datasets/HongbangYuan/OmniRewardBench">🤗 Benchmark</a> |
  <a href="https://hf.co/datasets/jinzhuoran/OmniRewardData">🤗 Dataset</a> |
  <a href="https://hf.co/jinzhuoran/OmniRewardModel">🤗 Model</a> |
  <a href="https://omnireward.github.io/">🌐 Homepage</a>
</p>

## 🧩 Overview

**OmniRewardModel** is our pretrained **discriminative reward model**, designed to handle *omni-modal* tasks (e.g., text, image, video) and *free-form human preferences*.

It is built upon the open-source base model [MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6), with an additional **value head** appended to produce scalar reward scores.

The model supports fine-grained scoring across various tasks and modalities, and can be loaded directly from the Hugging Face Hub.
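For intuition, the value head mentioned above is essentially a linear map from a hidden-state vector to a scalar score. Below is a minimal, self-contained sketch of that idea in plain Python; the vectors and weights are made up for demonstration and are not the actual MiniCPM-o-2_6 head:

```python
# Illustrative sketch of a value head: reward = w · h + b.
# The real model applies such a head to MiniCPM-o-2_6 hidden states;
# the numbers below are toy values for demonstration only.

def value_head(hidden_state, weights, bias=0.0):
    """Map a hidden-state vector to a scalar reward score."""
    return sum(w * h for w, h in zip(weights, hidden_state)) + bias

h = [0.2, -0.5, 1.0]       # toy "hidden state"
w = [0.4, 0.1, 0.3]        # toy head weights
reward = value_head(h, w)  # ≈ 0.33
```

A higher scalar indicates a more preferred response under the given criterion.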
---

## 🛠️ Environment Setup

To reproduce the training process in our paper, please set up the environment as described below. Our training code is built upon the [llama-factory](https://github.com/hiyouga/llama-factory) framework.

```bash
git clone https://github.com/HongbangYuan/OmniReward.git
conda create -n omnireward python=3.10
conda activate omnireward
```

We recommend using **`torch==2.2.0`** for best compatibility.

Install PyTorch (choose one based on your CUDA version):

```bash
# For CUDA 11.8:
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 \
    --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1:
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 \
    --index-url https://download.pytorch.org/whl/cu121
```

Then install the remaining dependencies:

```bash
cd OmniReward/OmniReward-Factory
pip install -r requirements.txt
```

## 📦 Data Preparation

Download all required training and evaluation datasets from [OmniRewardData](https://huggingface.co/datasets/jinzhuoran/OmniRewardData) and [OmniRewardBench](https://huggingface.co/datasets/HongbangYuan/OmniRewardBench):

```bash
cd OmniReward-Factory
bash scripts/download.sh
```
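As a rough illustration, a free-form preference record pairs a prompt with a chosen and a rejected response under some criterion. The field names below are hypothetical and chosen for illustration only; consult the dataset cards for the actual schema:

```python
# Hypothetical shape of a pairwise preference record (field names are
# illustrative, NOT OmniRewardData's actual schema).
record = {
    "prompt": "Describe the image in one sentence.",
    "chosen": "A golden retriever catches a frisbee mid-air in a park.",
    "rejected": "This picture has stuff in it.",
    "criterion": "helpfulness",  # free-form preference criterion
}

# A reward model should score `chosen` above `rejected` under this criterion.
```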

## 🏋️ Training Omni-Reward

To reproduce the training results described in our paper, please navigate to the `OmniReward-Factory` directory and run the following scripts:

```bash
cd OmniReward-Factory
bash scripts/train.sh
bash scripts/train_t2t.sh
bash scripts/train_ti2t.sh
bash scripts/train_t2iv.sh
```

## 🚀 Loading and Evaluating Omni-Reward

You can also directly use our pretrained Omni-Reward for evaluation without retraining. The models are publicly available at:

👉 https://huggingface.co/jinzhuoran/OmniRewardModel

```bash
cd OmniReward-Factory
bash scripts/eval_t2t.sh
bash scripts/eval_t2t_tie.sh
bash scripts/eval_ti2t.sh
bash scripts/eval_ti2t_tie.sh
```

Key arguments:

- `--eval_dataset`: Specifies the evaluation dataset (e.g., `omni_t2t`, `omni_t2i`, `omni_t2v`).
- `--eval_tie`: Enables the w/ Ties evaluation setting.
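For intuition, a discriminative reward model decides a pairwise comparison by its scalar scores, and the w/ Ties setting additionally allows a tie outcome. A minimal sketch of such a decision rule follows; the `tie_eps` threshold is an assumption for illustration, not the repository's actual logic:

```python
def judge(reward_a, reward_b, tie_eps=0.0):
    """Map two scalar rewards to a preference label: 'A', 'B', or 'tie'.

    tie_eps is a hypothetical margin: score gaps at or below it count as a tie.
    """
    if abs(reward_a - reward_b) <= tie_eps:
        return "tie"
    return "A" if reward_a > reward_b else "B"

print(judge(0.9, 0.2))                  # -> A
print(judge(0.50, 0.52, tie_eps=0.05))  # -> tie
```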