---
license: cc-by-nc-4.0
datasets:
- jinzhuoran/OmniRewardData
base_model:
- openbmb/MiniCPM-o-2_6
---
# Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences
<p align="center">
<a href="https://huggingface.co/datasets/HongbangYuan/OmniRewardBench"> π€ Benchmark</a></a> |
<a href="https://hf.co/datasets/jinzhuoran/OmniRewardData"> π€ Dataset</a> |
<a href="https://hf.co/jinzhuoran/OmniRewardModel"> π€ Model</a> |
<a href="https://omnireward.github.io/"> π Homepage</a>
</p>
## 🧩 Overview
**OmniRewardModel** is our pretrained **discriminative reward model** designed to handle *omni-modal* tasks (e.g., text, image, video) and *free-form human preferences*.
It is built upon the open-source base model [MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6), with an additional **value head** appended to produce scalar reward scores.
The model supports fine-grained scoring across a range of tasks and modalities, and can be loaded directly from the Hugging Face Hub.
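For a quick sanity check, the checkpoint can be pulled and loaded with `transformers`. The snippet below is a minimal, illustrative sketch: the value head and the exact scoring interface live in the custom modeling code shipped with the repository, so consult that code for the actual forward/scoring API.

```python
# Minimal loading sketch (illustrative; the scoring interface is defined by
# the custom modeling code shipped with the model repository).
from transformers import AutoModel, AutoTokenizer

repo_id = "jinzhuoran/OmniRewardModel"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    repo_id,
    trust_remote_code=True,  # required: the value head is part of the remote code
    torch_dtype="auto",
).eval()

# The value head maps a (prompt, response) pair to a scalar reward score;
# see the repository's remote code for the exact call signature.
```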
---
## 🛠️ Environment Setup
To reproduce the training process in our paper, please make sure to set up the environment as described below.
Our training code is built upon the [llama-factory](https://github.com/hiyouga/llama-factory) framework.
```bash
git clone https://github.com/HongbangYuan/OmniReward.git
conda create -n omnireward python=3.10
conda activate omnireward
```
We recommend using **`torch==2.2.0`** for best compatibility.
Install PyTorch (choose one based on your CUDA version):
```bash
# For CUDA 11.8:
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 \
--index-url https://download.pytorch.org/whl/cu118
# For CUDA 12.1:
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 \
--index-url https://download.pytorch.org/whl/cu121
```
Then install the remaining dependencies:
```bash
cd OmniReward/OmniReward-Factory
pip install -r requirements.txt
```
## 📦 Data Preparation
Download all required training and evaluation datasets from [OmniRewardData](https://huggingface.co/datasets/jinzhuoran/OmniRewardData) and [OmniRewardBench](https://huggingface.co/datasets/HongbangYuan/OmniRewardBench):
```bash
cd OmniReward-Factory
bash scripts/download.sh
```
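Alternatively, the same repositories can be fetched programmatically with `huggingface_hub`. The local directory names below are illustrative, not necessarily the paths that the training scripts expect:

```python
# Programmatic alternative to scripts/download.sh (illustrative local paths).
from huggingface_hub import snapshot_download

# Training data
snapshot_download(
    repo_id="jinzhuoran/OmniRewardData",
    repo_type="dataset",
    local_dir="data/OmniRewardData",   # assumed layout; adjust to your setup
)

# Evaluation benchmark
snapshot_download(
    repo_id="HongbangYuan/OmniRewardBench",
    repo_type="dataset",
    local_dir="data/OmniRewardBench",  # assumed layout; adjust to your setup
)
```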
## 🏋️ Training Omni-Reward
To reproduce the training results described in our paper, please navigate to the OmniReward-Factory directory and run the following scripts:
```bash
cd OmniReward-Factory
bash scripts/train.sh
bash scripts/train_t2t.sh
bash scripts/train_ti2t.sh
bash scripts/train_t2iv.sh
```
## 🚀 Loading and Evaluating Omni-Reward
You can also directly use our pretrained Omni-Reward for evaluation without retraining.
The models are publicly available at:
👉 https://huggingface.co/jinzhuoran/OmniRewardModel
```bash
cd OmniReward-Factory
bash scripts/eval_t2t.sh
bash scripts/eval_t2t_tie.sh
bash scripts/eval_ti2t.sh
bash scripts/eval_ti2t_tie.sh
```
Key options used by these evaluation scripts:
- `--eval_dataset`: Specifies the evaluation dataset (e.g., `omni_t2t`, `omni_t2i`, `omni_t2v`).
- `--eval_tie`: Enables the *w/ Ties* evaluation setting (see the illustrative sketch below).
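For intuition, a common way to run pairwise evaluation with a discriminative reward model is to prefer the higher-scoring response and, when ties are allowed, call a tie if the two scores are close. The margin rule below is an illustrative assumption, not necessarily the exact protocol implemented by the `eval_*_tie.sh` scripts:

```python
# Illustrative pairwise judging with an optional tie rule (assumed logic,
# not necessarily the exact protocol of the eval_*_tie.sh scripts).
def judge(score_a: float, score_b: float, allow_tie: bool = False,
          tie_margin: float = 0.5) -> str:
    """Return "A", "B", or "tie" given scalar reward scores for two responses."""
    if allow_tie and abs(score_a - score_b) < tie_margin:
        return "tie"  # scores too close to call under the assumed margin
    return "A" if score_a > score_b else "B"

# Example: judging three candidate pairs.
preds = [judge(1.8, 0.3), judge(0.9, 1.1, allow_tie=True), judge(2.0, 2.1, allow_tie=True)]
print(preds)  # ['A', 'tie', 'tie']
```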