---
license: cc-by-nc-4.0
datasets:
- jinzhuoran/OmniRewardData
base_model:
- openbmb/MiniCPM-o-2_6
---

# Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences

<p align="center">
<a href="https://huggingface.co/datasets/HongbangYuan/OmniRewardBench">🤗 Benchmark</a> |
<a href="https://hf.co/datasets/jinzhuoran/OmniRewardData">🤗 Dataset</a> |
<a href="https://hf.co/jinzhuoran/OmniRewardModel">🤗 Model</a> |
<a href="https://omnireward.github.io/">🏠 Homepage</a>
</p>

## 🧩 Overview

**OmniRewardModel** is our pretrained **discriminative reward model**, designed to handle *omni-modal* tasks (e.g., text, image, and video) and *free-form human preferences*.

It is built upon the open-source base model [MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6), with an additional **value head** appended to produce scalar reward scores.

The model supports fine-grained scoring across diverse tasks and modalities, and can be loaded directly from the Hugging Face Hub.
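
For intuition, the sketch below shows the general shape of such a value head: a linear layer that maps the backbone's final hidden state to a scalar score. This is an illustration of the design described above, not the actual OmniRewardModel implementation.

```python
import torch
import torch.nn as nn

class ValueHead(nn.Module):
    """Illustrative value head: projects the backbone's last hidden state
    to one scalar reward per example (not the actual OmniRewardModel code)."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, last_hidden_state: torch.Tensor) -> torch.Tensor:
        # Summarize each sequence by its final token's hidden state,
        # then project that vector down to a single scalar score.
        return self.score(last_hidden_state[:, -1, :]).squeeze(-1)

# Example: a batch of 2 sequences, 16 tokens each, hidden size 4096.
head = ValueHead(hidden_size=4096)
rewards = head(torch.randn(2, 16, 4096))  # -> tensor of shape (2,)
```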

---

## 🛠️ Environment Setup

To reproduce the training process in our paper, please make sure to set up the environment as described below.
Our training code is built upon the [LLaMA-Factory](https://github.com/hiyouga/llama-factory) framework.

```bash
git clone https://github.com/HongbangYuan/OmniReward.git
conda create -n omnireward python=3.10
conda activate omnireward
```

We recommend using **`torch==2.2.0`** for best compatibility.

Install PyTorch (choose one based on your CUDA version):

```bash
# For CUDA 11.8:
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 \
  --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1:
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 \
  --index-url https://download.pytorch.org/whl/cu121
```
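
After installation, a quick sanity check confirms that the intended build is active:

```python
# Verify the PyTorch install before proceeding.
import torch

print(torch.__version__)          # expect 2.2.0 (+cu118 or +cu121)
print(torch.cuda.is_available())  # expect True on a CUDA-capable machine
```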

Then install the remaining dependencies:

```bash
cd OmniReward/OmniReward-Factory
pip install -r requirements.txt
```

## 📦 Data Preparation

Download all required training and evaluation datasets from [OmniRewardData](https://huggingface.co/datasets/jinzhuoran/OmniRewardData) and [OmniRewardBench](https://huggingface.co/datasets/HongbangYuan/OmniRewardBench):

```bash
cd OmniReward-Factory
bash scripts/download.sh
```
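
The training data can also be pulled programmatically with the 🤗 `datasets` library. Below is a minimal sketch; the available configurations and splits are assumptions, so check the dataset card for the actual layout:

```python
# Minimal sketch: fetch OmniRewardData from the Hub. If the repository
# defines multiple configurations, pass one explicitly, e.g.
# load_dataset("jinzhuoran/OmniRewardData", "<config_name>").
from datasets import load_dataset

data = load_dataset("jinzhuoran/OmniRewardData")
print(data)
```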

## 🏋️‍♂️ Training Omni-Reward

To reproduce the training results described in our paper, please navigate to the `OmniReward-Factory` directory and run the following scripts:

```bash
cd OmniReward-Factory
bash scripts/train.sh
bash scripts/train_t2t.sh
bash scripts/train_ti2t.sh
bash scripts/train_t2iv.sh
```

## 🚀 Loading and Evaluating Omni-Reward

You can also use our pretrained Omni-Reward directly for evaluation, without retraining.

The models are publicly available at:

👉 https://huggingface.co/jinzhuoran/OmniRewardModel

```bash
cd OmniReward-Factory
bash scripts/eval_t2t.sh
bash scripts/eval_t2t_tie.sh
bash scripts/eval_ti2t.sh
bash scripts/eval_ti2t_tie.sh
```

Key options used by these evaluation scripts:

- `--eval_dataset`: Specifies the evaluation dataset (e.g., `omni_t2t`, `omni_t2i`, `omni_t2v`, etc.).
- `--eval_tie`: Enables the *w/ Ties* evaluation setting.
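
To score samples programmatically instead of going through the evaluation scripts, the sketch below shows one plausible way to load the model from the Hub. The loading pattern follows the usual MiniCPM-o convention; the exact reward-scoring call is model-specific and not shown, so treat this as an assumption and consult the model repository for authoritative usage:

```python
# Hypothetical loading sketch -- adapted from the usual MiniCPM-o pattern,
# not a confirmed OmniRewardModel API; see the model repo for real usage.
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "jinzhuoran/OmniRewardModel",
    trust_remote_code=True,       # custom model code is hosted in the repo
    torch_dtype=torch.bfloat16,
).eval()

tokenizer = AutoTokenizer.from_pretrained(
    "jinzhuoran/OmniRewardModel", trust_remote_code=True
)

# The value head then maps the backbone's output to a scalar reward;
# the exact scoring method is model-specific.
```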