Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences

πŸ€— Benchmark | πŸ€— Dataset | πŸ€— Model | 🏠 Homepage

🧩 Overview

OmniRewardModel is our pretrained discriminative reward model designed to handle omni-modal tasks spanning text, image, and video, as well as free-form human preferences.

It is built upon the open-source base model MiniCPM-o-2_6, with an additional value head appended to produce scalar reward scores.

The model supports fine-grained scoring across a range of tasks and modalities, and can be loaded directly from the Hugging Face Hub, as shown in the sketch below.

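A minimal loading sketch in Python is given below. It assumes the standard transformers AutoModel/AutoTokenizer interface with trust_remote_code enabled; the exact reward-scoring call is defined by the model's remote code and may differ, so it is left as a commented, hypothetical call (see the evaluation scripts in OmniReward-Factory for the canonical usage).

import torch
from transformers import AutoModel, AutoTokenizer

model_id = "jinzhuoran/OmniRewardModel"

# Load the reward model: a MiniCPM-o-2_6 backbone with an appended value head
# that produces scalar reward scores.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).eval()

# Hypothetical scoring call -- the actual method name and signature come from the
# model's remote code; consult the OmniReward-Factory evaluation scripts.
# reward = model.score(prompt="Describe the image.", response="A cat on a sofa.", image=img)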

πŸ› οΈ Environment Setup

To reproduce the training process in our paper, please make sure to set up the environment as described below. Our training code is built upon the llama-factory framework.

git clone https://github.com/HongbangYuan/OmniReward.git
conda create -n omnireward python=3.10
conda activate omnireward

We recommend using torch==2.2.0 for best compatibility.

Install PyTorch (choose one based on your CUDA version):

# For CUDA 11.8:
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 \
    --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1:
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 \
    --index-url https://download.pytorch.org/whl/cu121

Then install the remaining dependencies:

cd OmniReward/OmniReward-Factory
pip install -r requirements.txt
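
Optionally, run a quick sanity check in Python to confirm that the recommended PyTorch version is installed and CUDA is visible:

import torch

print(torch.__version__)          # expected: 2.2.0 (+cu118 or +cu121)
print(torch.cuda.is_available())  # True if the GPU driver and CUDA setup are working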

πŸ“¦ Data Preparation

Download all required training and evaluation datasets from OmniRewardData and OmniRewardBench:

cd OmniReward-Factory
bash scripts/download.sh

πŸ‹οΈβ€β™€οΈ Training Omni-Reward

To reproduce the training results described in our paper, please navigate to the OmniReward-Factory directory and run the following scripts:

cd OmniReward-Factory
bash scripts/train.sh
bash scripts/train_t2t.sh
bash scripts/train_ti2t.sh
bash scripts/train_t2iv.sh

πŸ“ˆ Loading and Evaluating Omni-Reward

You can also directly use our pretrained Omni-Reward for evaluation without retraining.

The models are publicly available at:

πŸ‘‰ https://huggingface.co/jinzhuoran/OmniRewardModel

cd OmniReward-Factory
bash scripts/eval_t2t.sh
bash scripts/eval_t2t_tie.sh
bash scripts/eval_ti2t.sh
bash scripts/eval_ti2t_tie.sh
Key arguments:

  • --eval_dataset: Specifies the evaluation dataset (e.g., omni_t2t, omni_t2i, omni_t2v).

  • --eval_tie: Enables the w/ Ties evaluation setting (see the sketch below).

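For intuition, the sketch below shows one way a tie-aware preference decision could be derived from scalar rewards: if the two responses' rewards fall within a small margin they are judged a tie, otherwise the higher-scoring response wins. The reward_fn helper and the margin rule are illustrative assumptions; the benchmark's actual tie criterion is implemented in the evaluation scripts.

# Illustrative tie-aware preference rule. Assumes reward_fn(prompt, response) returns a
# scalar reward, e.g., from the Omni-Reward model loaded above; tie_margin is an
# illustrative knob, not a parameter of the released scripts.
def judge_preference(reward_fn, prompt, response_a, response_b, tie_margin=0.0):
    reward_a = reward_fn(prompt, response_a)
    reward_b = reward_fn(prompt, response_b)
    if abs(reward_a - reward_b) <= tie_margin:
        return "tie"
    return "A" if reward_a > reward_b else "B"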