Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences

πŸ€— Benchmark | πŸ€— Dataset | πŸ€— Model | 🏠 Homepage

🧩 Overview

OmniRewardModel is our pretrained discriminative reward model designed to handle omni-modal tasks spanning text, image, and video, as well as free-form human preferences.

It is built upon the open-source base model MiniCPM-o-2_6, with an additional value head appended to produce scalar reward scores.

The model supports fine-grained scoring across a range of tasks and modalities, and can be loaded directly from the Hugging Face Hub, as shown in the sketch below.

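A minimal loading sketch in Python is given below. It assumes the standard transformers AutoModel/AutoTokenizer interface with trust_remote_code enabled; the exact reward-scoring call is defined by the model's remote code and may differ, so it is left as a commented, hypothetical call (see the evaluation scripts in OmniReward-Factory for the canonical usage).

import torch
from transformers import AutoModel, AutoTokenizer

model_id = "jinzhuoran/OmniRewardModel"

# Load the reward model: a MiniCPM-o-2_6 backbone with an appended value head
# that produces scalar reward scores.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).eval()

# Hypothetical scoring call -- the actual method name and signature come from the
# model's remote code; consult the OmniReward-Factory evaluation scripts.
# reward = model.score(prompt="Describe the image.", response="A cat on a sofa.", image=img)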

πŸ› οΈ Environment Setup

To reproduce the training process in our paper, please make sure to set up the environment as described below. Our training code is built upon the llama-factory framework.

git clone https://github.com/HongbangYuan/OmniReward.git
conda create -n omnireward python=3.10
conda activate omnireward

We recommend using torch==2.2.0 for best compatibility.

Install PyTorch (choose one based on your CUDA version):

# For CUDA 11.8:
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 \
    --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1:
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 \
    --index-url https://download.pytorch.org/whl/cu121

Then install the remaining dependencies:

cd OmniReward/OmniReward-Factory
pip install -r requirements.txt
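
Optionally, run a quick sanity check in Python to confirm that the recommended PyTorch version is installed and CUDA is visible:

import torch

print(torch.__version__)          # expected: 2.2.0 (+cu118 or +cu121)
print(torch.cuda.is_available())  # True if the GPU driver and CUDA setup are working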

πŸ“¦ Data Preparation

Download all required training and evaluation datasets from OmniRewardData and OmniRewardBench:

cd OmniReward-Factory
bash scripts/download.sh

πŸ‹οΈβ€β™€οΈ Training Omni-Reward

To reproduce the training results described in our paper, please navigate to the OmniReward-Factory directory and run the following scripts:

cd OmniReward-Factory
bash scripts/train.sh
bash scripts/train_t2t.sh
bash scripts/train_ti2t.sh
bash scripts/train_t2iv.sh

πŸ“ˆ Loading and Evaluating Omni-Reward

You can also directly use our pretrained Omni-Reward for evaluation without retraining.

The models are publicly available at:

πŸ‘‰ https://huggingface.co/jinzhuoran/OmniRewardModel

cd OmniReward-Factory
bash scripts/eval_t2t.sh
bash scripts/eval_t2t_tie.sh
bash scripts/eval_ti2t.sh
bash scripts/eval_ti2t_tie.sh
Key arguments:

  • --eval_dataset: Specifies the evaluation dataset (e.g., omni_t2t, omni_t2i, omni_t2v).

  • --eval_tie: Enables the w/ Ties evaluation setting (see the sketch below).

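For intuition, the sketch below shows one way a tie-aware preference decision could be derived from scalar rewards: if the two responses' rewards fall within a small margin they are judged a tie, otherwise the higher-scoring response wins. The reward_fn helper and the margin rule are illustrative assumptions; the benchmark's actual tie criterion is implemented in the evaluation scripts.

# Illustrative tie-aware preference rule. Assumes reward_fn(prompt, response) returns a
# scalar reward, e.g., from the Omni-Reward model loaded above; tie_margin is an
# illustrative knob, not a parameter of the released scripts.
def judge_preference(reward_fn, prompt, response_a, response_b, tie_margin=0.0):
    reward_a = reward_fn(prompt, response_a)
    reward_b = reward_fn(prompt, response_b)
    if abs(reward_a - reward_b) <= tie_margin:
        return "tie"
    return "A" if reward_a > reward_b else "B"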