Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences
🤗 Benchmark | 🤗 Dataset | 🤗 Model | 🌐 Homepage
🧩 Overview
OmniRewardModel is our pretrained discriminative reward model designed to handle omni-modal tasks (e.g., text, image, video) and free-form human preferences.
It is built upon the open-source base model MiniCPM-o-2_6, with an additional value head appended to produce scalar reward scores.
The model supports fine-grained scoring across various tasks and modalities, and can be loaded directly from the Hugging Face Hub.
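For reference, here is a minimal loading sketch. It assumes the checkpoint can be loaded with transformers' AutoModel and trust_remote_code (the architecture extends MiniCPM-o-2_6 with a custom value head), and that the packaged remote code restores that value head; the scripts in OmniReward-Factory remain the authoritative loading path.

```python
# Minimal sketch: loading Omni-Reward from the Hugging Face Hub.
# Assumptions: AutoModel + trust_remote_code restores the value head, bfloat16
# weights fit on the available GPU(s), and accelerate is installed for
# device_map="auto".
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "jinzhuoran/OmniRewardModel"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()  # inference only; the value head maps hidden states to a scalar reward
```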
🛠️ Environment Setup
To reproduce the training process in our paper, please set up the environment as described below. Our training code is built upon the LLaMA-Factory framework.
git clone https://github.com/HongbangYuan/OmniReward.git
conda create -n omnireward python=3.10
conda activate omnireward
We recommend torch==2.2.0 for best compatibility.
Install PyTorch (choose one based on your CUDA version):
# For CUDA 11.8:
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 \
--index-url https://download.pytorch.org/whl/cu118
# For CUDA 12.1:
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 \
--index-url https://download.pytorch.org/whl/cu121
Then install the remaining dependencies:
cd OmniReward/OmniReward-Factory
pip install -r requirements.txt
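Optionally, verify the installation before moving on (a quick sanity check, not part of the repository's scripts; the expected version strings follow from the install commands above):

```python
# Sanity check: confirm the intended PyTorch build and CUDA visibility.
import torch

print(torch.__version__)          # expected: 2.2.0+cu118 or 2.2.0+cu121
print(torch.cuda.is_available())  # should print True on a CUDA-capable machine
```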
📦 Data Preparation
Download all required training and evaluation datasets from OmniRewardData and OmniRewardBench:
cd OmniReward-Factory
bash scripts/download.sh
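If you prefer pulling the data directly from the Hub instead of using the script, a sketch along the following lines should also work. The dataset repo IDs are assumptions based on the model's namespace; check scripts/download.sh for the exact sources.

```python
# Sketch: download the OmniReward datasets with huggingface_hub.
# The repo IDs below are assumptions (same namespace as the model); verify
# them against scripts/download.sh before relying on this.
from huggingface_hub import snapshot_download

for repo_id in ("jinzhuoran/OmniRewardData", "jinzhuoran/OmniRewardBench"):
    local_dir = snapshot_download(repo_id=repo_id, repo_type="dataset")
    print(f"{repo_id} -> {local_dir}")
```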
🏋️‍♂️ Training Omni-Reward
To reproduce the training results described in our paper, please navigate to the OmniReward-Factory directory and run the following scripts:
cd OmniReward-Factory
bash scripts/train.sh
bash scripts/train_t2t.sh
bash scripts/train_ti2t.sh
bash scripts/train_t2iv.sh
🚀 Loading and Evaluating Omni-Reward
You can also directly use our pretrained Omni-Reward for evaluation without retraining.
The models are publicly available at:
👉 https://huggingface.co/jinzhuoran/OmniRewardModel
cd OmniReward-Factory
bash scripts/eval_t2t.sh
bash scripts/eval_t2t_tie.sh
bash scripts/eval_ti2t.sh
bash scripts/eval_ti2t_tie.sh
Key options for the evaluation scripts:
- --eval_dataset: Specifies the evaluation dataset (e.g., omni_t2t, omni_t2i, omni_t2v).
- --eval_tie: Enables the w/ Ties evaluation setting.
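Conceptually, these pairwise benchmarks check whether the reward model assigns the chosen response a higher score than the rejected one. The sketch below illustrates that accuracy computation with a hypothetical score() helper; the real pipeline lives in the eval scripts above and also handles image/video inputs and the w/ Ties setting.

```python
# Illustration only: pairwise-preference accuracy with a hypothetical `score`
# function standing in for Omni-Reward's actual (multi-modal) scoring call.
from typing import Callable, Iterable, Tuple

def pairwise_accuracy(
    pairs: Iterable[Tuple[str, str, str]],   # (prompt, chosen, rejected)
    score: Callable[[str, str], float],      # scalar reward for (prompt, response)
) -> float:
    results = [score(p, chosen) > score(p, rejected) for p, chosen, rejected in pairs]
    return sum(results) / len(results)
```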