---

license: other
license_name: cogvlm2
license_link: https://huggingface.co/THUDM/cogvlm2-llama3-chat-19B/blob/main/LICENS

language:
- en
pipeline_tag: text-generation
tags:
- chat
- cogvlm2

inference: false
---

# VisionReward-Image

## Introduction
We present VisionReward, a general strategy for aligning visual generation models (both image and video generation) with human preferences through a fine-grained, multi-dimensional framework. We decompose human preferences in images and videos into multiple dimensions, each represented by a series of judgment questions whose answers are linearly weighted and summed into an interpretable and accurate score. To address the challenges of video quality assessment, we systematically analyze various dynamic features of videos, which helps VisionReward surpass VideoScore by 17.2% and achieve top performance for video preference prediction.
This repository provides the VisionReward-Image model.
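
To make the scoring idea concrete, here is a minimal sketch of a linearly weighted sum over judgment-question answers. It is illustrative only: the function name `weighted_score`, the yes = 1 / no = 0 encoding, and the example weights are assumptions for this sketch, not the released model's actual questions, encoding, or learned weights.

```python
from typing import Sequence


def weighted_score(answers: Sequence[bool], weights: Sequence[float], bias: float = 0.0) -> float:
    """Combine yes/no judgment answers into one score via a linear weighted sum.

    Assumption: a "yes" answer is encoded as 1.0 and a "no" answer as 0.0.
    """
    if len(answers) != len(weights):
        raise ValueError("answers and weights must have the same length")
    return bias + sum(w * (1.0 if a else 0.0) for a, w in zip(answers, weights))


# Hypothetical example: three judgment questions with made-up weights.
example_answers = [True, False, True]
example_weights = [0.5, 0.25, -1.0]
example_score = weighted_score(example_answers, example_weights)
```

Because each weight is attached to one question, the contribution of every judgment to the final score can be read off directly, which is what makes a score of this form interpretable.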

## Merging and Extracting Checkpoint Files
Use the following commands to merge the split files into a single `.tar` file and then extract its contents:

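```sh
# Concatenate the split parts (in shell glob order) into one archive
cat ckpts/split_part_* > ckpts/visionreward_image.tar
# Extract the archive, listing each file as it is unpacked
tar -xvf ckpts/visionreward_image.tar
```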
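
If a POSIX shell is not available (for example on Windows), the same two steps can be sketched in Python using only the standard library. The helper names `merge_parts` and `extract_tar` are hypothetical and not part of the VisionReward codebase. The sketch assumes that sorting the part names alphabetically reproduces the order the shell glob would use.

```python
import glob
import os
import shutil
import tarfile


def merge_parts(pattern: str, out_path: str) -> None:
    """Concatenate all files matching `pattern`, in sorted name order, into `out_path`."""
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r}")
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream the copy so large checkpoint parts are not loaded into memory.
                shutil.copyfileobj(src, out)


def extract_tar(tar_path: str, dest_dir: str = ".") -> None:
    """Extract the archive at `tar_path` into `dest_dir`, creating it if needed."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(tar_path) as archive:
        archive.extractall(dest_dir)


# Usage mirroring the shell commands above (uncomment to run):
# merge_parts("ckpts/split_part_*", "ckpts/visionreward_image.tar")
# extract_tar("ckpts/visionreward_image.tar")
```

Only extract archives from a source you trust, since `extractall` writes whatever paths the archive contains.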

## Using this model
To install the Python package dependencies and run model inference, follow the instructions in our [GitHub repository](https://github.com/THUDM/VisionReward).