metadata
license: apache-2.0
datasets:
  - TIGER-Lab/VideoEval
language:
  - en
metrics:
  - accuracy
library_name: transformers
pipeline_tag: visual-question-answering

MantisScore

Paper | Website | GitHub | Datasets | Model | Demo


Introduction

  • MantisScore is a video quality evaluation model. It uses Mantis-8B-Idefics2 as its base model and is trained on VideoEval, a large video evaluation dataset with multi-aspect human scores.

  • MantisScore reaches a Spearman correlation of 75+ (reported as correlation × 100) with human ratings on VideoEval-test, surpassing all MLLM-prompting methods and feature-based metrics.

  • MantisScore also beats the best baselines on three other benchmarks, EvalCrafter, GenAI-Bench and VBench, showing high alignment with human evaluations.

Performance

Evaluation Results on 4 benchmarks.

We test our video evaluation model MantisScore on VideoEval-test, EvalCrafter, GenAI-Bench and VBench. For the first two benchmarks, we report the Spearman correlation between the model's output and human ratings, averaged over all evaluation aspects. GenAI-Bench and VBench contain human preference data over two or more videos, so we use the model's output to predict preferences and report pairwise accuracy.
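The two indicators described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' evaluation script: the aspect names (`aspect_a`, `aspect_b`), the toy scores, and the choice to skip pairs that humans rated equally are all hypothetical.

```python
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_spearman_over_aspects(model_scores, human_scores):
    """Average the per-aspect Spearman correlation.

    Both arguments map an aspect name to a list of per-video scores.
    """
    rhos = [spearman(model_scores[a], human_scores[a]) for a in human_scores]
    return sum(rhos) / len(rhos)


def pairwise_accuracy(model_scores, human_scores):
    """Fraction of video pairs whose model ordering matches the human one.

    Pairs with equal human scores are skipped (an assumption of this sketch).
    """
    correct = total = 0
    for i, j in combinations(range(len(human_scores)), 2):
        if human_scores[i] == human_scores[j]:
            continue
        total += 1
        model_diff = model_scores[i] - model_scores[j]
        human_diff = human_scores[i] - human_scores[j]
        if model_diff * human_diff > 0:
            correct += 1
    return correct / total


# Hypothetical toy data: two aspects, four videos each.
model = {"aspect_a": [3.1, 2.0, 3.8, 1.2], "aspect_b": [2.5, 3.0, 1.0, 3.5]}
human = {"aspect_a": [3, 2, 4, 1], "aspect_b": [3, 2, 1, 4]}

print(mean_spearman_over_aspects(model, human))
print(pairwise_accuracy([0.7, 0.4, 0.5], [2, 1, 3]))
```

Multiplying the averaged correlation by 100 puts it on the scale used in the results table.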

| Metric | Final Sum Score | VideoEval-test | EvalCrafter | GenAI-Bench | VBench |
|---|---|---|---|---|---|
| MantisScore | | | | | |
| Gemini-1.5-Pro | 158.8 | 22.1 | 22.9 | 60.9 | 52.9 |
| Gemini-1.5-Flash | 157.5 | 20.8 | 17.3 | 67.1 | 52.3 |
| GPT-4o | 155.4 | 23.1 | 28.7 | 52.0 | 51.7 |
| CLIP-sim | 126.8 | 8.9 | 36.2 | 34.2 | 47.4 |
| DINO-sim | 121.3 | 7.5 | 32.1 | 38.5 | 43.3 |
| SSIM-sim | 118.0 | 13.4 | 26.9 | 34.1 | 43.5 |
| CLIP-Score | 114.4 | -7.2 | 21.7 | 45.0 | 54.9 |
| LLaVA-1.5-7B | 108.3 | 8.5 | 10.5 | 49.9 | 39.4 |
| LLaVA-1.6-7B | 93.3 | -3.1 | 13.2 | 44.5 | 38.7 |
| X-CLIP-Score | 92.9 | -1.9 | 13.3 | 41.4 | 40.1 |
| PIQE | 78.3 | -10.1 | -1.2 | 34.5 | 55.1 |
| BRISQUE | 75.9 | -20.3 | 3.9 | 38.5 | 53.7 |
| SSIM-dyn | 42.5 | -5.5 | -17.0 | 28.4 | 36.5 |
| MSE-dyn | 36.7 | -12.9 | -26.4 | 31.4 | 44.5 |
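Judging from the rows above, the Final Sum Score appears to be the sum of the four per-benchmark scores. The README does not state this explicitly, so treat it as an inference. A quick check on the Gemini-1.5-Pro row:

```python
# Per-benchmark scores copied from the Gemini-1.5-Pro row of the table.
gemini_pro = {
    "VideoEval-test": 22.1,
    "EvalCrafter": 22.9,
    "GenAI-Bench": 60.9,
    "VBench": 52.9,
}

# Round to one decimal to match the table's precision.
final_sum = round(sum(gemini_pro.values()), 1)
print(final_sum)  # 158.8, matching the Final Sum Score column
```

A few rows, GPT-4o for example, differ from the plain sum by 0.1. This is presumably because the published sum was computed before the individual scores were rounded.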

Usage

Installation

pip install git+https://github.com/TIGER-AI-Lab/MantisScore.git

Inference

Training

MantisScore is trained on VideoEval, using Mantis-8B-Idefics2 as the base model.

Evaluation

Citation