# ru-summary-quality-metric
This model is a fine-tuned version of ai-forever/ruT5-large, trained for binary quality assessment of Russian summaries paired with their original texts.
**Important:** the model uses a non-standard approach, adapting a Seq2Seq model for a binary classification task: it was trained to predict a specific token as the target sequence. This directly follows the methodology of the original SEAHORSE paper.
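As a rough sketch of that setup (hypothetical preprocessing code, not the authors' exact training script), each training example's target sequence is just the single class token:

```python
# Hypothetical sketch of the single-token classification target;
# this is not the exact training code used for this model.
def build_training_example(tokenizer, text, summary, label):
    source = f"текст:\n {text} саммари:\n {summary}"
    target = "1" if label == 1 else "0"  # decoded as the '▁1' / '▁0' token

    example = tokenizer(source, max_length=2048, truncation=True)
    example["labels"] = tokenizer(text_target=target).input_ids
    return example
```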
## Training Data and Metric
The model was fine-tuned on the SEAHORSE dataset, a multilingual dataset for summarization evaluation with human annotations across six quality metrics (Q1–Q6).

This model targets the Q6 (conciseness) metric, which the SEAHORSE authors consider one of the most high-level and challenging of the six.
- Training Data: the `ru` and `en` subsets of the training split, filtered for conciseness labels (a loading sketch follows below).
- Evaluation Data: only the `ru` subset of the validation and test splits.
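A minimal sketch of that filtering with the `datasets` library; the column names `worker_lang` and `question6` are assumptions about the `hgissbkh/seahorse` schema, so check the dataset card before relying on them:

```python
from datasets import load_dataset

# Hypothetical sketch; `worker_lang` and `question6` are assumed
# column names for the hgissbkh/seahorse dataset.
train = load_dataset("hgissbkh/seahorse", split="train")
train = train.filter(
    lambda ex: ex["worker_lang"] in ("ru", "en")  # Russian and English subsets
    and ex["question6"] in ("Yes", "No")          # keep rows with a conciseness label
)
```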
## Evaluation Results
| Test set | Pearson Correlation | ROC AUC |
|---|---|---|
| All | 0.479 | 0.792 |
| ≥ 20 summary words | 0.459 | 0.781 |
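For reference, both metrics can be computed from the model's continuous scores against the binary human labels. A minimal sketch (the `labels`/`scores` values below are illustrative only, not the actual evaluation data):

```python
from scipy.stats import pearsonr
from sklearn.metrics import roc_auc_score

# Illustrative toy values; in practice these come from the model's
# predictions and the human annotations on the `ru` test split.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.91, 0.22, 0.68, 0.85, 0.40, 0.15, 0.57, 0.33]

print("Pearson:", pearsonr(labels, scores)[0])
print("ROC AUC:", roc_auc_score(labels, scores))
```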
## Usage
Input format: `"текст:\n {} саммари:\n {}"` ("текст" and "саммари" are Russian for "text" and "summary").
Target sequences:
- `ZERO_TOKEN = '▁0'` for label 0 (not concise)
- `ONE_TOKEN = '▁1'` for label 1 (concise)
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

model_name = "xendalm/ru-summary-quality-metric"

tokenizer = AutoTokenizer.from_pretrained(model_name)
zero_token_id = tokenizer('▁0', add_special_tokens=False).input_ids[0]
one_token_id = tokenizer('▁1', add_special_tokens=False).input_ids[0]

model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

def predict_conciseness_score(text, summary, tokenizer, model, device, zero_token_id, one_token_id):
    input_text = f"текст:\n {text} саммари:\n {summary}"
    inputs = tokenizer(input_text, return_tensors="pt", max_length=2048, truncation=True, padding=True)
    inputs = {k: v.to(device) for k, v in inputs.items()}

    with torch.no_grad():
        # Greedy, single-step generation: only the logits of the first
        # generated token are needed to read off the class scores.
        outputs = model.generate(
            **inputs,
            max_new_tokens=1,
            num_beams=1,
            do_sample=False,
            return_dict_in_generate=True,
            output_scores=True,
        )

    first_token_logits = outputs.scores[0].squeeze(0)
    logit_0 = first_token_logits[zero_token_id]
    logit_1 = first_token_logits[one_token_id]

    # sigmoid(logit_1 - logit_0) equals the softmax probability of the
    # '1' token restricted to the two class tokens.
    probability_of_one = torch.sigmoid(logit_1 - logit_0).item()
    return probability_of_one
```
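An illustrative call (the text and summary below are made-up examples):

```python
# Made-up example pair for illustration.
text = "Вчера в Москве прошёл сильный дождь. Синоптики прогнозируют похолодание до конца недели."
summary = "В Москве прошёл дождь, ожидается похолодание."

score = predict_conciseness_score(text, summary, tokenizer, model, device, zero_token_id, one_token_id)
print(f"Conciseness score: {score:.3f}")  # closer to 1.0 => more likely concise
```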
## Links
- Base Model: ai-forever/ruT5-large
- SEAHORSE Dataset: hgissbkh/seahorse
- SEAHORSE Paper: SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation