---
license: mit
datasets:
- IlyaGusev/gazeta
- csebuetnlp/xlsum
language:
- ru
metrics:
- bertscore
- bleu
- rouge
- chrf
- meteor
tags:
- text2text-generation
- summarization
- russian
- t5
base_model:
- ai-forever/ruT5-base
---

# ruT5-base Model for Abstractive Summarization of Russian News

This is the `ai-forever/ruT5-base` model, fine-tuned for abstractive summarization of news texts in Russian.

## Model Description

The model is based on the T5 (Text-to-Text Transfer Transformer) architecture, an encoder-decoder transformer. The original pre-trained model `ai-forever/ruT5-base` was fine-tuned on a combined dataset consisting of Russian news articles from the Gazeta dataset and the Russian part of XLSum.

Details of the training process and an analysis of the results can be found in the [GitHub repository](https://github.com/XristoLeonov/ru-text-summarization).

**Fine-tuning Parameters (key):**

* **Base model:** `ai-forever/ruT5-base`
* **Dataset:** Combined Gazeta + XLSum (Russian part), ~32k "article-summary" pairs after filtering.
* **Max input length:** 512 tokens
* **Max output length (summary):** 64 tokens

A sketch at the end of this card illustrates how these length limits are typically applied when tokenizing the training pairs.

## How to Use

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "Xristo/ruT5-base-rus-news-sum"

# Load the fine-tuned model and its tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

article_text = """..."""

# Tokenize the article, truncating it to the model's maximum input length
input_ids = tokenizer(
    [article_text],
    max_length=512,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)["input_ids"]

# Generate a summary of up to 64 tokens with beam search
output_ids = model.generate(
    input_ids=input_ids,
    max_length=64,
    no_repeat_ngram_size=3,
    num_beams=4,
    early_stopping=True,
)

summary = tokenizer.decode(
    output_ids[0],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=True,
)

print("Generated summary:")
print(summary)
```

## Evaluation Results (Metrics)

Evaluation was performed on a held-out test set (10% of the filtered Gazeta + XLSum dataset). The best checkpoint (epoch 20) showed the following results:

| Model     | ROUGE-1 F1 | ROUGE-2 F1 | ROUGE-L F1 | METEOR | BERTScore F1 | chrF++ | BLEU  |
|-----------|------------|------------|------------|--------|--------------|--------|-------|
| ruT5-base | 30.73      | 15.22      | 27.94      | 29.42  | 78.36        | 40.06  | 10.91 |

**Comparison with baseline models:**

Compared to `IlyaGusev/mbart_ru_sum_gazeta` (max summary length 200 tokens; R1 = 32.4, R2 = 14.3, RL = 28.0, METEOR = 26.4) and `csebuetnlp/mT5_multilingual_XLSum` (max summary length 84 tokens; R1 = 32.2, R2 = 13.6, RL = 26.2 on Russian XLSum), this fine-tuned `ruT5-base` model, with a maximum summary length of only 64 tokens, achieves competitive results: it surpasses both baselines in ROUGE-2 and the mBART baseline in METEOR, which indicates a high information density of the generated summaries.
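For reference, the sketch below shows one way metrics of this kind can be recomputed with the Hugging Face `evaluate` library. It is a minimal illustration under stated assumptions, not the exact evaluation pipeline behind the table above: it samples a handful of examples from the public Gazeta test split (column names `text` and `summary` assumed), and it passes a whitespace tokenizer to ROUGE because the default ROUGE tokenizer strips non-Latin characters (this requires a reasonably recent version of `evaluate`).

```python
# Illustrative evaluation sketch (not the exact pipeline used for this card).
import evaluate
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "Xristo/ruT5-base-rus-news-sum"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME).to(device)

# Small sample from the public Gazeta test split (assumed columns: "text", "summary");
# trust_remote_code=True may be needed if the dataset ships a loading script.
dataset = load_dataset("IlyaGusev/gazeta", split="test[:16]", trust_remote_code=True)

predictions, references = [], []
for example in dataset:
    inputs = tokenizer(
        example["text"], max_length=512, truncation=True, return_tensors="pt"
    ).to(device)
    output_ids = model.generate(
        **inputs,
        max_length=64,
        no_repeat_ngram_size=3,
        num_beams=4,
        early_stopping=True,
    )
    predictions.append(tokenizer.decode(output_ids[0], skip_special_tokens=True))
    references.append(example["summary"])

rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")
chrf = evaluate.load("chrf")

# The default ROUGE tokenizer drops Cyrillic text, so use simple whitespace tokenization.
print(rouge.compute(predictions=predictions, references=references,
                    tokenizer=lambda text: text.split()))
print(bertscore.compute(predictions=predictions, references=references, lang="ru"))
# word_order=2 corresponds to chrF++.
print(chrf.compute(predictions=predictions,
                   references=[[ref] for ref in references], word_order=2))
```

Scores computed on such a small sample will naturally differ from the table above, which was obtained on the full held-out test set.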
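The maximum input and output lengths listed under Fine-tuning Parameters correspond to how article-summary pairs are typically tokenized for seq2seq training. The snippet below is only an illustrative sketch, not the exact training code (see the GitHub repository linked above); the column names `text` and `summary` are assumptions.

```python
# Illustrative preprocessing sketch for seq2seq fine-tuning (not the exact training code).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ai-forever/ruT5-base")

MAX_INPUT_LENGTH = 512   # article (encoder) side
MAX_TARGET_LENGTH = 64   # summary (decoder) side

def preprocess(batch):
    # Tokenize articles for the encoder and summaries as decoder labels.
    model_inputs = tokenizer(
        batch["text"], max_length=MAX_INPUT_LENGTH, truncation=True
    )
    labels = tokenizer(
        text_target=batch["summary"], max_length=MAX_TARGET_LENGTH, truncation=True
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# Typical usage with a datasets.Dataset:
# tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)
```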