---
license: apache-2.0
tags:
  - summarization
  - t5
  - distilled
  - sequence-to-sequence
language: en
datasets:
  - cnn_dailymail
pipeline_tag: summarization
---

# Distilled T5-small Summarizer

This model is a **distilled version of T5-small**, fine-tuned for abstractive text summarization. It was distilled from a larger teacher model using Hugging Face's `transformers` library.

## 📋 Model Details

- **Architecture**: T5-small
- **Distilled from**: a larger teacher model (not specified; e.g., `t5-base` or `t5-large`)
- **Task**: Abstractive text summarization
- **Training data**: CNN/DailyMail dataset (`cnn_dailymail`)
- **Evaluation metrics**: ROUGE (1/2/L)

## 🚀 Usage

Load the model and tokenizer with `transformers`, prepend the T5 task prefix `summarize: ` to the input, and generate a summary:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("AhilanPonnusamy/distilled-t5small-summarizer")
tokenizer = AutoTokenizer.from_pretrained("AhilanPonnusamy/distilled-t5small-summarizer")

# Example article; the string must stay on a single line (or use triple quotes) to be valid Python.
text = "Reading manager Steve Clarke insists the FA Cup needs protecting after some dubious scheduling decisions. Earlier in the competition the Third Round ties were split over five days due to New Years Day Premier League matches and to accommodate televised games. Reading had to play their FA Cup replay against Bradford in the last round on a Monday when they had played a Championship match two days previously. Steve Clarke wants scheduling of matches in England to improve so the FA Cup can be preserved . Should they progress to the final, that will be contested on May 30 leaving Clarke’s side almost a month without games when the Championship season ends on May 2. The massive Premier League match between Chelsea and Manchester United is also scheduled to be televised at the same time as their semi-final against Arsenal at Wembley on Saturday. Clarke claimed he ‘couldn’t care less’ about the conflicting match, but added: ‘I thought it was a shame in the last round when we had to play on a Monday night after playing on a Saturday. Reading beat Bradford in the last round but face a much tougher task when they face Arsenal at Wembley . 'There are things that we should do to protect this great competition. It should be special. ‘When we beat Arsenal, we have to wait four weeks after our last league game to play the cup final, this is also not correct. 'I probably need to go on holiday for two of them and then bring the team back in. It's a long break. ‘If we get to the final, what are we going to do from May 2 to May 30? What do we do? Everyone else has played, so we won't be playing games. 'It'd be a great puzzle to have though. Let’s talk about it on Saturday night.’ Reading defender Alex Pearce revealed the players are waiting until after the match on Saturday before booking any time off in May in case they beat Arsenal. ‘Holidays are off until now, you can’t book anything, you’ve got to just see where you are and it would be great,’ he said. ‘We’re all committed and dedicated to getting to this final.’ Arsene Wenger's side are in formidable form and beating them will be a tough ask for the Royals."
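# T5 expects a task prefix; inputs longer than 512 tokens are truncated (an assumed limit).
inputs = tokenizer("summarize: " + text, return_tensors="pt", truncation=True, max_length=512)
# max_new_tokens=64 is an illustrative cap on summary length, not a value from this model card.
summary_ids = model.generate(**inputs, max_new_tokens=64)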
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

## 📊 Evaluation

| Metric   | Score    |
|----------|----------|
| ROUGE-1  | 38.27    |
| ROUGE-2  | 16.33    |
| ROUGE-L  | 26.88    |
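
For readers unfamiliar with ROUGE, the following is a minimal pure-Python sketch of what the three metrics measure: n-gram overlap for ROUGE-1/2 and longest common subsequence for ROUGE-L. It is a simplified illustration only (lowercased whitespace tokens, no stemming, F1 on a 0 to 1 scale) and is not the scorer used to produce the numbers above; the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(reference, candidate, n=1):
    """Simplified ROUGE-N F1 between two strings."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """Simplified ROUGE-L F1 between two strings."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Reported ROUGE scores such as those in the table are conventionally these F1 values multiplied by 100 and averaged over an evaluation set.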

## 🧭 Intended Use

This model is intended for **summarization tasks in English**. For example:
- Summarizing news articles
- Creating concise summaries of long texts
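
For texts longer than the model can read in one pass, one common workaround is to split the input into chunks, summarize each chunk, and join the results. The sketch below assumes a word-based chunk size of 300 (an arbitrary choice, not a limit documented for this model) and takes the summarizer as a plain callable, so the helper names `split_into_chunks` and `summarize_long_text` are hypothetical and not part of this repository.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int = 300) -> List[str]:
    """Split text into consecutive chunks of at most max_words whitespace-separated words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long_text(
    text: str,
    summarize_fn: Callable[[str], str],
    max_words: int = 300,
) -> str:
    """Summarize each chunk with summarize_fn and join the partial summaries with spaces."""
    return " ".join(summarize_fn(chunk) for chunk in split_into_chunks(text, max_words))
```

Here `summarize_fn` would wrap the tokenize, generate, and decode steps from the Usage section. Splitting on word counts ignores sentence boundaries, so chunk edges may cut sentences in half; a sentence-aware splitter would likely give cleaner partial summaries.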

## 📦 Files

| File                    | Description                                |
|-------------------------|--------------------------------------------|
| config.json             | Model architecture configuration           |
| generation_config.json  | Generation hyperparameters (optional)      |
| model.safetensors       | Model weights in safetensors format        |
| pytorch_model.bin       | PyTorch-format weights (to be added)       |
| special_tokens_map.json | Tokenizer special tokens map               |
| spiece.model            | SentencePiece tokenizer model              |
| tokenizer.json          | Fast tokenizer definition (JSON)           |
| tokenizer_config.json   | Tokenizer configuration                    |

## 📜 License

This model is distributed under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).

## ✨ Acknowledgements

- Built with 🤗 Hugging Face Transformers
- Fine-tuned and distilled by [Ahilan Ponnusamy](https://huggingface.co/AhilanPonnusamy)