---
license: apache-2.0
tags:
- t5
- text-classification
- open-source-ai
- liar-dataset
- educational
- demo
datasets:
- liar
language:
- en
base_model:
- t5-small
pipeline_tag: text2text-generation
---
# open-source-ai-t5-liar-lens
This is a fine-tuned version of [`t5-small`](https://huggingface.co/t5-small), adapted for classifying political claims using the [LIAR dataset](https://huggingface.co/datasets/liar). The model was developed as part of the *Open Source AI* book project by Jerry Cuomo and José De Jesús, and is intended as a demonstration of lightweight MLOps practices.
Given a political claim as input, the model predicts one of six factuality labels from the LIAR dataset:
- `true`
- `mostly-true`
- `half-true`
- `barely-true`
- `false`
- `pants-fire`

This task is framed as a text-to-text problem using a summarization-style prompt:

- **Input**: `veracity: The unemployment rate has dropped to 4.1%`
- **Target**: `mostly-true`
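
For illustration, here is a minimal sketch of that framing in code. The helper name `to_example` is hypothetical, not taken from the original training script:

```python
# Hypothetical helper: converts a claim and its label into the
# (input, target) text pair used by the text-to-text framing.
def to_example(claim: str, label: str) -> dict:
    # The "veracity:" prefix mirrors T5's task-prefix convention.
    return {"input_text": f"veracity: {claim}", "target_text": label}

example = to_example("The unemployment rate has dropped to 4.1%", "mostly-true")
print(example["input_text"])  # veracity: The unemployment rate has dropped to 4.1%
```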
The model is not intended for production use. It was fine-tuned on a small subset of the LIAR dataset for demonstration purposes in the context of reproducible, transparent model development. It is best used to illustrate the concepts of fine-tuning, structured logging, checkpointing, and publishing open models.
## Training Details
- **Base model**: `t5-small`
- **Dataset**: LIAR (subset)
- **Epochs**: 1
- **Batch size**: 4
- **Max input length**: 128 tokens
- **Hardware**: Google Colab
- **Checkpoint name**: `open-source-ai-t5-liar-lens`
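
The original training script is not reproduced here, but a minimal sketch of a comparable run using the `transformers` Seq2SeqTrainer with the settings above might look as follows. The subset size (`train[:1000]`) and the LIAR label ordering are assumptions; verify the ordering against `dataset.features["label"].names` before training:

```python
from datasets import load_dataset
from transformers import (
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    T5ForConditionalGeneration,
    T5Tokenizer,
)

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Assumed label index order; check dataset.features["label"].names.
LABELS = ["false", "half-true", "mostly-true", "true", "barely-true", "pants-fire"]

# Small subset for the demo; the exact size used is not documented.
dataset = load_dataset("liar", split="train[:1000]")

def preprocess(batch):
    # Prefix each claim with the task keyword and tokenize to 128 tokens.
    inputs = tokenizer(
        ["veracity: " + s for s in batch["statement"]],
        max_length=128, truncation=True, padding="max_length",
    )
    targets = tokenizer(
        [LABELS[i] for i in batch["label"]],
        max_length=8, truncation=True, padding="max_length",
    )
    # Mask padding in the labels so it is ignored by the loss.
    inputs["labels"] = [
        [(t if t != tokenizer.pad_token_id else -100) for t in seq]
        for seq in targets["input_ids"]
    ]
    return inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="open-source-ai-t5-liar-lens",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    logging_steps=50,
)

trainer = Seq2SeqTrainer(model=model, args=args, train_dataset=tokenized)
trainer.train()
```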
## Intended Use
This model is provided for educational and illustrative use. It demonstrates how to:
- Fine-tune a T5 model on a classification task
- Log and version experiments
- Save and publish models to the Hugging Face Hub (a minimal sketch follows)
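
As an illustration of that last step, here is a hedged sketch of publishing a local checkpoint to the Hub. It assumes prior authentication (for example via `huggingface-cli login`) and reuses the checkpoint name from the training details as the local path:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the locally saved fine-tuned checkpoint (path is an assumption,
# matching the checkpoint name above) and the base tokenizer.
model = T5ForConditionalGeneration.from_pretrained("open-source-ai-t5-liar-lens")
tokenizer = T5Tokenizer.from_pretrained("t5-small")

# Push weights and tokenizer files to the Hub repository.
model.push_to_hub("gcuomo/open-source-ai-t5-liar-lens")
tokenizer.push_to_hub("gcuomo/open-source-ai-t5-liar-lens")
```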
## Quick Example
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the fine-tuned model and tokenizer from the Hub.
model = T5ForConditionalGeneration.from_pretrained(
    "gcuomo/open-source-ai-t5-liar-lens"
)
tokenizer = T5Tokenizer.from_pretrained(
    "gcuomo/open-source-ai-t5-liar-lens"
)

# Frame the claim with the "veracity:" task prefix used in training.
input_text = "veracity: The president signed the bill into law last year"
inputs = tokenizer(input_text, return_tensors="pt")

# Generate the predicted label and decode it back to text.
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
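
The decoded output should be one of the six LIAR labels listed above, for example `half-true`.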
## Citation
If you reference this model or its training approach, please cite:
> Cuomo, J. & De Jesús, J. (2025). *Open Source AI*. No Starch Press.
> Trained on the LIAR dataset: [https://huggingface.co/datasets/liar](https://huggingface.co/datasets/liar)