---
license: apache-2.0
tags:
- t5
- text-classification
- open-source-ai
- liar-dataset
- educational
- demo
datasets:
- liar
language:
- en
base_model:
- t5-small
pipeline_tag: text2text-generation
---

# open-source-ai-t5-liar-lens

This is a fine-tuned version of [`t5-small`](https://huggingface.co/t5-small), adapted for classifying political claims using the [LIAR dataset](https://huggingface.co/datasets/liar). The model was developed as part of the *Open Source AI* book project by Jerry Cuomo and José De Jesús, and is intended as a demonstration of lightweight MLOps practices.

Given a political claim as input, the model predicts one of six factuality labels from the LIAR dataset:

- `true`
- `mostly-true`
- `half-true`
- `barely-true`
- `false`
- `pants-fire`

This task is framed as a text-to-text problem using a summarization-style prompt:

```
Input:  veracity: The unemployment rate has dropped to 4.1%
Target: mostly-true
```
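
For illustration, such a pair can be built directly from a raw LIAR example. This is a minimal sketch, not the book's training script; depending on your `datasets` version, loading the script-based `liar` dataset may require `trust_remote_code=True` or may no longer be supported.

```python
from datasets import load_dataset

# Load a handful of LIAR examples (script-based dataset; see note above).
ds = load_dataset("liar", split="train[:5]", trust_remote_code=True)
example = ds[0]

# Prefix the claim so T5 sees veracity prediction as a text-to-text task.
input_text = "veracity: " + example["statement"]
target_text = ds.features["label"].int2str(example["label"])  # e.g. "false"
print(input_text, "->", target_text)
```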


The model is not intended for production use. It was fine-tuned on a small subset of the LIAR dataset for demonstration purposes in the context of reproducible, transparent model development. It is best used to illustrate the concepts of fine-tuning, structured logging, checkpointing, and publishing open models.

## Training Details

- **Base model**: `t5-small`
- **Dataset**: LIAR (subset)
- **Epochs**: 1
- **Batch size**: 4
- **Max input length**: 128 tokens
- **Hardware**: Google Colab
- **Checkpoint name**: `open-source-ai-t5-liar-lens`
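
For reference, a run with these settings could be wired up roughly as follows. This is a minimal sketch using the Transformers `Seq2SeqTrainer`, not the actual training script; the hyperparameters mirror the list above, while the data slice, logging cadence, and output directory are illustrative.

```python
from datasets import load_dataset
from transformers import (
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    T5ForConditionalGeneration,
    T5Tokenizer,
)

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Small LIAR subset, formatted as "veracity: <claim>" -> label name.
raw = load_dataset("liar", split="train[:500]", trust_remote_code=True)
label_feature = raw.features["label"]

def tokenize(example):
    inputs = tokenizer(
        "veracity: " + example["statement"],
        max_length=128,  # max input length from the list above
        truncation=True,
    )
    # Targets are the tokenized text of the class name itself.
    inputs["labels"] = tokenizer(
        text_target=label_feature.int2str(example["label"]),
        max_length=8,
        truncation=True,
    )["input_ids"]
    return inputs

train_set = raw.map(tokenize, remove_columns=raw.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="open-source-ai-t5-liar-lens",  # checkpoint name
    num_train_epochs=1,                        # epochs: 1
    per_device_train_batch_size=4,             # batch size: 4
    logging_steps=10,                          # structured logging
    save_strategy="epoch",                     # checkpointing
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_set,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```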

## Intended Use

This model is provided for educational and illustrative use. It demonstrates how to:
- Fine-tune a T5 model on a classification task
- Log and version experiments
- Save and publish models to Hugging Face Hub
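
Publishing itself can be a couple of lines. Continuing from the training sketch above, and assuming you are authenticated (e.g. via `huggingface-cli login`) with write access to the target repo:

```python
# Upload the fine-tuned weights and tokenizer to the Hugging Face Hub.
model.push_to_hub("gcuomo/open-source-ai-t5-liar-lens")
tokenizer.push_to_hub("gcuomo/open-source-ai-t5-liar-lens")
```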

## Quick Example

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the fine-tuned checkpoint and its tokenizer from the Hub.
model = T5ForConditionalGeneration.from_pretrained(
    "gcuomo/open-source-ai-t5-liar-lens"
)
tokenizer = T5Tokenizer.from_pretrained(
    "gcuomo/open-source-ai-t5-liar-lens"
)

# Prefix the claim with "veracity:" to match the training prompt format.
input_text = "veracity: The president signed the bill into law last year"
inputs = tokenizer(input_text, return_tensors="pt")

# The model emits one of the six LIAR labels as plain text.
output = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

## Citation

If you reference this model or its training approach, please cite:

> Cuomo, J. & De Jesús, J. (2025). *Open Source AI*. No Starch Press.  
> Trained on the LIAR dataset: [https://huggingface.co/datasets/liar](https://huggingface.co/datasets/liar)