open-source-ai-t5-liar-lens

This is a fine-tuned version of t5-small, trained to classify short claims and quotes in the style of the LIAR dataset. It was developed as part of the Open Source AI book project by Jerry Cuomo and José De Jesús to demonstrate fine-tuning and evaluation techniques with a light, satirical touch.

What's Different?

To test T5’s summarization-as-classification capabilities, we augmented the LIAR dataset with 225 synthetic examples drawn from our own book. These were written to echo the style of political claims—confident, compressible, and occasionally absurd. It’s a tongue-in-cheek benchmark, but a useful one. It lets us explore how a summarization model handles short-form reasoning, fake-ish news, and the delightful blur between fact and fiction in machine learning writing.

So while the original LIAR dataset supplies factual claims from political discourse, our additions bring in quotes that parody open-source mantras, AI hype cycles, and technical one-liners. The result is a model that scores both campaign promises and keynote punchlines with equal scrutiny.

Task Format

This model treats classification as a text-to-text generation task. Each input is a short claim or quote, and the model responds with one of six factuality labels, generated directly as a lowercase string:

  • pants-fire
  • false
  • barely-true
  • half-true
  • mostly-true
  • true

The input format uses a summarization-style prefix to frame the task:

Example Input:

summarize: Python is the fastest programming language available.

Example Output:

half-true

This response reflects the model’s ability to evaluate short-form claims with nuance, producing a graded label based on its understanding of truthfulness.
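
Because the label is produced by free-form generation rather than a fixed classification head, a decoded output can in principle drift outside the six-label vocabulary. A small post-processing step can snap generations back onto the label set. The sketch below, including the normalize_label helper name, is illustrative and not part of the released model:

# The six LIAR-style labels the model is trained to emit.
LABELS = ["pants-fire", "false", "barely-true", "half-true", "mostly-true", "true"]

def normalize_label(generated: str) -> str:
    # Exact match after lowercasing and trimming whitespace.
    text = generated.strip().lower()
    if text in LABELS:
        return text
    # Otherwise fall back to the first label contained in the output.
    # Compound labels like "mostly-true" are listed before bare "true",
    # so they are matched first.
    for label in LABELS:
        if label in text:
            return label
    return "unknown"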

Training Details

  • Base model: t5-small (~60.5M parameters, float32)
  • Datasets: LIAR, augmented with 225 synthetic examples written for the Open Source AI book
  • Epochs: 5
  • Batch size: 4
  • Max input length: 128 tokens
  • Platform: Google Colab
  • Checkpoint: open-source-ai-t5-liar-lens
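
For readers who want to reproduce the setup, here is a minimal sketch of how these hyperparameters might be wired into a Hugging Face Seq2SeqTrainer run. The two training examples and the learning rate are illustrative assumptions; the actual run used the full LIAR-plus-book dataset described above.

from datasets import Dataset
from transformers import (
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    T5ForConditionalGeneration,
    T5Tokenizer,
)

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Illustrative stand-ins for the LIAR + book-derived training examples.
train_examples = [
    {"claim": "Python is the fastest programming language available.", "label": "half-true"},
    {"claim": "Our model card is fully transparent.", "label": "mostly-true"},
]

def preprocess(batch):
    # Frame each claim with the summarization-style prefix used at inference.
    model_inputs = tokenizer(
        ["summarize: " + c for c in batch["claim"]],
        max_length=128, truncation=True,
    )
    # Targets are the factuality labels, tokenized as ordinary text.
    labels = tokenizer(text_target=batch["label"], max_length=8, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

dataset = Dataset.from_list(train_examples).map(
    preprocess, batched=True, remove_columns=["claim", "label"]
)

args = Seq2SeqTrainingArguments(
    output_dir="open-source-ai-t5-liar-lens",
    num_train_epochs=5,
    per_device_train_batch_size=4,
    learning_rate=3e-4,  # assumption: a common choice for T5; not stated in this card
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()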

Intended Use

This model is designed for educational use, particularly in demonstrating:

  • Lightweight fine-tuning with Hugging Face Transformers
  • Classification-as-generation using T5
  • Transparent model publishing and benchmarking
  • The fuzziness of truth in machine learning culture

It is not intended for production-grade fact-checking or regulatory enforcement.

Example Usage

from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the fine-tuned model and tokenizer
model = T5ForConditionalGeneration.from_pretrained(
    "gcuomo/open-source-ai-t5-liar-lens"
)
tokenizer = T5Tokenizer.from_pretrained(
    "gcuomo/open-source-ai-t5-liar-lens"
)

# Prepare input
statement = "Blockchain guarantees ethical outcomes in all AI systems."
prompt = f"summarize: {statement}"
inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True, max_length=128)

# Generate prediction
output = model.generate(**inputs, max_new_tokens=8)
prediction = tokenizer.decode(output[0], skip_special_tokens=True).strip().lower()

# Print result
print("Predicted label:", prediction)

Citation

If you reference this model or its training methodology, please cite:

Cuomo, J. & De Jesús, J. (2025). Open Source AI. No Starch Press.
Training datasets: the LIAR benchmark (Wang, W. Y. (2017). "Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection. ACL 2017) and 225 synthetic examples written for the book.
