open-source-ai-t5-liar-lens

This is a fine-tuned version of t5-small, trained to classify short claims and quotes in the style of the LIAR dataset. It was developed as part of the Open Source AI book project by Jerry Cuomo and José De Jesús to demonstrate fine-tuning and evaluation techniques with a light, satirical touch.

What's Different?

To test T5’s summarization-as-classification capabilities, we augmented the LIAR dataset with 225 synthetic examples drawn from our own book. These were written to echo the style of political claims—confident, compressible, and occasionally absurd. It’s a tongue-in-cheek benchmark, but a useful one. It lets us explore how a summarization model handles short-form reasoning, fake-ish news, and the delightful blur between fact and fiction in machine learning writing.

So while the original LIAR dataset supplies factual claims from political discourse, our additions bring in quotes that parody open-source mantras, AI hype cycles, and technical one-liners. The result is a model that scores both campaign promises and keynote punchlines with equal scrutiny.

Task Format

This model treats classification as a text-to-text generation task. Each input is a short claim or quote, and the model responds with one of six factuality labels, generated directly as a lowercase string:

  • pants-fire
  • false
  • barely-true
  • half-true
  • mostly-true
  • true

The input format uses a summarization-style prefix to frame the task:

Example Input:

summarize: Python is the fastest programming language available.

Example Output:

half-true

This response reflects the model’s ability to evaluate short-form claims with nuance, producing a graded label based on its understanding of truthfulness.
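
Because the label is produced by free-form generation rather than a fixed classification head, a decoded output can in principle drift outside the six-label vocabulary. A small post-processing step can snap generations back onto the label set. The sketch below, including the normalize_label helper name, is illustrative and not part of the released model:

# The six LIAR-style labels the model is trained to emit.
LABELS = ["pants-fire", "false", "barely-true", "half-true", "mostly-true", "true"]

def normalize_label(generated: str) -> str:
    # Exact match after lowercasing and trimming whitespace.
    text = generated.strip().lower()
    if text in LABELS:
        return text
    # Otherwise fall back to the first label contained in the output.
    # Compound labels like "mostly-true" are listed before bare "true",
    # so they are matched first.
    for label in LABELS:
        if label in text:
            return label
    return "unknown"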

Training Details

  • Base model: t5-small (~60.5M parameters, float32)
  • Datasets: LIAR, augmented with 225 synthetic examples written for the Open Source AI book
  • Epochs: 5
  • Batch size: 4
  • Max input length: 128 tokens
  • Platform: Google Colab
  • Checkpoint: open-source-ai-t5-liar-lens
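
For readers who want to reproduce the setup, here is a minimal sketch of how these hyperparameters might be wired into a Hugging Face Seq2SeqTrainer run. The two training examples and the learning rate are illustrative assumptions; the actual run used the full LIAR-plus-book dataset described above.

from datasets import Dataset
from transformers import (
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    T5ForConditionalGeneration,
    T5Tokenizer,
)

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Illustrative stand-ins for the LIAR + book-derived training examples.
train_examples = [
    {"claim": "Python is the fastest programming language available.", "label": "half-true"},
    {"claim": "Our model card is fully transparent.", "label": "mostly-true"},
]

def preprocess(batch):
    # Frame each claim with the summarization-style prefix used at inference.
    model_inputs = tokenizer(
        ["summarize: " + c for c in batch["claim"]],
        max_length=128, truncation=True,
    )
    # Targets are the factuality labels, tokenized as ordinary text.
    labels = tokenizer(text_target=batch["label"], max_length=8, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

dataset = Dataset.from_list(train_examples).map(
    preprocess, batched=True, remove_columns=["claim", "label"]
)

args = Seq2SeqTrainingArguments(
    output_dir="open-source-ai-t5-liar-lens",
    num_train_epochs=5,
    per_device_train_batch_size=4,
    learning_rate=3e-4,  # assumption: a common choice for T5; not stated in this card
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()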

Intended Use

This model is designed for educational use, particularly in demonstrating:

  • Lightweight fine-tuning with Hugging Face Transformers
  • Classification-as-generation using T5
  • Transparent model publishing and benchmarking
  • The fuzziness of truth in machine learning culture

It is not intended for production-grade fact-checking or regulatory enforcement.

Example Usage

from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the fine-tuned model and tokenizer
model = T5ForConditionalGeneration.from_pretrained(
    "gcuomo/open-source-ai-t5-liar-lens"
)
tokenizer = T5Tokenizer.from_pretrained(
    "gcuomo/open-source-ai-t5-liar-lens"
)

# Prepare input
statement = "Blockchain guarantees ethical outcomes in all AI systems."
prompt = f"summarize: {statement}"
inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True, max_length=128)

# Generate prediction
output = model.generate(**inputs, max_new_tokens=8)
prediction = tokenizer.decode(output[0], skip_special_tokens=True).strip().lower()

# Print result
print("Predicted label:", prediction)

Citation

If you reference this model or its training methodology, please cite:

Cuomo, J. & De Jesús, J. (2025). Open Source AI. No Starch Press.
Training datasets: the LIAR benchmark (Wang, W. Y. (2017). "Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection. ACL 2017) and 225 synthetic examples written for the book.
