---
license: apache-2.0
tags:
  - t5
  - text-classification
  - open-source-ai
  - liar-dataset
  - educational
  - demo
datasets:
  - liar
language:
  - en
base_model:
  - t5-small
pipeline_tag: text2text-generation
---

# open-source-ai-t5-liar-lens

This is a fine-tuned version of `t5-small`, adapted for classifying political claims using the LIAR dataset. The model was developed as part of the Open Source AI book project by Jerry Cuomo and José De Jesús, and is intended as a demonstration of lightweight MLOps practices.

Given a political claim as input, the model predicts one of six factuality labels from the LIAR dataset:

- true
- mostly-true
- half-true
- barely-true
- false
- pants-fire

This task is framed as a text-to-text problem using a summarization-style prompt:

```
Input:  veracity: The unemployment rate has dropped to 4.1%
Target: mostly-true
```
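For reference, a claim and label can be wrapped into this format with a small helper. The function and field names below are illustrative, not the project's actual preprocessing code:

```python
def to_example(statement: str, label: str) -> dict:
    """Build a text-to-text training pair in the prompt format shown above.

    The "veracity: " prefix tells T5 which task to perform; the target is
    the plain label string. (Helper name and schema are illustrative.)
    """
    return {
        "input_text": f"veracity: {statement}",
        "target_text": label,
    }

pair = to_example("The unemployment rate has dropped to 4.1%", "mostly-true")
print(pair["input_text"])   # veracity: The unemployment rate has dropped to 4.1%
print(pair["target_text"])  # mostly-true
```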

The model is not intended for production use. It was fine-tuned on a small subset of the LIAR dataset to demonstrate reproducible, transparent model development, and is best used to illustrate fine-tuning, structured logging, checkpointing, and publishing open models.

## Training Details

- Base model: `t5-small`
- Dataset: LIAR (subset)
- Epochs: 1
- Batch size: 4
- Max input length: 128 tokens
- Hardware: Google Colab
- Checkpoint name: `open-source-ai-t5-liar-lens`
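These hyperparameters might map onto a Hugging Face trainer configuration roughly like the following. This is a hedged reconstruction of the settings listed above, not the actual training script:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the training configuration implied by the list above; values not
# in the list (logging cadence, save strategy) are illustrative defaults.
training_args = Seq2SeqTrainingArguments(
    output_dir="open-source-ai-t5-liar-lens",  # checkpoint name
    num_train_epochs=1,                        # Epochs: 1
    per_device_train_batch_size=4,             # Batch size: 4
    predict_with_generate=True,                # decode text when evaluating
    logging_steps=10,                          # illustrative
    save_strategy="epoch",                     # illustrative
)
```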

## Intended Use

This model is provided for educational and illustrative use. It demonstrates how to:

- Fine-tune a T5 model on a classification task
- Log and version experiments
- Save and publish models to Hugging Face Hub

## Quick Example

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the fine-tuned checkpoint and its tokenizer from the Hub.
model = T5ForConditionalGeneration.from_pretrained(
    "gcuomo/open-source-ai-t5-liar-lens"
)
tokenizer = T5Tokenizer.from_pretrained(
    "gcuomo/open-source-ai-t5-liar-lens"
)

# The "veracity: " prefix matches the prompt format used during fine-tuning.
input_text = "veracity: The president signed the bill into law last year"
inputs = tokenizer(input_text, return_tensors="pt")
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
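Because `generate` returns free text rather than a class index, callers may want to snap the decoded string onto one of the six LIAR labels. A minimal, purely illustrative normalizer:

```python
LABELS = ["true", "mostly-true", "half-true", "barely-true", "false", "pants-fire"]

def normalize_label(generated: str) -> str:
    """Map a generated string onto the nearest LIAR label (simple heuristic)."""
    text = generated.strip().lower()
    if text in LABELS:
        return text
    # Check longer labels first so "true" does not shadow "mostly-true".
    for label in sorted(LABELS, key=len, reverse=True):
        if label in text:
            return label
    return "half-true"  # arbitrary fallback for unparseable output

print(normalize_label(" Mostly-True "))  # mostly-true
```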

## Citation

If you reference this model or its training approach, please cite:

> Cuomo, J. & De Jesús, J. (2025). *Open Source AI*. No Starch Press.  
> Trained on the LIAR dataset: [https://huggingface.co/datasets/liar](https://huggingface.co/datasets/liar)