distilroberta-inspirational-quotes

Model Description

This model is a fine-tuned version of distilroberta-base specialized to complete positive, inspirational quotes. It was trained on the Abirate/english_quotes dataset. The project demonstrates transfer learning by taking a general-purpose model and specializing it for a niche task, resulting in a model that outperforms its base version on that task. It achieves the following results on the evaluation set:

  • Loss: 1.5843

Project Goal

The goal of this project was to fine-tune the distilroberta-base model to specialize in completing inspirational quotes, making it more contextually relevant and accurate for this specific task compared to the base model.

How to Use

You can use this model directly with a pipeline for fill-mask tasks:

from transformers import pipeline

# Load the fine-tuned model from the Hub
fine_tuned_pipe = pipeline("fill-mask", model="Boinbo/distilroberta-inspirational-quotes")

# Define a prompt
prompt = f"The only way to do great work is to {fine_tuned_pipe.tokenizer.mask_token} what you do."

# Get predictions
print(fine_tuned_pipe(prompt))
# Expected output might include words like "love", "enjoy", etc.
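
Each prediction returned by the fill-mask pipeline is a dictionary containing the score, the predicted token, and the completed sequence, so the top suggestions can be pulled out as in this small sketch (the top_k value is just an example):

# Each prediction is a dict with 'score', 'token', 'token_str' and 'sequence' keys
for prediction in fine_tuned_pipe(prompt, top_k=5):
    print(f"{prediction['token_str'].strip():<12} {prediction['score']:.4f}")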

Performance Comparison

The fine-tuned model provides more contextually relevant predictions for inspirational quotes compared to the original distilroberta-base model.

Prompt: The secret to getting ahead is getting <mask>

Base Model (distilroberta-base) Output    Fine-Tuned Model Output
1. ahead (0.19)                           1. ahead (0.55)
2. married (0.13)                         2. started (0.06)
3. started (0.04)                         3. prepared (0.02)
4. drunk (0.03)                           4. ready (0.02)

As shown, the fine-tuned model ranks "started" higher than the base model does and drops off-topic suggestions such as "married" and "drunk" from its top predictions.
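
The comparison above can be reproduced with a short script along the following lines; exact scores may vary slightly across transformers versions, and the top_k value is illustrative:

from transformers import pipeline

prompt = "The secret to getting ahead is getting <mask>"

# Load the base and fine-tuned models as fill-mask pipelines
base_pipe = pipeline("fill-mask", model="distilroberta-base")
tuned_pipe = pipeline("fill-mask", model="Boinbo/distilroberta-inspirational-quotes")

# Print the top-4 completions from each model
for name, pipe in [("base", base_pipe), ("fine-tuned", tuned_pipe)]:
    print(f"--- {name} ---")
    for pred in pipe(prompt, top_k=4):
        print(f"{pred['token_str'].strip():<12} {pred['score']:.2f}")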

Technologies Used

Cloud Platforms & Services

  • Google Colab: The cloud-based Jupyter notebook environment where the code was written and executed, providing the necessary computational resources (including GPU acceleration).
  • Hugging Face Hub: Central platform for:
    • Dataset Hosting: Sourced the Abirate/english_quotes dataset.
    • Base Model Hosting: Downloaded the distilroberta-base model.
    • Fine-Tuned Model Hosting: Saved and shared the final distilroberta-inspirational-quotes model.

Development Environment & Hardware

  • Python: The core programming language for the project.
  • Jupyter Notebook: The interactive environment (.ipynb file) used for writing, documenting, and running the code in cells.
  • NVIDIA T4 GPU: The specific GPU provided by Google Colab that accelerated the model training process.

Core Machine Learning Libraries & Frameworks

  • PyTorch: The underlying deep learning framework that the transformers library is built upon, handling tensor operations and gradient calculations.
  • Hugging Face transformers: Main library for:
    • AutoModelForMaskedLM: Loading the distilroberta-base model.
    • AutoTokenizer: Loading the correct tokenizer for the model.
    • Trainer and TrainingArguments: Managing the fine-tuning loop.
    • pipeline: For easy inference with the final model.
  • Hugging Face datasets: For downloading and preparing the dataset from the Hub.
  • Hugging Face accelerate: Handles device placement so the PyTorch training code runs efficiently on the available hardware (here, the T4 GPU).
  • bitsandbytes: Used for memory-efficient training and quantization, integrated into the Trainer workflow.

AI Model & Key Concepts

  • distilroberta-base: The pre-trained language model serving as the foundation for fine-tuning.
  • Transfer Learning: Adapting a model trained on a large, general dataset to a new, specialized task.
  • Fine-Tuning: Continuing the training of a pre-trained model on a new, smaller dataset.
  • Masked Language Modeling (MLM): The training objective where the model learns to predict randomly hidden (masked) words in a sentence, enabling the model to learn the patterns of inspirational quotes.

Training and Evaluation

  • Dataset: The model was fine-tuned on the train split of the Abirate/english_quotes dataset.
  • Process: The process involved tokenizing the text data, then using a DataCollatorForLanguageModeling to randomly mask tokens. The model was then trained using the Trainer class from the transformers library.
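
A rough sketch of that preprocessing step, assuming the quote text lives in the dataset's quote column and using the collator's default 15% masking probability (the max_length value is an assumption, not taken from the original notebook):

from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# Load the quotes dataset and the base model's tokenizer
dataset = load_dataset("Abirate/english_quotes")
tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")

def tokenize(batch):
    # Each example's text is assumed to be in the "quote" column
    return tokenizer(batch["quote"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset["train"].column_names)

# Randomly masks 15% of tokens in every batch during training
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)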

Training Hyperparameters

The following hyperparameters were used during the final successful training run:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3
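
These settings map onto TrainingArguments roughly as in the sketch below, which continues the preprocessing sketch above; the output directory, per-epoch evaluation, and the illustrative held-out slice are assumptions rather than an exact reproduction of the original notebook:

from transformers import AutoModelForMaskedLM, Trainer, TrainingArguments

model = AutoModelForMaskedLM.from_pretrained("distilroberta-base")

# The source dataset ships only a train split; carving out a small held-out slice here
# is purely illustrative (see "Challenges & Lessons Learned" below)
split = tokenized["train"].train_test_split(test_size=0.1, seed=42)

training_args = TrainingArguments(
    output_dir="distilroberta-inspirational-quotes",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    seed=42,
    eval_strategy="epoch",  # evaluate once per epoch, as in the results table below
    # AdamW (betas 0.9/0.999, eps 1e-8) and a linear LR schedule are the Trainer defaults
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=split["train"],
    eval_dataset=split["test"],
    data_collator=data_collator,
)

trainer.train()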

Training Results

Training Loss    Epoch    Step    Validation Loss
1.7341           1.0      157     1.6612
1.7427           2.0      314     1.5811
1.8748           3.0      471     1.5843

Challenges & Lessons Learned

A key challenge in this project was overcoming initial poor results. The first fine-tuned versions of the model produced garbage output. The investigation revealed three main issues:

  1. Hyperparameter Tuning: The initial learning_rate was too high and the num_train_epochs was excessive. This caused the model's pre-trained weights to be destroyed, leading to nonsensical output. Adjusting the learning rate to 2e-05 and reducing epochs to 3 was critical for a stable training process.
  2. Model Loading and Caching: After fixing the hyperparameters, the model still produced bad output in the testing environment. This was because the pipeline function was loading an old, broken version of the model from a local cache in Google Colab. The problem was solved by pointing the testing script at the local output directory (distilroberta-inspirational-quotes/) instead of the Hugging Face Hub ID, ensuring the most recently trained model was used (see the sketch after this list).
  3. Proper Data Selection: The dataset selected for this project did not include a separate evaluation split. While this was acceptable for a small-scale demonstration, it is not recommended for larger or production-level projects, where having distinct training, validation, and test splits is crucial for robust model evaluation and generalization.
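
As a concrete illustration of the caching fix from point 2, pointing the pipeline at the local output directory (the Trainer's output_dir) rather than the Hub ID ensures the freshly trained weights are the ones being tested:

from transformers import pipeline

# Load from the local training output directory rather than the Hub ID,
# so a stale cached copy of the Hub model cannot be picked up
local_pipe = pipeline("fill-mask", model="distilroberta-inspirational-quotes/")

prompt = f"The only way to do great work is to {local_pipe.tokenizer.mask_token} what you do."
print(local_pipe(prompt))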

This experience highlighted the importance of careful hyperparameter selection for transfer learning, the need to be aware of underlying mechanisms like caching during the development and testing cycle, and the significance of proper dataset selection and splitting.

Framework Versions

  • Transformers 4.53.0
  • PyTorch 2.6.0+cu124
  • Datasets 3.6.0
  • Tokenizers 0.21.2