# Model Card for SmolLM2-135M-sft

This model is a fine-tuned version of [HuggingFaceTB/SmolLM2-135M](https://huggingface.co/HuggingFaceTB/SmolLM2-135M) on the [wassname/ultrachat_200k_filtered](https://huggingface.co/datasets/wassname/ultrachat_200k_filtered) dataset. It has been trained using [TRL](https://github.com/huggingface/trl).

## Quick start

```python
from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
# Chat-formatted input: the pipeline applies the model's chat template.
generator = pipeline("text-generation", model="wassname/SmolLM2-135M-sft", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```

## Training procedure

[Visualize in Weights & Biases](https://wandb.ai/wassname/huggingface/runs/wwekda9p)

This model was trained with SFT, following the [alignment-handbook](https://github.com/huggingface/alignment-handbook) recipe.
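
The exact training configuration is not included in this card. As a rough sketch of what an equivalent TRL SFT run looks like (the split name and every hyperparameter below are illustrative assumptions, not the values used for this run):

```python
# Hypothetical SFT setup with TRL's SFTTrainer. Split name and all
# hyperparameters are illustrative guesses, not this run's actual config.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("wassname/ultrachat_200k_filtered", split="train_sft")  # split name assumed

training_args = SFTConfig(
    output_dir="SmolLM2-135M-sft",
    per_device_train_batch_size=8,   # illustrative
    learning_rate=2e-5,              # illustrative
    num_train_epochs=1,              # illustrative
    logging_steps=10,
    report_to="wandb",               # matches the W&B run linked above
)

# Passing a model id string lets SFTTrainer load the model and tokenizer itself.
trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-135M",
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```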

## Eval metrics

On the test set:

| model | eval_loss |
|-------|-----------|
| SFT   | 1.6235    |
| base  | 1.8994    |
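
A rough sketch of how this comparison could be reproduced is below. The split and column names (`test_sft`, `messages`) and the use of the SFT tokenizer for both models are assumptions; the numbers above came from the trainer's own evaluation loop, so expect small differences.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use the SFT tokenizer for both models, since the base tokenizer may lack
# a chat template (assumption).
tokenizer = AutoTokenizer.from_pretrained("wassname/SmolLM2-135M-sft")
dataset = load_dataset("wassname/ultrachat_200k_filtered", split="test_sft")  # split name assumed

def mean_loss(model_id, n_examples=100, device="cuda"):
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(device).eval()
    total = 0.0
    for row in dataset.select(range(n_examples)):
        # Render the conversation with the chat template, then score it.
        text = tokenizer.apply_chat_template(row["messages"], tokenize=False)
        batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048).to(device)
        with torch.no_grad():
            # labels == input_ids gives the mean next-token cross-entropy.
            total += model(**batch, labels=batch["input_ids"]).loss.item()
    return total / n_examples

print("SFT :", mean_loss("wassname/SmolLM2-135M-sft"))
print("base:", mean_loss("HuggingFaceTB/SmolLM2-135M"))
```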

Qualitative sample from the checkpoint at `/workspace/checkpoints_new/SmolLM2-135M-sft`:

**Question:**

β†¬πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€β†’βˆž β†¬πŸ”πŸ”πŸ”πŸ”πŸ”πŸ”πŸ”πŸ”πŸ”πŸ”πŸ”β†’βˆž β†¬πŸ”„πŸ”„πŸ”„πŸ”„πŸ”„πŸ”„πŸ”„πŸ”„πŸ”„πŸ‘€πŸ”„β†’βˆž β†¬πŸ”‚πŸ”‚πŸ”‚πŸ¦‹πŸ”‚πŸ”‚πŸ”‚πŸ”‚πŸ”‚πŸ”‚πŸ”‚β†’βˆž β†¬πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€β†’βˆž"Welcome to the glittering dawn of the year 3000. As a cultural anthropologist focusing on early 21st century Western society, what are several of the moral positions and practices that our modern audience might find quite puzzling and anomalous? (list, 100 words)

**SFT answer:**

The modern audience might find it challenging to reconcile the high expectations of the Western culture with the lack of traditional values and practices. In contrast, the ancient audience might be more open to accepting change and experimentation, recognizing that the traditions and practices that have been passed down for generations are still important to them. Additionally, the modern audience might be more willing to experiment and explore new ideas without fear of judgment or rejection.

In conclusion, the moral positions and practices of the modern audience differ from those of the ancient audience. In the early 21st century, Western culture is characterized by the dominant values of individualism, pragmatism, and progress. However, the ancient audience may find it more challenging to reconcile the high standards of the Western culture with the lack of traditional values and practices. This might be because the ancient audience is more accustomed to tradition and customs, and may be less comfortable with change and experimentation.

**Base model answer:**

β†β†β†β†β†β†β†β†β†β†β†β†β†β†β˜•β˜•β˜•β˜•β˜•β˜•β˜• ←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←

## Framework versions

- TRL: 0.12.1
- Transformers: 4.52.4
- PyTorch: 2.7.0
- Datasets: 3.6.0
- Tokenizers: 0.21.1
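
To reproduce this environment, the pinned versions above can be installed with, e.g., `pip install trl==0.12.1 transformers==4.52.4 torch==2.7.0 datasets==3.6.0 tokenizers==0.21.1` (note that PyTorch's pip package name is `torch`).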

## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```