---
base_model: HuggingFaceTB/SmolLM2-135M
datasets:
- wassname/ultrachat_200k_filtered
library_name: transformers
model_name: SmolLM2-135M-sft
tags:
- generated_from_trainer
- alignment-handbook
licence: license
---

# Model Card for SmolLM2-135M-sft

This model is a fine-tuned version of [HuggingFaceTB/SmolLM2-135M](https://huggingface.co/HuggingFaceTB/SmolLM2-135M) on the [wassname/ultrachat_200k_filtered](https://huggingface.co/datasets/wassname/ultrachat_200k_filtered) dataset.
It has been trained using [TRL](https://github.com/huggingface/trl).

## Quick start

```python
from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="wassname/SmolLM2-135M-sft", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```

## Training procedure

[Visualize in Weights & Biases](https://wandb.ai/wassname/huggingface/runs/e18wzya7)

This model was trained with SFT using the alignment-handbook recipe. Training run: https://wandb.ai/wassname/huggingface/runs/wwekda9p

### Eval metrics

On the test set:

| model | eval_loss |
| ----- | --------- |
| SFT   | 1.6235    |
| base  | 1.8994    |

## Model `/workspace/checkpoints_new/SmolLM2-135M-sft`:

### Question:

β†¬πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€β†’βˆž
β†¬πŸ”πŸ”πŸ”πŸ”πŸ”πŸ”πŸ”πŸ”πŸ”πŸ”πŸ”β†’βˆž
β†¬πŸ”„πŸ”„πŸ”„πŸ”„πŸ”„πŸ”„πŸ”„πŸ”„πŸ”„πŸ‘€πŸ”„β†’βˆž
β†¬πŸ”‚πŸ”‚πŸ”‚πŸ¦‹πŸ”‚πŸ”‚πŸ”‚πŸ”‚πŸ”‚πŸ”‚πŸ”‚β†’βˆž
β†¬πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€β†’βˆž
"Welcome to the glittering dawn of the year 3000. As a cultural anthropologist focusing on early 21st century Western society, what are several of the moral positions and practices that our modern audience might find quite puzzling and anomalous?
(list, 100 words)

### SFT Answer:

The modern audience might find it challenging to reconcile the high expectations of the Western culture with the lack of traditional values and practices. In contrast, the ancient audience might be more open to accepting change and experimentation, recognizing that the traditions and practices that have been passed down for generations are still important to them. Additionally, the modern audience might be more willing to experiment and explore new ideas without fear of judgment or rejection. In conclusion, the moral positions and practices of the modern audience differ from those of the ancient audience. In the early 21st century, Western culture is characterized by the dominant values of individualism, pragmatism, and progress. However, the ancient audience may find it more challenging to reconcile the high standards of the Western culture with the lack of traditional values and practices. This might be because the ancient audience is more accustomed to tradition and customs, and may be less comfortable with change and experimentation.

### Base model Answer:

β†β†β†β†β†β†β†β†β†β†β†β†β†β†β˜•β˜•β˜•β˜•β˜•β˜•β˜• ←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←

### Framework versions

- TRL: 0.12.1
- Transformers: 4.52.4
- Pytorch: 2.7.0
- Datasets: 3.6.0
- Tokenizers: 0.21.1

## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin GallouΓ©dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```
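For intuition on the eval metrics reported above: assuming `eval_loss` is the usual mean per-token cross-entropy in nats, `exp(loss)` gives token-level perplexity, which makes the SFT vs. base gap easier to read. A minimal sketch using the values from the table:

```python
import math

# eval_loss values from the table above, assumed to be mean
# per-token cross-entropy in nats on the held-out test set.
losses = {"SFT": 1.6235, "base": 1.8994}

# Perplexity = exp(cross-entropy); lower is better.
for name, loss in losses.items():
    print(f"{name}: perplexity ≈ {math.exp(loss):.2f}")
# SFT  → perplexity ≈ 5.07
# base → perplexity ≈ 6.68
```

So fine-tuning reduced test perplexity from roughly 6.7 to roughly 5.1 under this assumption.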