---
base_model: HuggingFaceTB/SmolLM2-135M
datasets:
- wassname/ultrachat_200k_filtered
library_name: transformers
model_name: SmolLM2-135M-sft
tags:
- generated_from_trainer
- alignment-handbook
licence: license
---

# Model Card for SmolLM2-135M-sft

This model is a fine-tuned version of [HuggingFaceTB/SmolLM2-135M](https://huggingface.co/HuggingFaceTB/SmolLM2-135M) on the [wassname/ultrachat_200k_filtered](https://huggingface.co/datasets/wassname/ultrachat_200k_filtered) dataset.
It has been trained using [TRL](https://github.com/huggingface/trl).

## Quick start

```python
from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="wassname/SmolLM2-135M-sft", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```

## Training procedure

[Visualize in Weights & Biases](https://wandb.ai/wassname/huggingface/runs/e18wzya7)

This model was trained with SFT using the alignment-handbook recipe. Training run: https://wandb.ai/wassname/huggingface/runs/wwekda9p

### Eval metrics

On the test set:

| model | eval_loss |
| ----- | --------- |
| SFT   | 1.6235    |
| base  | 1.8994    |

## Model `/workspace/checkpoints_new/SmolLM2-135M-sft`:

### Question:

β†¬πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€β†’βˆž
β†¬πŸ”πŸ”πŸ”πŸ”πŸ”πŸ”πŸ”πŸ”πŸ”πŸ”πŸ”β†’βˆž
β†¬πŸ”„πŸ”„πŸ”„πŸ”„πŸ”„πŸ”„πŸ”„πŸ”„πŸ”„πŸ‘€πŸ”„β†’βˆž
β†¬πŸ”‚πŸ”‚πŸ”‚πŸ¦‹πŸ”‚πŸ”‚πŸ”‚πŸ”‚πŸ”‚πŸ”‚πŸ”‚β†’βˆž
β†¬πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€πŸ”€β†’βˆž
"Welcome to the glittering dawn of the year 3000. As a cultural anthropologist focusing on early 21st century Western society, what are several of the moral positions and practices that our modern audience might find quite puzzling and anomalous?
(list, 100 words)

### SFT Answer:

The modern audience might find it challenging to reconcile the high expectations of the Western culture with the lack of traditional values and practices. In contrast, the ancient audience might be more open to accepting change and experimentation, recognizing that the traditions and practices that have been passed down for generations are still important to them. Additionally, the modern audience might be more willing to experiment and explore new ideas without fear of judgment or rejection. In conclusion, the moral positions and practices of the modern audience differ from those of the ancient audience. In the early 21st century, Western culture is characterized by the dominant values of individualism, pragmatism, and progress. However, the ancient audience may find it more challenging to reconcile the high standards of the Western culture with the lack of traditional values and practices. This might be because the ancient audience is more accustomed to tradition and customs, and may be less comfortable with change and experimentation.

### Base model Answer:

β†β†β†β†β†β†β†β†β†β†β†β†β†β†β˜•β˜•β˜•β˜•β˜•β˜•β˜• ←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←

### Framework versions

- TRL: 0.12.1
- Transformers: 4.52.4
- Pytorch: 2.7.0
- Datasets: 3.6.0
- Tokenizers: 0.21.1

## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin GallouΓ©dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```
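For intuition on the eval metrics reported above: assuming `eval_loss` is the usual mean per-token cross-entropy in nats, `exp(loss)` gives token-level perplexity, which makes the SFT vs. base gap easier to read. A minimal sketch using the values from the table:

```python
import math

# eval_loss values from the table above, assumed to be mean
# per-token cross-entropy in nats on the held-out test set.
losses = {"SFT": 1.6235, "base": 1.8994}

# Perplexity = exp(cross-entropy); lower is better.
for name, loss in losses.items():
    print(f"{name}: perplexity ≈ {math.exp(loss):.2f}")
# SFT  → perplexity ≈ 5.07
# base → perplexity ≈ 6.68
```

So fine-tuning reduced test perplexity from roughly 6.7 to roughly 5.1 under this assumption.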