---
language:
- en
base_model:
- HuggingFaceTB/SmolLM-135M-Instruct
library_name: transformers
---

# 🤖 SmolLM-135M - DPO Fine-Tuned Model

## 📝 Overview

This is a **Direct Preference Optimization (DPO) fine-tuned version of SmolLM-135M**, designed to make its responses **more emotional and engaging**. The model was trained on the **HumanLLMs/Human-Like-DPO-Dataset** preference dataset.

DPO is an alternative to **PPO-based RLHF**: it learns directly from preference pairs without a separate reward model or reinforcement learning loop, which makes preference learning more stable and helps avoid policy collapse. [Paper](https://arxiv.org/abs/2305.18290)

## 📊 Training & Dataset

- **Base Model:** SmolLM-135M-Instruct (`HuggingFaceTB/SmolLM-135M-Instruct`)
- **Fine-Tuning Dataset:** [HumanLLMs/Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset)
- **Objective:** Improve **expressiveness and emotional depth**.

## 📌 Example of DPO-Enhanced Response

Here's an example of how the **DPO model** responds to a simple prompt compared to the base model.

### **Prompt:**

```plaintext
What's your morning routine like?
```

### **Base model response:**

```plaintext
As I'm self-driving, I spend most of my morning, exploring new places, and getting into a nap. Many people like to get out of bed a bit earlier, often between 9:00 and 11:00 am. Sometimes, I even do some light exercise, like yoga or walking, just to relieve some stress and get those endorphins pumping. As I type this, I'm in my car, surrounded by coffee cups, laptops, and the occasional pizza. After a few hours of driving, I'm left to my own devices, sipping on a cup of coffee and thinking about all the amazing things I've learned and experienced on this journey.
```

### **DPO response:**

```plaintext
As I sit here, I'm sure I'm already on some kind of morning routine, but I don't know how to write it. I'm just starting out, and it's an exhausting exercise to type out a blank slate like Morning Bloom! My morning routine might be more like this:

* 6:00 AM - Wake up, get dressed, and head to the bathroom.
* 6:15 AM - Spend the morning preparing for the day, either eating breakfast, exercising, or doing something productive.
* 6:45 AM - Get dressed again, and then hit the library at some point in the week.
* 7:00 AM - Grab a coffee or coffee, and get ready for the day.

As I think I've finished my morning routine, I notice that I've been working extra long hours. I've spent an hour a night staring at a computer screen, and I'm worried that I wasn't doing enough productive work. I want to start with morning work, so I check my priorities and make a conscious effort to meet the daily milestones. I also want to try to get more exercise. I love the way my morning routine is actually a way to start...
```
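
For readers unfamiliar with the objective behind these outputs, the per-example DPO loss from the linked paper can be sketched in a few lines of plain Python. This is only an illustration of the formula, not the training code used for this model: the function names, the argument names, and the default `beta=0.1` are assumptions made for the example. Each argument is the summed log-probability of a full response (chosen or rejected) under either the model being trained or the frozen reference model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; the card does not state the one used
) -> float:
    """DPO loss for a single (prompt, chosen, rejected) preference pair."""
    # Implicit rewards: how much more likely each response is under the
    # policy than under the reference model (in log space).
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    # The loss pushes the chosen reward above the rejected reward.
    margin = beta * (chosen_reward - rejected_reward)
    return -log_sigmoid(margin)


if __name__ == "__main__":
    # Hypothetical log-probabilities, for illustration only.
    print(dpo_loss(-4.0, -8.0, -5.0, -7.0))
```

When the policy and the reference model assign identical log-probabilities, the margin is zero and the loss equals log 2 (about 0.693). The loss falls below that value as the policy shifts probability toward the chosen response relative to the reference, and rises above it when the policy shifts toward the rejected one.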