This collection presents the fine-tuning of the SmolLM model using two reinforcement learning from human feedback (RLHF) approaches: Direct Preference Optimization (DPO) and Proximal Policy Optimization (PPO).
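To give a feel for what DPO optimizes, here is a minimal plain-Python sketch of the DPO loss on a single preference pair. It is an illustration of the objective only, not code from this collection; the function name, arguments, and the choice of `beta` are assumptions, and the inputs are the summed token log-probabilities of the chosen and rejected responses under the policy and the frozen reference model.

```python
import math

def dpo_loss(policy_chosen_lp: float, policy_rejected_lp: float,
             ref_chosen_lp: float, ref_rejected_lp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair (hypothetical helper, for illustration).

    Each argument is a summed log-probability of a full response.
    beta scales how strongly the policy is pushed away from the reference.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) prefers the chosen response over the rejected one.
    logits = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    # -log(sigmoid(logits)), written in a numerically stable form.
    return math.log1p(math.exp(-logits))
```

When the policy equals the reference model, the margin is zero and the loss is log 2; as the policy assigns a larger margin to the chosen response, the loss decreases toward zero.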