This collection presents the fine-tuning of the SmolLM model using two reinforcement learning from human feedback (RLHF) approaches: Direct Preference Optimization (DPO) and Proximal Policy Optimization (PPO).
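To give a feel for what DPO optimizes, here is a minimal plain-Python sketch of the DPO loss on a single preference pair. It is an illustration of the objective only, not code from this collection; the function name, arguments, and the choice of `beta` are assumptions, and the inputs are the summed token log-probabilities of the chosen and rejected responses under the policy and the frozen reference model.

```python
import math

def dpo_loss(policy_chosen_lp: float, policy_rejected_lp: float,
             ref_chosen_lp: float, ref_rejected_lp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair (hypothetical helper, for illustration).

    Each argument is a summed log-probability of a full response.
    beta scales how strongly the policy is pushed away from the reference.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) prefers the chosen response over the rejected one.
    logits = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    # -log(sigmoid(logits)), written in a numerically stable form.
    return math.log1p(math.exp(-logits))
```

When the policy equals the reference model, the margin is zero and the loss is log 2; as the policy assigns a larger margin to the chosen response, the loss decreases toward zero.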