estnafinema0's Collections
SmolLM Variation: PPO & DPO Fine-Tuning for RLHF

updated Mar 30

This collection presents the fine-tuning of the SmolLM model using two RLHF approaches: Direct Preference Optimization (DPO) and Proximal Policy Optimization (PPO).

  • estnafinema0/trainer_output

    Text Classification • Updated Mar 30 • 2

  • estnafinema0/smolLM-variation-dpo

    Text Generation • Updated Mar 30 • 2

  • estnafinema0/smolLM-variation-ppo

    Text Generation • Updated Mar 30 • 3
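
The entries above could be loaded for a side-by-side comparison roughly as follows. This is a minimal sketch, not the author's documented usage: it assumes the `transformers` library (with a backend such as PyTorch) is installed, that each repo ships weights and a tokenizer that `pipeline` can load, and that the task tags shown in the listing match the model heads. The helper names and the prompt are hypothetical.

```python
# Repo IDs and task tags are copied from the collection listing above.
COLLECTION = {
    "estnafinema0/trainer_output": "text-classification",
    "estnafinema0/smolLM-variation-dpo": "text-generation",
    "estnafinema0/smolLM-variation-ppo": "text-generation",
}


def build_pipeline(repo_id: str):
    """Create a transformers pipeline for one collection entry.

    Assumption: the repo contains loadable weights and a tokenizer.
    """
    # Imported lazily so the mapping above can be inspected offline.
    from transformers import pipeline

    return pipeline(COLLECTION[repo_id], model=repo_id)


def generation_repos() -> list[str]:
    """Return the repo IDs tagged as text-generation, in listing order."""
    return [repo for repo, task in COLLECTION.items() if task == "text-generation"]


def compare_generations(prompt: str, max_new_tokens: int = 40) -> dict[str, str]:
    """Generate one completion per text-generation model (downloads weights)."""
    outputs = {}
    for repo in generation_repos():
        generator = build_pipeline(repo)
        result = generator(prompt, max_new_tokens=max_new_tokens)
        outputs[repo] = result[0]["generated_text"]
    return outputs
```

Calling `compare_generations("What is the capital of France?")` would download both generation models and return one completion per repo, which makes it easy to eyeball how the DPO and PPO variants differ on the same prompt.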