mnoukhov's Collections
Asynchronous RLHF

updated Oct 28, 2024

Models and datasets for the Asynchronous RLHF paper. Code is available at https://github.com/mnoukhov/async_rlhf

  • mnoukhov/pythia410m-sft-tldr

    Text Generation • Updated May 16, 2024 • 51

  • mnoukhov/pythia1b-sft-tldr

    Text Generation • Updated Jul 3, 2024 • 117

  • mnoukhov/pythia2.8b-sft-tldr

    Text Generation • Updated Jul 7, 2024 • 11

  • mnoukhov/pythia410m-rm-tldr6.9b

    Text Classification • Updated Jun 20, 2024 • 8

  • mnoukhov/pythia1b-rm-tldr6.9b

    Text Classification • Updated Jul 3, 2024 • 8

  • mnoukhov/pythia2.8b-rm-tldr6.9b

    Text Classification • Updated Jul 7, 2024 • 11

  • cleanrl/EleutherAI_pythia-6.9b-deduped__reward__tldr

    Text Classification • Updated May 7, 2024 • 23

  • mnoukhov/summarize_from_feedback_oai_preprocessing_1706381144_relabel_pythia6.9b

    Viewer • Updated Jun 20, 2024 • 177k • 31

  • vwxyzjn/summarize_from_feedback_tldr_3_filtered_oai_preprocessing_1706381144

    Viewer • Updated Jan 27, 2024 • 130k • 581

  • Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

    Paper • 2410.18252 • Published Oct 23, 2024 • 7