lm-human-preference-details
vwxyzjn/train_policy_accelerate__sentiment_offline_5k.json__seed1__1696447674 • Text Generation • Updated Oct 4, 2023 • 122
lm-human-preference-details/train_policy_accelerate__sentiment_offline_5k.json__seed1 • Text Generation • Updated Oct 4, 2023 • 125
TL;DR summarization checkpoints
The checkpoints are trained in https://arxiv.org/abs/2403.17031 and taken from https://wandb.ai/costa-huang/tldr_summarize/reports/Release--Vmlldzo3MT
cleanrl/EleutherAI_pythia-1b-deduped__sft__tldr • Text Generation • Updated May 15, 2024 • 4.74k
cleanrl/EleutherAI_pythia-1b-deduped__reward__tldr • Text Classification • Updated May 15, 2024 • 4.02k
cleanrl/EleutherAI_pythia-2.8b-deduped__sft__tldr • Text Generation • Updated May 15, 2024 • 67
cleanrl/EleutherAI_pythia-2.8b-deduped__reward__tldr • Text Classification • Updated May 15, 2024 • 51