jiazhengli/Pythia-2.8B-HH-RLHF-Iterative-SamPO
Text Generation • 3B
Resources for EMNLP 2024 Paper: Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence