Models and datasets from the paper "WPO: Enhancing RLHF with Weighted Preference Optimization".
Wenxuan Zhou
wzhouad
Models: 8
wzhouad/Llama3-Instruct-8B-WPO-HB-v2 • Text Generation • 8B • 14 • 5
wzhouad/Llama3-Instruct-8B-WPO-HB • Text Generation • 8B • 44 • 1
wzhouad/zephyr-7B-WPO-HB • Text Generation • 7B • 26
wzhouad/gemma-2-9b-it-WPO-HB • Text Generation • 9B • 24 • 35
wzhouad/gemma-2-9b-it-WPO-FP • Text Generation • 9B • 43
wzhouad/zephyr-7B-WPO-FP • Text Generation • 7B • 31
wzhouad/Llama3-Instruct-8B-WPO-FP • Text Generation • 8B • 15
wzhouad/prix-lm • Text Generation • 26