Synthetic Preference Datasets for Continual Reinforcement Learning from Human Feedback

Lifelong Alignment of Agents
non-profit
Collections: 1
Models: 123

LifelongAlignment/Qwen2-0.5B-Instruct_CPPO-REWARD_REWARD_6
LifelongAlignment/Qwen2-0.5B-Instruct_CPPO-REWARD_REWARD_3
LifelongAlignment/Qwen2-0.5B-Instruct_CPPO-REWARD_REWARD_4
LifelongAlignment/Qwen2-0.5B-Instruct_CPPO-REWARD_REWARD_5
LifelongAlignment/Qwen2-0.5B-Instruct_CPPO-REWARD_REWARD_1
LifelongAlignment/Qwen2-0.5B-Instruct_CPPO-REWARD_REWARD_2
LifelongAlignment/Qwen2-0.5B-Instruct_aifgen-short-piecewise_REWARD_9
LifelongAlignment/Qwen2-0.5B-Instruct_CPPO-REWARD_REWARD_0
LifelongAlignment/Qwen2-0.5B-Instruct_aifgen-short-piecewise_REWARD_6
LifelongAlignment/Qwen2-0.5B-Instruct_aifgen-short-piecewise_REWARD_7
Datasets: 8
LifelongAlignment/aifgen-lipschitz • 1 • 54
LifelongAlignment/CPPO-REWARD • 1 • 10
LifelongAlignment/CPPO-RL • 1 • 9
LifelongAlignment/aifgen • 72 • 19
LifelongAlignment/aifgen-long-piecewise • 1 • 37
LifelongAlignment/aifgen-short-piecewise • 1 • 4
LifelongAlignment/aifgen-domain-preference-shift • 1 • 9
LifelongAlignment/aifgen-piecewise-preference-shift • 1 • 12