Reinforcement Learning • 0.6B • Updated • 24 • 1
AIPlans/Qwen3-0.6B-GRPO-RM_NVIDIA • Text Generation • 0.6B • Updated • 17
AIPlans/Qwen3-0.6B-GRPO_Epoch2 • Text Generation • 0.6B • Updated • 12
AIPlans/Qwen3-0.6B-GRPO_Epoch1 • Text Generation • 0.6B • Updated • 15
Reinforcement Learning • 0.6B • Updated • 9
AIPlans/qwen3-0.6b-base-PPO-hs2 • Updated
AIPlans/Qwen3-0.6B-DPO_Epoch_1 • Text Generation • 0.6B • Updated • 22
AIPlans/Qwen3-0.6B-SFT-hs2 • Text Generation • 0.6B • Updated • 198
AIPlans/Qwen3-0.6B-RM-hs2 • Text Classification • 0.6B • Updated • 185 • 1
Text Generation • Updated • 4
AIPlans/Qwen3-0.6B-DPO_NOTLORA • Text Generation • 0.6B • Updated • 4
Text Generation • Updated • 3 • 1
Text Generation • Updated • 4
AIPlans/qwen3-0.6b-hh-rlhf-sft • 0.6B • Updated • 3
AIPlans/Qwen3-0.6B-KTO_trial • Text Generation • 0.6B • Updated • 6 • 1
AIPlans/qwen3-0.6b-sft-hh-rlhf-lora • Updated
AIPlans/qwen3-0.6b-base-PPO-PM
AIPlans/qwen3-0.6b-base-hl-RM • Text Classification • 0.6B • Updated • 16
0.6B • Updated • 6
AIPlans/qwen3-0.6b-dpo-lora • Text Generation • 0.6B • Updated • 5 • 1
AIPlans/qwen3-0.6B-reward-hh-rlhf • Text Generation • 0.6B • Updated • 4
AIPlans/qwen3-8b-ipo-hh-rlhf • Text Generation • Updated • 3
AIPlans/qwen3-8b-dpo-hh-rlhf • Updated
AIPlans/Qwen3-HHH-Cipher-Eng • Text Generation • 0.6B • Updated • 14
AIPlans/Qwen-HHH-Cipher-Eng • Text Generation • 0.5B • Updated • 8
AIPlans/Qwen-HHH-Sans-Eng • Text Generation • 0.5B • Updated • 9