Haitao999/Llama-3.2-3B-Instruct-GRPO-numia_prompt_dpo1
Text Generation
Transformers
Safetensors
RLHFlow/numia_prompt_dpo1
llama
Generated from Trainer
open-r1
trl
grpo
conversational
text-generation-inference
arxiv:2402.03300
main / Llama-3.2-3B-Instruct-GRPO-numia_prompt_dpo1 / model-00001-of-00002.safetensors
Commit History
Model save (17bc122, verified): Haitao999 committed 20 days ago
Training in progress, step 170 (cfc1728, verified): Haitao999 committed 20 days ago
Training in progress, step 160 (20e45bf, verified): Haitao999 committed 20 days ago
Training in progress, step 130 (88dd23f, verified): Haitao999 committed 20 days ago
Training in progress, step 120 (2e74454, verified): Haitao999 committed 20 days ago
Training in progress, step 110 (1367eca, verified): Haitao999 committed 20 days ago
Training in progress, step 100 (527f4c1, verified): Haitao999 committed 20 days ago
Training in progress, step 80 (238b480, verified): Haitao999 committed 20 days ago
Training in progress, step 60 (3a18c8b, verified): Haitao999 committed 20 days ago
Training in progress, step 10 (1bab43d, verified): Haitao999 committed 20 days ago
Training in progress, step 110 (4c8eee8, verified): Haitao999 committed 20 days ago
Training in progress, step 100 (b43be7e, verified): Haitao999 committed 20 days ago
Training in progress, step 80 (e41ccf4, verified): Haitao999 committed 20 days ago
Training in progress, step 60 (d975322, verified): Haitao999 committed 21 days ago
Training in progress, step 30 (ff2b7d7, verified): Haitao999 committed 21 days ago
Training in progress, step 20 (37eef80, verified): Haitao999 committed 21 days ago
Training in progress, step 10 (39b8a25, verified): Haitao999 committed 21 days ago