Haitao999/Llama-3.1-8B-Instruct-GRPO-numia_prompt_dpo1
Text Generation
Transformers
Safetensors
RLHFlow/numia_prompt_dpo1
llama
Generated from Trainer
open-r1
trl
grpo
conversational
text-generation-inference
arxiv:2402.03300
main
Llama-3.1-8B-Instruct-GRPO-numia_prompt_dpo1
Commit History
End of training
a2906e3
verified
Haitao999
committed
14 days ago
Model save
51e1c8e
verified
Haitao999
committed
14 days ago
Training in progress, step 170
9ce4d0b
verified
Haitao999
committed
14 days ago
Training in progress, step 160
8a01efe
verified
Haitao999
committed
14 days ago
Training in progress, step 130
1447ce9
verified
Haitao999
committed
14 days ago
Training in progress, step 110
b5e1848
verified
Haitao999
committed
14 days ago
Training in progress, step 100
f969725
verified
Haitao999
committed
14 days ago
Training in progress, step 80
1e5f9bd
verified
Haitao999
committed
14 days ago
Training in progress, step 30
0b8dda3
verified
Haitao999
committed
14 days ago
Training in progress, step 20
cbc3307
verified
Haitao999
committed
14 days ago
Training in progress, step 10
7af001d
verified
Haitao999
committed
14 days ago
Training in progress, step 60
b60360a
verified
Haitao999
committed
15 days ago
Training in progress, step 30
a76f174
verified
Haitao999
committed
15 days ago
Training in progress, step 20
060b179
verified
Haitao999
committed
15 days ago
Training in progress, step 10
1f84bf1
verified
Haitao999
committed
15 days ago
initial commit
6ee0a8b
verified
Haitao999
committed
15 days ago