Haitao999/Llama-3.2-3B-Instruct-GRPO-numia_prompt_dpo1
Tags: Text Generation, Transformers, Safetensors, RLHFlow/numia_prompt_dpo1, llama, Generated from Trainer, open-r1, trl, grpo, conversational, text-generation-inference, arxiv:2402.03300
File: Llama-3.2-3B-Instruct-GRPO-numia_prompt_dpo1/training_args.bin (branch: main)
Commit History
3a18c8b (verified): Training in progress, step 60. Committed by Haitao999 21 days ago.
1bab43d (verified): Training in progress, step 10. Committed by Haitao999 21 days ago.
39b8a25 (verified): Training in progress, step 10. Committed by Haitao999 22 days ago.