zijianh/DeepSeek-R1-Distill-Qwen-7B-RL-length-penalty-low-high-0_1-new
Tags: Text Generation · Transformers · Safetensors · qwen2 · Generated from Trainer · trl · grpo · conversational · text-generation-inference · arxiv:2402.03300
File: DeepSeek-R1-Distill-Qwen-7B-RL-length-penalty-low-high-0_1-new/model-00001-of-00004.safetensors (revision d573650)
Commit History
Training in progress, step 58 · c16633c (verified) · zijianh committed about 1 month ago
Training in progress, step 50 · aac3619 (verified) · zijianh committed on Mar 23
Training in progress, step 40 · 8d5e1d0 (verified) · zijianh committed on Mar 23
Training in progress, step 30 · 3cb52ee (verified) · zijianh committed on Mar 23
Training in progress, step 20 · a0be37e (verified) · zijianh committed on Mar 23
Training in progress, step 10 · d39abdb (verified) · zijianh committed on Mar 22