zijianh/DeepSeek-R1-Distill-Qwen-7B-RL-length-penalty-low-high-0_5-new
Text Generation
Transformers
Safetensors
DigitalLearningGmbH/MATH-lighteval
qwen2
Generated from Trainer
open-r1
trl
grpo
conversational
text-generation-inference
arxiv:2402.03300
Commit History
d0736a5  Training in progress, step 58 (verified; zijianh committed on Mar 22)
263ac95  Training in progress, step 50 (verified; zijianh committed on Mar 22)
b0e9a9a  Training in progress, step 40 (verified; zijianh committed on Mar 22)
d51cf42  Training in progress, step 30 (verified; zijianh committed on Mar 22)
ccc6a28  Training in progress, step 20 (verified; zijianh committed on Mar 22)
a6b5e0a  Training in progress, step 10 (verified; zijianh committed on Mar 22)
f2b7775  initial commit (verified; zijianh committed on Mar 22)