HuangXinBa/GRPO
Tags: Text Generation, Safetensors, gsm8k, English, llama, causal-lm, reinforcement-learning, GRPO, instruction-tuning, chain-of-thought, conversational
License: apache-2.0
Commit History (branch: main)
Add full model card (README.md) | 3bbfa9a | verified | HuangXinBa committed on May 28
Upload LlamaForCausalLM | bc15fbb | verified | HuangXinBa committed on May 28
Add full model card (README.md) | 8cb0da9 | verified | HuangXinBa committed on May 28
Upload LlamaForCausalLM | f8e9135 | verified | HuangXinBa committed on May 28
Add full model card (README.md) | 5d4e08f | verified | HuangXinBa committed on May 28
Upload LlamaForCausalLM | 5d93fe5 | verified | HuangXinBa committed on May 28
Add full model card (README.md) | 77c452c | verified | HuangXinBa committed on May 28
Upload LlamaForCausalLM | 6bd03f6 | verified | HuangXinBa committed on May 27
Upload tokenizer | bf15d76 | verified | HuangXinBa committed on May 27
Upload LlamaForCausalLM | 33bf48c | verified | HuangXinBa committed on May 27
initial commit | 4d8631f | verified | HuangXinBa committed on May 27