HuangXinBa/GRPO

Tags: Text Generation, Safetensors, English, llama, causal-lm, reinforcement-learning, GRPO, instruction-tuning, chain-of-thought, conversational
  • 1 contributor
History: 11 commits
Latest commit: HuangXinBa, "Add full model card (README.md)", 3bbfa9a (verified), about 1 month ago
  • .gitattributes
    1.52 kB
    initial commit about 1 month ago
  • README.md
    2.46 kB
    Add full model card (README.md) about 1 month ago
  • config.json
    780 Bytes
    Upload LlamaForCausalLM about 1 month ago
  • generation_config.json
    164 Bytes
    Upload LlamaForCausalLM about 1 month ago
  • merges.txt
    466 kB
    Upload tokenizer about 1 month ago
  • model.safetensors
    269 MB
    LFS
    Upload LlamaForCausalLM about 1 month ago
  • special_tokens_map.json
    689 Bytes
    Upload tokenizer about 1 month ago
  • tokenizer.json
    3.52 MB
    Upload tokenizer about 1 month ago
  • tokenizer_config.json
    3.78 kB
    Upload tokenizer about 1 month ago
  • vocab.json
    801 kB
    Upload tokenizer about 1 month ago