HuangXinBa / GRPO

Text Generation

reinforcement-learning

instruction-tuning

chain-of-thought

GRPO / merges.txt

Upload tokenizer

bf15d76 verified 3 months ago

466 kB
