Upload LlamaForCausalLM
README.md
CHANGED
@@ -1,19 +1,20 @@
-
 ---
 license: apache-2.0
 language: en
 tags:
-
-
-
-
-
-
+- text-generation
+- causal-lm
+- reinforcement-learning
+- GRPO
+- instruction-tuning
+- chain-of-thought
+- trl
+- grpo
 datasets:
-
+- gsm8k
 pipeline_tag: text-generation
 widget:
-
+- text: What is 27 plus 16? Let's think step by step.
 ---
 
 # GRPO: Finetuned Causal Language Model using Generalized Reinforcement Policy Optimization