Update README.md
README.md
CHANGED
@@ -6,18 +6,18 @@ language:
 library_name: transformers
 ---
 
-# OPT-1.3b
+# OPT-1.3b RLHFed by DeepSpeed-Chat
 
 
 
 # Model Description
 
 <!-- Provide a longer summary of what this model is. -->
-zen-E/deepspeed-chat-
+zen-E/deepspeed-chat-step3-rlhf-actor-model-opt1.3b is an OPT-1.3b model RLHFed by DeepSpeedExamples/applications/DeepSpeed-Chat.
 
 The model is finetuned on 4 datasets with a split of 2, 4, 4 for steps of SFT, reward modeling, and RLHF.
 
-The training log is attached. 2 A100-40GB is used to finetune the model, gradient_accumulation_steps are tuned to be 4.
+The training log is attached. 2 A100-40GB is used to finetune the model, gradient_accumulation_steps are tuned to be 8, the batch size is 4.
 
 ### Model Sources
 
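The updated README line gives three training numbers: 2 GPUs, gradient_accumulation_steps of 8, and a batch size of 4. A minimal sketch of how these would combine into an effective batch size per optimizer step, assuming the batch size of 4 is per device and the two A100s run data-parallel (neither is stated in the README; the function name is hypothetical):

```python
def effective_batch_size(per_device_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_devices: int) -> int:
    """Samples contributing to one optimizer step under plain data parallelism."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


# Values from the updated README line. Treating "batch size" as the
# per-device micro-batch is an assumption, not something the README confirms.
new_setting = effective_batch_size(per_device_batch_size=4,
                                   gradient_accumulation_steps=8,
                                   num_devices=2)
print(new_setting)
```

If the batch size of 4 is instead a global figure across both GPUs, the product would be 4 × 8 = 32; the attached training log would be the place to confirm which reading applies.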