Update README.md
README.md CHANGED
@@ -9,16 +9,20 @@ about 1800 training samples _up to_ 4k tokens length. A 2-pass training procedure
 finetuning on about 6800 stories within 4k tokens length and the second pass is LimaRP with changes introducing more effective
 control on response length.
 
-**Due to software limitations, finetuning did not yet take advantage of Sliding Window Attention (SWA), which would have allowed
-longer conversations to be used in the training data. Thus, this version of LimaRP should be considered an _initial finetuning attempt_ and
-will be updated in the future.**
-
 For more details about LimaRP, see the model page for the [previously released v2 version for Llama-2](https://huggingface.co/lemonilia/limarp-llama2-v2).
 Most details written there apply to this version as well. Generally speaking, LimaRP is a longform-oriented, novel-style
 roleplaying chat model intended to replicate the experience of 1-on-1 roleplay on Internet forums. Short-form,
 IRC/Discord-style RP (aka "Markdown format") is not supported yet. The model does not include instruction tuning,
 only manually picked and slightly edited RP conversations with persona and scenario data.
 
+## Known issues
+- Due to software limitations, finetuning did not yet take advantage of Sliding Window Attention (SWA), which would have allowed
+longer conversations to be used in the training data and more accurate behavior with Scenario information. Thus, this version of LimaRP
+should be considered preliminary and will be updated in the future.
+- Despite a few finetuning attempts, including one that followed almost the same procedure as in previous releases,
+Mistral-7B-v0.1 appears to have strange repetition issues.
+- Even though benchmarks tell a different story, in practice the model does not feel smarter during roleplay than Llama-2-13B.
+
 ## Prompt format
 Same as before. It uses the [extended Alpaca format](https://github.com/tatsu-lab/stanford_alpaca),
 with `### Input:` immediately preceding user inputs and `### Response:` immediately preceding
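
As a rough illustration of that layout, a prompt could be assembled along the lines of the sketch below. Only the `### Input:`/`### Response:` headers come from the description above; the persona/scenario wording, the helper name, and the turn structure are illustrative assumptions, not taken from the dataset.

```python
# Minimal sketch of an extended-Alpaca-style roleplay prompt.
# The persona/scenario text and helper name are assumptions for illustration only.
def build_prompt(persona_and_scenario: str, turns: list[tuple[str, str]]) -> str:
    """turns: (role, text) pairs, where role is 'user' or 'char'."""
    parts = [persona_and_scenario.strip(), ""]
    for role, text in turns:
        header = "### Input:" if role == "user" else "### Response:"
        parts.append(f"{header}\n{text.strip()}\n")
    # End with an open response header so the model continues as the character.
    parts.append("### Response:\n")
    return "\n".join(parts)

print(build_prompt(
    "Character persona and scenario description go here.",
    [("user", "Hello there!"), ("char", '"Oh, hi!" she replied.')],
))
```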
@@ -131,4 +135,4 @@ the base Mistral-7B-v0.1 model.
 For the second pass, the `lora_model_dir` option was used to continue finetuning on the LoRA
 adapter obtained from the first pass.
 
-Using
+Using 4 GPUs, the effective global batch size would have been 8.
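
As a sanity check on that figure: the effective global batch size is just the per-GPU micro batch size times the gradient-accumulation steps times the number of GPUs. The per-GPU and accumulation values below are assumptions chosen only to show how a total of 8 can arise, not values read from the actual training config.

```python
# Effective global batch size = per-GPU micro batch * gradient accumulation steps * GPU count.
# micro_batch_size and gradient_accumulation_steps are assumed values, not from the real config.
micro_batch_size = 2
gradient_accumulation_steps = 1
num_gpus = 4

effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # -> 8
```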