Update README.md
README.md CHANGED
@@ -9,16 +9,20 @@ about 1800 training samples _up to_ 4k tokens length. A 2-pass training procedure
 finetuning on about 6800 stories within 4k tokens length and the second pass is LimaRP with changes introducing more effective
 control on response length.
 
-**Due to software limitations, finetuning did not yet take advantage of Sliding Window Attention (SWA), which would have allowed
-longer conversations to be used in the training data. Thus, this version of LimaRP should be considered an _initial finetuning attempt_ and
-will be updated in the future.**
-
 For more details about LimaRP, see the model page for the [previously released v2 version for Llama-2](https://huggingface.co/lemonilia/limarp-llama2-v2).
 Most details written there apply to this version as well. Generally speaking, LimaRP is a longform-oriented, novel-style
 roleplaying chat model intended to replicate the experience of 1-on-1 roleplay on Internet forums. Short-form,
 IRC/Discord-style RP (aka "Markdown format") is not supported yet. The model does not include instruction tuning,
 only manually picked and slightly edited RP conversations with persona and scenario data.
 
+## Known issues
+- Due to software limitations, finetuning did not yet take advantage of Sliding Window Attention (SWA), which would have allowed
+longer conversations to be used in the training data and more accurate behavior with Scenario information. Thus, this version of LimaRP
+should be considered preliminary and will be updated in the future.
+- Despite a few finetuning attempts, including one that followed almost the same procedure as in previous releases,
+Mistral-7B-v0.1 appears to have strange repetition issues.
+- Even though benchmarks tell a different story, in practice the model does not feel smarter during roleplay than Llama-2-13B.
+
 ## Prompt format
 Same as before. It uses the [extended Alpaca format](https://github.com/tatsu-lab/stanford_alpaca),
 with `### Input:` immediately preceding user inputs and `### Response:` immediately preceding
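
As a rough illustration of that layout, a prompt could be assembled along the lines of the sketch below. Only the `### Input:`/`### Response:` headers come from the description above; the persona/scenario wording, the helper name, and the turn structure are illustrative assumptions, not taken from the dataset.

```python
# Minimal sketch of an extended-Alpaca-style roleplay prompt.
# The persona/scenario text and helper name are assumptions for illustration only.
def build_prompt(persona_and_scenario: str, turns: list[tuple[str, str]]) -> str:
    """turns: (role, text) pairs, where role is 'user' or 'char'."""
    parts = [persona_and_scenario.strip(), ""]
    for role, text in turns:
        header = "### Input:" if role == "user" else "### Response:"
        parts.append(f"{header}\n{text.strip()}\n")
    # End with an open response header so the model continues as the character.
    parts.append("### Response:\n")
    return "\n".join(parts)

print(build_prompt(
    "Character persona and scenario description go here.",
    [("user", "Hello there!"), ("char", '"Oh, hi!" she replied.')],
))
```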
@@ -131,4 +135,4 @@ the base Mistral-7B-v0.1 model.
 For the second pass, the `lora_model_dir` option was used to continue finetuning on the LoRA
 adapter obtained from the first pass.
 
-Using
+Using 4 GPUs, the effective global batch size would have been 8.
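
As a sanity check on that figure: the effective global batch size is just the per-GPU micro batch size times the gradient-accumulation steps times the number of GPUs. The per-GPU and accumulation values below are assumptions chosen only to show how a total of 8 can arise, not values read from the actual training config.

```python
# Effective global batch size = per-GPU micro batch * gradient accumulation steps * GPU count.
# micro_batch_size and gradient_accumulation_steps are assumed values, not from the real config.
micro_batch_size = 2
gradient_accumulation_steps = 1
num_gpus = 4

effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # -> 8
```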