Update README.md
README.md CHANGED
```diff
@@ -5,9 +5,7 @@ license: apache-2.0
 # LimaRP-Mistral-7B-v0.1 (Alpaca, 8-bit LoRA adapter)
 
 This is a version of LimaRP for [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) with
-about
-finetuning on about 6800 stories within 4k tokens length and the second pass is LimaRP with changes introducing more effective
-control on response length.
+about 1900 training samples _up to_ 9k tokens length.
 
 For more details about LimaRP, see the model page for the [previously released v2 version for Llama-2](https://huggingface.co/lemonilia/limarp-llama2-v2).
 Most details written there apply for this version as well. Generally speaking, LimaRP is a longform-oriented, novel-style
```
```diff
@@ -16,15 +14,9 @@ IRC/Discord-style RP (aka "Markdown format") is not supported yet. The model doe
 only manually picked and slightly edited RP conversations with persona and scenario data.
 
 ## Known issues
-- Due to software limitations, finetuning didn't take advantage yet of the Sliding Window Attention (SWA) which would have allowed
-to use longer conversations in the training data and a more accurate behavior with Scenario information. Thus, this version of LimaRP
-should be considered preliminary and will be updated in the future.
 - Despite performing a few finetuning attempts, including one that followed almost the same procedure as in previous releases,
 Mistral-7B-v0.1 appears to have strange repetition issues.
 - Even though benchmarks tell a different story, in practice the model doesn't feel smarter during roleplay than Llama-2-13B.
-- Although the second finetuning pass (the primary driver for model outputs) included in general relatively high-quality data,
-the first finetuning pass, added in an attempt to improve creativity, comprised almost completely quality-unchecked data which
-may occasionally bring undesirable grammatical issues to the model's outputs.
 
 ## Prompt format
 Same as before. It uses the [extended Alpaca format](https://github.com/tatsu-lab/stanford_alpaca),
```
````diff
@@ -75,16 +67,16 @@ User: {utterance}
 Character: {utterance}
 ```
 
-This has an immediately noticeable effect on bot responses. The
-`tiny`, `short`, `medium`, `long`, `huge`, `
-recommended starting length is
-
+This has an immediately noticeable effect on bot responses. The lengths used during training are:
+`micro`, `tiny`, `short`, `medium`, `long`, `massive`, `huge`, `enormous`, `humongous`, `unlimited`.
+**The recommended starting length is medium**. Keep in mind that the AI can ramble or impersonate
+the user with very long messages.
 
 The length control effect is reproducible, but the messages will not necessarily follow
 lengths very precisely, rather follow certain ranges on average, as seen in this table
 with data from tests made with one reply at the beginning of the conversation:
 
-![lengths](https://
+![lengths](https://i.imgur.com/2WXGgaV.png)
 
 Response length control appears to work well also deep into the conversation. **By omitting
 the modifier, the model will choose the most appropriate response length** (although it might
````
```diff
@@ -120,10 +112,10 @@ training process closer to a full finetune. It's suggested to merge the adapter
 the base Mistral-7B-v0.1 model.
 
 ### Training hyperparameters
-- learning_rate: 0.
+- learning_rate: 0.0005
 - lr_scheduler_type: cosine
 - num_epochs: 2
-- sequence_len:
+- sequence_len: 9000
 - lora_r: 256
 - lora_alpha: 16
 - lora_dropout: 0.05
@@ -134,11 +126,11 @@ the base Mistral-7B-v0.1 model.
 - load_in_8bit: True
 - adapter: lora
 - micro_batch_size: 2
-- gradient_accumulation_steps:
-- warmup_steps:
+- gradient_accumulation_steps: 32
+- warmup_steps: 2
 - optimizer: adamw_torch
 
 For the second pass, the `lora_model_dir` option was used to continue finetuning on the LoRA
 adapter obtained from the first pass.
 
-Using 4 GPUs, the effective global batch size would have been
+Using 4 GPUs, the effective global batch size would have been 128.
```
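Since the card suggests merging the adapter into the base Mistral-7B-v0.1 model, here is a minimal merge sketch using Hugging Face PEFT. The adapter path is a placeholder and the fp16 dtype is an assumption for keeping memory reasonable, not the author's documented procedure.

```python
# Minimal merge sketch (assumptions: the adapter is in standard PEFT format;
# the adapter path below is a placeholder).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "path/to/limarp-mistral-lora")  # placeholder adapter path
merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
merged.save_pretrained("Mistral-7B-v0.1-LimaRP-merged")
tokenizer.save_pretrained("Mistral-7B-v0.1-LimaRP-merged")
```

Merging once up front avoids the per-request adapter overhead and lets the result be loaded like any regular Mistral-7B checkpoint.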