---
license: apache-2.0
---

# LimaRP-Mistral-7B-v0.1 (Alpaca)

This is a version of LimaRP for [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) with
about 2000 training samples _up to_ 9k tokens length.

For more details about LimaRP, see the model page for the [previously released v2 version for Llama-2](https://huggingface.co/lemonilia/limarp-llama2-v2).
Most details written there apply for this version as well. Generally speaking, LimaRP is a longform-oriented, novel-style
roleplaying chat model intended to replicate the experience of 1-on-1 roleplay on Internet forums. Short-form,
IRC/Discord-style RP (aka "Markdown format") is not supported yet. The model does not include instruction tuning,
only manually picked and slightly edited RP conversations with persona and scenario data.

## Known issues
- Despite performing a few finetuning attempts, including one that followed almost the same procedure as in previous releases,
  Mistral-7B-v0.1 appears to have strange repetition issues.
- Even though benchmarks tell a different story, in practice the model doesn't feel smarter during roleplay than Llama-2-13B.

## Prompt format
Same as before. It uses the [extended Alpaca format](https://github.com/tatsu-lab/stanford_alpaca),
with `### Input:` immediately preceding user inputs and `### Response:` immediately preceding
model responses. A length modifier can be appended after `### Response:` to suggest how long the
reply should be. **By omitting the modifier, the model will choose the most appropriate response
length** (although it might not necessarily be what the user desires).
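
To make the layout concrete, below is a rough sketch of a prompt assembled in this format. The persona and
scenario field labels and the exact `(length = medium)` modifier spelling are illustrative assumptions rather
than a verbatim copy of the training template; the SillyTavern settings shown further down reflect the intended setup.

```python
# Illustrative sketch only: the persona/scenario labels and the "(length = ...)" modifier
# spelling below are assumptions, not a verbatim copy of the training prompt template.
def build_prompt(persona: str, scenario: str, history: list[tuple[str, str]],
                 user_message: str, length: str = "medium") -> str:
    parts = [
        "### Instruction:",
        f"Character's Persona: {persona}",
        "",
        f"Scenario: {scenario}",
        "",
    ]
    # Previous exchanges, alternating user input and character response.
    for user_turn, char_turn in history:
        parts += ["### Input:", f"User: {user_turn}", "",
                  "### Response:", f"Character: {char_turn}", ""]
    # Final turn: the new user message, with an optional response-length suggestion.
    parts += ["### Input:", f"User: {user_message}", "",
              f"### Response: (length = {length})", "Character:"]
    return "\n".join(parts)


print(build_prompt("A grumpy wizard.", "A chance meeting at a tavern.", [], "Hello there!"))
```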

## Suggested settings
You can follow these instruction format settings in SillyTavern. Replace `medium` with
your desired response length:

![settings](https://files.catbox.moe/fpieug.png)

## Text generation settings
These settings could be a good general starting point:

- TFS = 0.92
- Temperature = 0.70
- Repetition penalty = ~1.1
- Repetition penalty range = ~2048
- top-k = 0 (disabled)
- top-p = 1 (disabled)
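
If you run the model with the `transformers` library directly, these settings map roughly onto `generate()` as in
the sketch below. Tail-free sampling (TFS) and the repetition penalty range are not exposed there and would have
to be configured in a backend that implements them (for example llama.cpp-based frontends), so treat this only as
an approximation; the model path is a placeholder for a merged checkpoint.

```python
# Rough mapping of the suggested sampler settings onto transformers' generate().
# TFS and repetition-penalty range are NOT available here; set those in a backend that supports them.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/merged-limarp-mistral-7b"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "### Instruction:\n...\n\n### Input:\nUser: Hello there!\n\n### Response:\nCharacter:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.70,
    top_k=0,                 # disabled
    top_p=1.0,               # disabled
    repetition_penalty=1.1,
    max_new_tokens=300,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```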

## Training procedure
[Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) was used for training
on 4x NVidia A40 GPUs.

The A40 GPUs have been graciously provided by [Arc Compute](https://www.arccompute.io/).

The model has been trained as an 8-bit LoRA adapter, and
it's so large because a LoRA rank of 256 was also used. The reasoning was that this
might have helped the model internalize any newly acquired information, making the
training process closer to a full finetune. It's suggested to merge the adapter to
the base Mistral-7B-v0.1 model.
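
One way to perform the merge, as a rough sketch, is with the `peft` library (the adapter path below is a
placeholder for the downloaded adapter directory):

```python
# Minimal sketch of merging the LoRA adapter into the base model with peft.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "path/to/limarp-lora-adapter")  # placeholder path
merged = model.merge_and_unload()  # bake the LoRA weights into the base model

merged.save_pretrained("limarp-mistral-7b-merged")
AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1").save_pretrained("limarp-mistral-7b-merged")
```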

### Training hyperparameters
- learning_rate: 0.0003
- lr_scheduler: constant_with_warmup
- noisy_embedding_alpha: 5
- num_epochs: 2
- sequence_len: 8750
- lora_r: 256
- lora_alpha: 16
- lora_dropout: 0.05
- tf32: True
- load_in_8bit: True
- adapter: lora
- micro_batch_size: 1
- gradient_accumulation_steps: 1
- warmup_steps: 10
- optimizer: adamw_torch
- flash_attention: true
- sample_packing: true
- pad_to_sequence_len: true

Using 4 GPUs, the effective global batch size would have been 4.
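
For clarity, that figure follows from multiplying the per-GPU micro-batch size by the gradient accumulation
steps and the number of GPUs:

```python
# Effective global batch size = micro_batch_size * gradient_accumulation_steps * num_gpus
micro_batch_size = 1
gradient_accumulation_steps = 1
num_gpus = 4

effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 4
```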

### Training loss graph

|