---
license: apache-2.0
---

# LimaRP-Mistral-7B-v0.1 (Alpaca)

This is a version of LimaRP for [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) with
about 2000 training samples _up to_ 9k tokens length.

For more details about LimaRP, see the model page for the [previously released v2 version for Llama-2](https://huggingface.co/lemonilia/limarp-llama2-v2).
Most details written there apply for this version as well. Generally speaking, LimaRP is a longform-oriented, novel-style
roleplaying chat model intended to replicate the experience of 1-on-1 roleplay on Internet forums. Short-form,
IRC/Discord-style RP (aka "Markdown format") is not supported yet. The model does not include instruction tuning,
only manually picked and slightly edited RP conversations with persona and scenario data.

## Known issues
- Despite performing a few finetuning attempts, including one that followed almost the same procedure as in previous releases,
  Mistral-7B-v0.1 appears to have strange repetition issues.
- Even though benchmarks tell a different story, in practice the model doesn't feel smarter during roleplay than Llama-2-13B.

## Prompt format
Same as before. It uses the [extended Alpaca format](https://github.com/tatsu-lab/stanford_alpaca),
with `### Input:` immediately preceding user inputs and `### Response:` immediately preceding
model responses. A length modifier can be appended after `### Response:` to suggest how long the
reply should be. **By omitting the modifier, the model will choose the most appropriate response
length** (although it might not necessarily be what the user desires).
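
To make the layout concrete, below is a rough sketch of a prompt assembled in this format. The persona and
scenario field labels and the exact `(length = medium)` modifier spelling are illustrative assumptions rather
than a verbatim copy of the training template; the SillyTavern settings shown further down reflect the intended setup.

```python
# Illustrative sketch only: the persona/scenario labels and the "(length = ...)" modifier
# spelling below are assumptions, not a verbatim copy of the training prompt template.
def build_prompt(persona: str, scenario: str, history: list[tuple[str, str]],
                 user_message: str, length: str = "medium") -> str:
    parts = [
        "### Instruction:",
        f"Character's Persona: {persona}",
        "",
        f"Scenario: {scenario}",
        "",
    ]
    # Previous exchanges, alternating user input and character response.
    for user_turn, char_turn in history:
        parts += ["### Input:", f"User: {user_turn}", "",
                  "### Response:", f"Character: {char_turn}", ""]
    # Final turn: the new user message, with an optional response-length suggestion.
    parts += ["### Input:", f"User: {user_message}", "",
              f"### Response: (length = {length})", "Character:"]
    return "\n".join(parts)


print(build_prompt("A grumpy wizard.", "A chance meeting at a tavern.", [], "Hello there!"))
```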

## Suggested settings
You can follow these instruction format settings in SillyTavern. Replace `medium` with
your desired response length:

![settings](https://files.catbox.moe/fpieug.png)

## Text generation settings
These settings could be a good general starting point:

- TFS = 0.92
- Temperature = 0.70
- Repetition penalty = ~1.1
- Repetition penalty range = ~2048
- top-k = 0 (disabled)
- top-p = 1 (disabled)
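
If you run the model with the `transformers` library directly, these settings map roughly onto `generate()` as in
the sketch below. Tail-free sampling (TFS) and the repetition penalty range are not exposed there and would have
to be configured in a backend that implements them (for example llama.cpp-based frontends), so treat this only as
an approximation; the model path is a placeholder for a merged checkpoint.

```python
# Rough mapping of the suggested sampler settings onto transformers' generate().
# TFS and repetition-penalty range are NOT available here; set those in a backend that supports them.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/merged-limarp-mistral-7b"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "### Instruction:\n...\n\n### Input:\nUser: Hello there!\n\n### Response:\nCharacter:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.70,
    top_k=0,                 # disabled
    top_p=1.0,               # disabled
    repetition_penalty=1.1,
    max_new_tokens=300,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```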

## Training procedure
[Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) was used for training
on 4x NVidia A40 GPUs.

The A40 GPUs have been graciously provided by [Arc Compute](https://www.arccompute.io/).

The model has been trained as an 8-bit LoRA adapter, and
it's so large because a LoRA rank of 256 was also used. The reasoning was that this
might have helped the model internalize any newly acquired information, making the
training process closer to a full finetune. It's suggested to merge the adapter to
the base Mistral-7B-v0.1 model.
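
One way to perform the merge, as a rough sketch, is with the `peft` library (the adapter path below is a
placeholder for the downloaded adapter directory):

```python
# Minimal sketch of merging the LoRA adapter into the base model with peft.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "path/to/limarp-lora-adapter")  # placeholder path
merged = model.merge_and_unload()  # bake the LoRA weights into the base model

merged.save_pretrained("limarp-mistral-7b-merged")
AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1").save_pretrained("limarp-mistral-7b-merged")
```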

### Training hyperparameters
- learning_rate: 0.0003
- lr_scheduler: constant_with_warmup
- noisy_embedding_alpha: 5
- num_epochs: 2
- sequence_len: 8750
- lora_r: 256
- lora_alpha: 16
- lora_dropout: 0.05
- tf32: True
- load_in_8bit: True
- adapter: lora
- micro_batch_size: 1
- gradient_accumulation_steps: 1
- warmup_steps: 10
- optimizer: adamw_torch
- flash_attention: true
- sample_packing: true
- pad_to_sequence_len: true

Using 4 GPUs, the effective global batch size would have been 4.
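
For clarity, that figure follows from multiplying the per-GPU micro-batch size by the gradient accumulation
steps and the number of GPUs:

```python
# Effective global batch size = micro_batch_size * gradient_accumulation_steps * num_gpus
micro_batch_size = 1
gradient_accumulation_steps = 1
num_gpus = 4

effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 4
```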

### Training loss graph

|