---
license: apache-2.0
---

# LimaRP-Mistral-7B-v0.1 (Alpaca)

This is a version of LimaRP for [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) with
about 2000 training samples _up to_ 9k tokens in length.

For more details about LimaRP, see the model page for the [previously released v2 version for Llama-2](https://huggingface.co/lemonilia/limarp-llama2-v2).
Most details written there apply for this version as well. Generally speaking, LimaRP is a longform-oriented, novel-style
roleplaying chat model intended to replicate the experience of 1-on-1 roleplay on Internet forums.

IRC/Discord-style RP (aka "Markdown format") is not supported yet. The model does not include instruction tuning,
only manually picked and slightly edited RP conversations with persona and scenario data.

## Prompt format
Same as before. It uses the [extended Alpaca format](https://github.com/tatsu-lab/stanford_alpaca),
with `### Input:` immediately preceding user inputs and `### Response:` immediately preceding
[...]

**By omitting the modifier, the model will choose the most appropriate response length** (although it might
not necessarily be what the user desires).

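The turn layout described above can be sketched as a small helper. This is illustrative only: the `(length = medium)` modifier syntax and the helper itself are assumptions for demonstration; only the `### Input:` and `### Response:` separators come from this card.

```python
# Rough sketch of the extended Alpaca turn layout (illustrative only).
# The "(length = ...)" modifier syntax is an assumption, not verbatim from
# this card; the ### Input: / ### Response: separators are.
def build_prompt(history, length=None):
    """history: list of (user_text, model_text_or_None) pairs;
    leave the final model_text as None to let the model continue."""
    tag = "### Response:" + (f" (length = {length})" if length else "")
    parts = []
    for user_text, model_text in history:
        parts.append("### Input:\n" + user_text)
        parts.append(tag if model_text is None else tag + "\n" + model_text)
    return "\n\n".join(parts)

print(build_prompt([("Hello there!", None)], length="medium"))
```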
## Suggested settings
You can follow these instruction format settings in SillyTavern. Replace `medium` with
your desired response length:

![settings](https://files.catbox.moe/24c1w0.png)

## Text generation settings
These settings could be a good general starting point:

- TFS = 0.92
- Temperature = 0.70
- Repetition penalty = ~1.1
- Repetition penalty range = ~2048
- top-k = 0 (disabled)
- top-p = 1 (disabled)

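TFS is the least common sampler in this list. As a minimal sketch of what TFS = 0.92 does, here is a pure-Python version of tail-free sampling; it assumes the common second-derivative formulation and is not taken from any particular backend's implementation.

```python
# Illustrative pure-Python sketch of tail-free sampling (TFS).
def tail_free_filter(probs, z=0.92):
    # Sort token probabilities in descending order, remembering token ids.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    p = [probs[i] for i in order]
    # First and (absolute) second differences of the sorted distribution.
    d1 = [p[i + 1] - p[i] for i in range(len(p) - 1)]
    d2 = [abs(d1[i + 1] - d1[i]) for i in range(len(d1) - 1)]
    total = sum(d2) or 1.0
    weights = [x / total for x in d2]
    # Keep tokens until the cumulative second-derivative weight exceeds z;
    # everything after that point is the "tail" that gets cut off.
    keep, cum = len(p), 0.0
    for i, w in enumerate(weights):
        cum += w
        if cum > z:
            keep = i + 1
            break
    keep = max(keep, 1)
    kept = order[:keep]
    norm = sum(probs[i] for i in kept)
    # Renormalize the surviving tokens into a proper distribution.
    return {i: probs[i] / norm for i in kept}

print(tail_free_filter([0.5, 0.3, 0.1, 0.05, 0.03, 0.02]))
```

The repetition penalty range of ~2048 means the penalty is only applied over roughly the last 2048 tokens of context rather than the whole prompt.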
## Training procedure
[Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) was used for training
on 4x NVIDIA A40 GPUs.

The A40 GPUs have been graciously provided by [Arc Compute](https://www.arccompute.io/).

### Training hyperparameters
- learning_rate: 0.0003
- lr_scheduler: constant_with_warmup
- noisy_embedding_alpha: 5
- num_epochs: 2
- sequence_len: 8750
- lora_r: 256
- lora_alpha: 16
- lora_dropout: 0.05
[...]
- tf32: True
- load_in_8bit: True
- adapter: lora
- micro_batch_size: 1
- gradient_accumulation_steps: 1
- warmup_steps: 10
- optimizer: adamw_torch
- flash_attention: true
- sample_packing: true
- pad_to_sequence_len: true

Using 4 GPUs, the effective global batch size would have been 4.
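As a sketch, the list above maps onto an axolotl config fragment roughly like the following. Key names follow axolotl's conventions; the `base_model` line is an assumption, and any keys not listed above are omitted. With `micro_batch_size: 1` and `gradient_accumulation_steps: 1` on 4 GPUs, the global batch size works out to 1 × 1 × 4 = 4, as noted above.

```yaml
# Hypothetical axolotl config fragment reconstructed from the
# hyperparameter list above; not the author's actual config file.
base_model: mistralai/Mistral-7B-v0.1  # assumption
load_in_8bit: true
adapter: lora
lora_r: 256
lora_alpha: 16
lora_dropout: 0.05
sequence_len: 8750
sample_packing: true
pad_to_sequence_len: true
micro_batch_size: 1
gradient_accumulation_steps: 1
num_epochs: 2
learning_rate: 0.0003
lr_scheduler: constant_with_warmup
warmup_steps: 10
optimizer: adamw_torch
noisy_embedding_alpha: 5
tf32: true
flash_attention: true
```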

### Training loss graph
![Train loss](https://files.catbox.moe/0pj84w.png)