---
license: apache-2.0
---

# AshhLimaRP-Mistral-7B (Alpaca, v1)

This is a version of LimaRP finetuned on [Ashhwriter-Mistral-7B](https://huggingface.co/lemonilia/Ashhwriter-Mistral-7B),
using 2000 training samples of _up to_ about 9k tokens in length.

LimaRP is a longform-oriented, novel-style roleplaying chat model intended to replicate the experience
of 1-on-1 roleplay on Internet forums. Short-form, IRC/Discord-style RP (aka "Markdown format")
is not supported. The model does not include instruction tuning, only manually picked and
slightly edited RP conversations with persona and scenario data.

Ashhwriter, the base model, was finetuned entirely on human-written lewd stories.

## Available versions
- Float16 HF weights (_to be uploaded_)
- LoRA Adapter ([adapter_config.json](https://huggingface.co/lemonilia/AshhLimaRP-Mistral-7B/resolve/main/adapter_config.json) and [adapter_model.bin](https://huggingface.co/lemonilia/AshhLimaRP-Mistral-7B/resolve/main/adapter_model.bin)); a loading sketch follows this list
- 4bit AWQ (_to be uploaded_)
- [Q4_K_M GGUF](https://huggingface.co/lemonilia/AshhLimaRP-Mistral-7B/resolve/main/AshhLimaRP-Mistral-7B.Q4_K_M.gguf)
- [Q6_K GGUF](https://huggingface.co/lemonilia/AshhLimaRP-Mistral-7B/resolve/main/AshhLimaRP-Mistral-7B.Q6_K.gguf)

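For the LoRA adapter, a minimal loading sketch (assuming the `transformers` and `peft` libraries; dtype and device settings here are illustrative, not an official recipe) could look like this:

```python
# Minimal sketch: load the Ashhwriter base model, then apply the
# AshhLimaRP LoRA adapter on top of it. Assumes `transformers` and
# `peft` are installed; dtype/device settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "lemonilia/Ashhwriter-Mistral-7B",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("lemonilia/Ashhwriter-Mistral-7B")

# The adapter files (adapter_config.json, adapter_model.bin) live in
# this repository, so the repo id doubles as the adapter path.
model = PeftModel.from_pretrained(base, "lemonilia/AshhLimaRP-Mistral-7B")
model.eval()
```
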
## Prompt format
The model uses an [extended Alpaca format](https://github.com/tatsu-lab/stanford_alpaca),
with `### Instruction:` and `### Input:` immediately preceding user inputs and `### Response:`
immediately preceding model outputs. While Alpaca wasn't originally intended for multi-turn
responses, in practice this is not a problem; the format follows a pattern already used by
other models.

```
### Instruction:
Character's Persona: {bot character description}

User's Persona: {user character description}

Scenario: {what happens in the story}

Play the role of Character. You must engage in a roleplaying chat with User below this line. Do not write dialogues and narration for User.

### Input:
User: {utterance}

### Response:
Character: {utterance}

### Input:
User: {utterance}

### Response:
Character: {utterance}

(etc.)
```

You should:
- Replace all text in curly braces (curly braces included) with your own text.
- Replace `User` and `Character` with appropriate names.

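For programmatic use outside SillyTavern, a prompt in this format could be assembled with a small helper like the hypothetical one below (the function and its names are illustrative, not part of this repository):

```python
# Hypothetical helper for assembling a prompt in the extended Alpaca
# format described above. Names and structure are illustrative only.
def build_prompt(
    char_persona,
    user_persona,
    scenario,
    turns,                     # list of (speaker, utterance) pairs
    char_name="Character",
    user_name="User",
):
    prompt = (
        "### Instruction:\n"
        f"{char_name}'s Persona: {char_persona}\n\n"
        f"{user_name}'s Persona: {user_persona}\n\n"
        f"Scenario: {scenario}\n\n"
        f"Play the role of {char_name}. You must engage in a roleplaying chat "
        f"with {user_name} below this line. Do not write dialogues and "
        f"narration for {user_name}.\n"
    )
    for speaker, utterance in turns:
        header = "### Input:" if speaker == user_name else "### Response:"
        prompt += f"\n{header}\n{speaker}: {utterance}\n"
    # End with an open response header so the model continues as the character.
    prompt += f"\n### Response:\n{char_name}:"
    return prompt

prompt = build_prompt(
    char_persona="{bot character description}",
    user_persona="{user character description}",
    scenario="{what happens in the story}",
    turns=[("User", "{utterance}")],
)
```
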
### Message length control
Inspired by the preset previously named "Roleplay" in SillyTavern, this version of LimaRP
allows appending a length modifier to the response instruction sequence, like this:

```
### Input:
User: {utterance}

### Response: (length = medium)
Character: {utterance}
```

This has an immediately noticeable effect on bot responses. The lengths used during training are:
`micro`, `tiny`, `short`, `medium`, `long`, `massive`, `huge`, `enormous`, `humongous`, `unlimited`.
**The recommended starting length is `medium`**. Keep in mind that the AI can ramble or impersonate
the user with very long messages.

The length control effect is reproducible, but the messages will not necessarily follow
the requested lengths precisely; rather, they tend to fall within certain ranges on average,
as seen in this table with data from tests made with one reply at the beginning of the conversation:

![lengths](https://i.imgur.com/2WXGgaV.png)

Response length control appears to work well even deep into the conversation. **By omitting
the modifier, the model will choose the most appropriate response length** (although it might
not necessarily be what the user desires).

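Continuing the hypothetical helper sketched earlier, only the final response header changes; everything else in the prompt stays the same:

```python
# Hypothetical extension of the build_prompt sketch: the length modifier
# is appended to the final response header and nothing else changes.
def response_header(char_name="Character", length=None):
    suffix = f" (length = {length})" if length else ""
    return f"### Response:{suffix}\n{char_name}:"

print(response_header(length="medium"))
# ### Response: (length = medium)
# Character:
```
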
## Suggested settings
You can follow these instruction format settings in SillyTavern. Replace `medium` with
your desired response length:

![settings](https://files.catbox.moe/fpieug.png)

## Text generation settings
These settings could be a good general starting point (a llama-cpp-python sketch follows the list):

- TFS = 0.90
- Temperature = 0.70
- Repetition penalty = ~1.11
- Repetition penalty range = ~2048
- top-k = 0 (disabled)
- top-p = 1 (disabled)

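As a sketch, here is one way these values map onto llama-cpp-python's sampler parameters; parameter names vary between backends, and `n_ctx`, `max_tokens`, and the stop string below are assumptions:

```python
# Sketch of the suggested sampler settings with llama-cpp-python.
# `last_n_tokens_size` is this library's closest analogue of the
# repetition penalty range; other backends name these knobs differently.
from llama_cpp import Llama

llm = Llama(
    model_path="AshhLimaRP-Mistral-7B.Q4_K_M.gguf",
    n_ctx=8192,               # assumption: adjust to your hardware
    last_n_tokens_size=2048,  # repetition penalty range ~2048
)

prompt = "..."  # a prompt assembled in the extended Alpaca format above

out = llm(
    prompt,
    max_tokens=400,           # assumption: cap on response length
    temperature=0.70,
    tfs_z=0.90,               # tail-free sampling
    repeat_penalty=1.11,
    top_k=0,                  # disabled
    top_p=1.0,                # disabled
    stop=["### Input"],       # assumption: stop before the next user turn
)
print(out["choices"][0]["text"])
```
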
## Training procedure
[Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) was used for training
on 2x NVIDIA A40 GPUs.

The A40 GPUs have been graciously provided by [Arc Compute](https://www.arccompute.io/).

### Training hyperparameters
A lower learning rate than usual was employed. Due to an unforeseen issue, training
was cut short; as a result, 3 epochs were trained instead of the planned 4. Using 2 GPUs,
the effective global batch size would have been 16.

Training was continued from the most recent LoRA adapter from Ashhwriter, using the same
LoRA R and LoRA alpha.

- lora_model_dir: /home/anon/bin/axolotl/OUT_mistral-stories/checkpoint-6000/
- learning_rate: 0.00005
- lr_scheduler: cosine
- noisy_embedding_alpha: 3.5
- num_epochs: 4
- sequence_len: 8750
- lora_r: 256
- lora_alpha: 16
- lora_dropout: 0.05
- lora_target_linear: true
- bf16: true
- fp16: false
- tf32: true
- load_in_8bit: true
- adapter: lora
- micro_batch_size: 2
- optimizer: adamw_bnb_8bit
- warmup_steps: 10
- optimizer: adamw_torch
- flash_attention: true
- sample_packing: true
- pad_to_sequence_len: true

### Loss graphs
Values are higher than typical because the training is performed on the entire
sample, similar to unsupervised finetuning.

#### Train loss
![Train loss](https://files.catbox.moe/ovw8c7.png)

#### Eval loss
![Eval loss](https://files.catbox.moe/yp7o0h.png)