lemonilia committed
Commit f5eb99d (1 parent: 73d8726)

Upload 3 files
README.md CHANGED
---
license: apache-2.0
---

# LimaRP-Llama2-7B-v3 (Alpaca, experimental, 8-bit LoRA adapter)

This is an experimental version of LimaRP for Llama2 with an updated dataset (1800 training samples)
and a 2-pass training procedure. The first pass is unsupervised finetuning on about 6800 stories of up
to 4k tokens each; the second pass is LimaRP finetuning with changes that introduce more effective
control over response length.

For more details about LimaRP, see the model page for the [previously released version](https://huggingface.co/lemonilia/limarp-llama2-v2).
Most of the details written there apply to this version as well.

## Prompt format
Same as before. It uses the [extended Alpaca format](https://github.com/tatsu-lab/stanford_alpaca),
with `### Input:` immediately preceding user inputs and `### Response:` immediately preceding
model outputs. While Alpaca wasn't originally intended for multi-turn responses, in practice this
is not a problem; the format follows a pattern already used by other models.

```
### Instruction:
Character's Persona: {bot character description}

User's Persona: {user character description}

Scenario: {what happens in the story}

Play the role of Character. You must engage in a roleplaying chat with User below this line. Do not write dialogues and narration for User.

### Input:
User: {utterance}

### Response:
Character: {utterance}

### Input:
User: {utterance}

### Response:
Character: {utterance}

(etc.)
```

You should:
- Replace all text in curly braces (curly braces included) with your own text.
- Replace `User` and `Character` with appropriate names.
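
A minimal sketch of how this format can be assembled programmatically is shown below; the model path, persona text and generation settings are placeholder assumptions, and the helper itself is only illustrative.

```python
# Minimal sketch: assemble the extended-Alpaca prompt and generate one reply.
# MODEL_PATH and all persona/scenario strings below are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "path/to/merged-limarp-llama2-7b"

def build_prompt(char_persona, user_persona, scenario, turns,
                 char="Character", user="User"):
    """turns: list of (speaker, utterance) pairs in chronological order."""
    prompt = (
        "### Instruction:\n"
        f"{char}'s Persona: {char_persona}\n\n"
        f"{user}'s Persona: {user_persona}\n\n"
        f"Scenario: {scenario}\n\n"
        f"Play the role of {char}. You must engage in a roleplaying chat with {user} "
        f"below this line. Do not write dialogues and narration for {user}.\n"
    )
    for speaker, utterance in turns:
        header = "### Input:" if speaker == user else "### Response:"
        prompt += f"\n{header}\n{speaker}: {utterance}\n"
    # Leave an open response header so the model continues as the character.
    prompt += f"\n### Response:\n{char}:"
    return prompt

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto")

prompt = build_prompt(
    char_persona="A stoic knight guarding a mountain pass.",
    user_persona="A traveling merchant.",
    scenario="The merchant tries to talk their way past the knight at night.",
    turns=[("User", "Good evening, sir knight. Might I trouble you for passage?")],
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```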

### Message length control
Inspired by the preset previously named "Roleplay" in SillyTavern, starting from this
version of LimaRP it is possible to append a length modifier to the response instruction
sequence, like this:

```
### Input:
User: {utterance}

### Response: (length = medium)
Character: {utterance}
```

This has an immediately noticeable effect on bot responses. The available lengths are:
`tiny`, `short`, `medium`, `long`, `huge`, `humongous`, `extreme`, `unlimited`. **The
recommended starting length is `medium`**. Keep in mind that the AI may ramble
or impersonate the user with very long messages.

The length control effect is reproducible, but the messages will not follow the requested
length very precisely; rather, they fall within certain ranges on average, as shown in this
table with data from tests made with one reply at the beginning of the conversation:

![lengths](https://files.catbox.moe/dy39bt.png)

Response length control appears to work well even deep into the conversation.
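
In code, the modifier is just extra text in the final response header. The small helper below, a sketch building on the assumptions of the previous example, validates the modifier against the lengths listed above:

```python
# Sketch: build the final response header with an optional length modifier.
# The set of modifiers comes from this model card; everything else is illustrative.
LENGTHS = {"tiny", "short", "medium", "long", "huge", "humongous", "extreme", "unlimited"}

def response_header(char="Character", length=None):
    if length is None:
        return f"\n### Response:\n{char}:"
    if length not in LENGTHS:
        raise ValueError(f"unknown length modifier: {length}")
    return f"\n### Response: (length = {length})\n{char}:"

print(response_header(length="medium"))
# -> (after a leading newline)
# ### Response: (length = medium)
# Character:
```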

## Suggested settings
You can follow these instruction format settings in SillyTavern. Replace `tiny` with
your desired response length:

![settings](https://files.catbox.moe/6lcz0u.png)

## Training procedure
[Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) was used for training
on a cluster of 4x NVIDIA A40 GPUs. The model has been trained as an 8-bit LoRA adapter,
and the adapter is as large as it is because a LoRA rank of 256 was used. The reasoning
was that a higher rank might help the model internalize newly acquired information,
making the training process closer to a full finetune.

It's suggested to merge the adapter into the base Llama2-7B model (or other Llama2-based
models).
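
For merging, a minimal sketch using the PEFT library is given below; the base model id, adapter path and output directory are assumptions, and merging is done in half precision rather than 8-bit.

```python
# Sketch: merge the LoRA adapter into a Llama2-7B base model using PEFT.
# Model ids and paths are placeholders, not values taken from this card.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Llama-2-7b-hf"      # or another Llama2-based model
ADAPTER = "path/to/this-lora-adapter"  # local copy of this repository
OUT = "limarp-llama2-7b-v3-merged"

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, ADAPTER).merge_and_unload()
merged.save_pretrained(OUT)
AutoTokenizer.from_pretrained(BASE).save_pretrained(OUT)
```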

### Training hyperparameters
These settings were used for the first pass:

- learning_rate: 0.00065
- lr_scheduler_type: constant
- lora_r: 256
- lora_alpha: 16
- lora_dropout: 0.05
- lora_target_linear: True
- num_epochs: 1
- bf16: True
- tf32: True
- load_in_8bit: True
- adapter: lora
- micro_batch_size: 2
- gradient_accumulation_steps: 1
- optimizer: adamw_torch

In the second pass, the `lora_model_dir` option was used to load and continue training the
adapter previously trained on the stories dataset. One setting was also changed:

- lora_dropout: 0.0

With 4 GPUs, a micro batch size of 2 and no gradient accumulation (2 × 1 × 4), the effective
global batch size would have been 8.
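
Training itself was driven by Axolotl's own configuration format; purely as an illustration of how those options map onto more familiar APIs, a rough PEFT/Transformers equivalent of the first-pass settings might look like the sketch below (the mapping, `target_modules="all-linear"` included, is an assumption).

```python
# Rough, illustrative mapping of the first-pass settings to PEFT/Transformers.
# The actual run used Axolotl's config file; this equivalence is an assumption.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=256,                        # lora_r
    lora_alpha=16,                # lora_alpha
    lora_dropout=0.05,            # set to 0.0 for the second pass
    target_modules="all-linear",  # lora_target_linear: True
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="limarp-pass1",       # placeholder
    learning_rate=6.5e-4,            # 0.00065
    lr_scheduler_type="constant",
    num_train_epochs=1,
    per_device_train_batch_size=2,   # micro_batch_size
    gradient_accumulation_steps=1,
    bf16=True,
    tf32=True,
    optim="adamw_torch",
)
```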
SillyTavern_LimaRP-Alpaca-context-basic.json ADDED

{
  "story_string": "{{#if system}}{{system}}{{/if}}{{#if description}}{{description}}{{/if}}",
  "chat_start": "",
  "example_separator": "",
  "name": "LimaRP-Alpaca"
}
SillyTavern_LimaRP-Alpaca-instruct.json ADDED

{
  "wrap": true,
  "names": true,
  "system_prompt": "",
  "system_sequence": "### Instruction:",
  "stop_sequence": "### Input:",
  "input_sequence": "\n### Input:",
  "output_sequence": "\n### Response:",
  "separator_sequence": "",
  "macro": true,
  "names_force_groups": false,
  "last_output_sequence": "\n### Response: (length = medium)",
  "activation_regex": "",
  "system_sequence_prefix": "### Instruction:",
  "system_sequence_suffix": "",
  "first_output_sequence": "",
  "name": "LimaRP-Alpaca"
}