lemonilia committed
Commit: 4d76b14
1 Parent(s): bcf1597

Update README.md

Files changed (1): README.md +16 -11
README.md CHANGED
@@ -4,11 +4,10 @@ license: apache-2.0
 
  # LimaRP-Mistral-7B-v0.1 (Alpaca, 8-bit LoRA adapter)
 
- This is an experimental version of LimaRP for [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) with
- about 1800 training samples _up to_ 4k tokens length. Contrary to the previously released "v3" version for Llama-2, this one does
- not include a preliminary finetuning pass on several thousand short stories. Initial testing has shown Mistral to be capable of
- generating on its own the kind of stories that were included there; its training data appears to be quite diverse and does not
- seem to have been filtered for content type.
+ This is a version of LimaRP for [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) with
+ about 1800 training samples _up to_ 4k tokens length. A 2-pass training procedure was employed: the first pass consists of
+ finetuning on about 6800 stories within 4k tokens length, and the second pass is LimaRP with changes introducing more effective
+ control over response length.
 
  **Due to software limitations, finetuning could not yet take advantage of Sliding Window Attention (SWA), which would have allowed
  the use of longer conversations in the training data. Thus, this version of LimaRP should be considered an _initial finetuning attempt_ and
@@ -100,7 +99,7 @@ generation settings may be:
 
  ## Training procedure
  [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) was used for training
- on a 2x NVidia A40 GPU cluster.
+ on a 4x NVidia A40 GPU cluster.
 
  The A40 GPU cluster has been graciously provided by [Arc Compute](https://www.arccompute.io/).
 
@@ -111,19 +110,25 @@ training process closer to a full finetune. It's suggested to merge the adapter with
  the base Mistral-7B-v0.1 model.
 
  ### Training hyperparameters
- - learning_rate: 0.001
+ - learning_rate: 0.0001
  - lr_scheduler_type: cosine
- - num_epochs: 2
+ - num_epochs: 2 (1 for the first pass)
+ - sequence_len: 4096
  - lora_r: 256
  - lora_alpha: 16
  - lora_dropout: 0.05
  - lora_target_linear: True
  - bf16: True
+ - fp16: false
  - tf32: True
  - load_in_8bit: True
  - adapter: lora
- - micro_batch_size: 1
- - gradient_accumulation_steps: 16
+ - micro_batch_size: 2
+ - gradient_accumulation_steps: 1
+ - warmup_steps: 40
  - optimizer: adamw_torch
 
- Using 2 GPUs, the effective global batch size would have been 32.
+ For the second pass, the `lora_model_dir` option was used to continue finetuning on the LoRA
+ adapter obtained from the first pass.
+
+ Using 2 GPUs, the effective global batch size would have been 8.
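The effective global batch size quoted in the diff follows from the per-GPU micro-batch size, the gradient accumulation steps, and the number of GPUs used. A quick sanity check in Python (the GPU counts below are only illustrative; the card itself mentions both a 4x A40 cluster and 2 GPUs):

```python
# Effective (global) batch size for data-parallel training:
# micro_batch_size * gradient_accumulation_steps * number_of_gpus.
def effective_batch_size(micro_batch_size: int, grad_accum_steps: int, num_gpus: int) -> int:
    return micro_batch_size * grad_accum_steps * num_gpus

# Hyperparameters from the updated README: micro_batch_size=2, gradient_accumulation_steps=1.
print(effective_batch_size(2, 1, 2))  # 2 GPUs -> 4
print(effective_batch_size(2, 1, 4))  # 4 GPUs -> 8, matching the stated global batch size of 8
```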
 
 
 
 
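The card suggests merging the adapter with the base Mistral-7B-v0.1 model. A minimal sketch of one way to do that with the `transformers` and `peft` libraries is shown below; the adapter path and output directory are placeholders, not names taken from this repository.

```python
# Hypothetical merge script: folds a LoRA adapter into the base model weights.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "mistralai/Mistral-7B-v0.1"
ADAPTER = "path/to/limarp-mistral-7b-lora"   # placeholder: local path or Hub repo id of this adapter
OUTPUT = "mistral-7b-v0.1-limarp-merged"     # placeholder output directory

base_model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(BASE)

# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(base_model, ADAPTER)

# Merge the low-rank updates into the base weights and drop the PEFT wrappers.
merged = model.merge_and_unload()

merged.save_pretrained(OUTPUT)
tokenizer.save_pretrained(OUTPUT)
```

The merged weights can then be loaded like any regular Mistral-7B checkpoint.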