Update README.md
README.md CHANGED
@@ -101,9 +101,9 @@ repetition penalty and low penalty range (about as long as the prior 2 messages)
 
 ## Training procedure
 [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) was used for training
-on
+on 2x NVidia A40 GPUs.
 
-The A40
+The A40 GPUs have been graciously provided by [Arc Compute](https://www.arccompute.io/).
 
 The model has been trained as an 8-bit LoRA adapter, and
 it's so large because a LoRA rank of 256 was also used. The reasoning was that this
@@ -133,4 +133,4 @@ the base Mistral-7B-v0.1 model.
 For the second pass, the `lora_model_dir` option was used to continue finetuning on the LoRA
 adapter obtained from the first pass.
 
-Using
+Using 2 GPUs, the effective global batch size would have been 128.
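For anyone trying to mirror this setup, the snippet below is a minimal sketch of the Axolotl options implied by the training notes above. Only the 8-bit adapter, the LoRA rank of 256, the Mistral-7B-v0.1 base, the use of `lora_model_dir` for the second pass, and the effective global batch size of 128 on 2 GPUs come from the README; every other key and value (alpha, dropout, target modules, the micro-batch/accumulation split, paths) is an illustrative assumption, not the actual training config.

```yaml
# Hypothetical Axolotl config sketch; values marked "assumed" are not from the README.
base_model: mistralai/Mistral-7B-v0.1   # the README names Mistral-7B-v0.1 as the base

load_in_8bit: true           # trained as an 8-bit LoRA adapter
adapter: lora
lora_r: 256                  # LoRA rank 256, as stated above
lora_alpha: 512              # assumed (a common choice is 2x the rank)
lora_dropout: 0.05           # assumed
lora_target_linear: true     # assumed

# Effective global batch size = micro_batch_size * gradient_accumulation_steps * num_gpus.
# With 2 GPUs, 8 * 8 * 2 = 128 matches the figure given above; the 8/8 split itself is assumed.
micro_batch_size: 8              # assumed
gradient_accumulation_steps: 8   # assumed

# Second pass only: continue finetuning from the adapter produced by the first pass.
# The directory name is hypothetical.
# lora_model_dir: ./lora-out
```

For the first pass `lora_model_dir` stays unset; for the second pass it is pointed at the adapter directory written by the first pass, which is what the `lora_model_dir` note in the diff refers to.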