Update README.md
README.md
CHANGED
@@ -1,3 +1,25 @@
|
|
1 |
---
|
2 |
license: unknown
|
3 |
---
|
4 |
+
Warning: May generate 18+ content
|
5 |
+
Trained on the dialogue from Adastra (the furry visual novel) in text-generation-webui (by oobabooga), with a modified Training_PRO extension.
|
6 |
+
Model was loaded unquantized (BF16).
|
7 |
+
Currently, loading IA3 adapters works in unmodified text-generation-webui (I think; load one as you would a LoRA), but only if the model is loaded unquantized, not
|
8 |
+
in 4-bit or 8-bit.
|
9 |
+
|
10 |
+
Training parameters:
|
11 |
+
Batch Size: 1
|
12 |
+
Gradient Accumulation Steps: 8
|
13 |
+
Cutoff Length: 512 //Didn't have enough memory for more.
|
14 |
+
Epochs: 1
|
15 |
+
Learning Rate: 1e-2 // Yes, unusually high. Maybe a quirk of IA3?
|
16 |
+
LR Scheduler: Linear
|
17 |
+
Warmup Steps: 64
|
18 |
+
Targets: all (Q, K, V, O, up, down, gate)
|
19 |
+
Optimizer: Adafactor
|
20 |
+
Add overlapping blocks: On
|
21 |
+
DEMENTOR (long form learning by FP): On //This might be the secret sauce that makes this IA3 so effective with just 1 epoch of training.
|
22 |
+
|
23 |
+
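To make the numbers above a bit more concrete, here is a minimal Python sketch of what a batch size of 1 with 8 gradient accumulation steps, and a linear scheduler with 64 warmup steps, would work out to. It assumes the common "linear warmup, then linear decay to zero" shape; the `total_steps` value is hypothetical (the real step count is not stated here), and whether Training_PRO counts warmup in optimizer steps or micro-batches is also an assumption.

```python
# Values taken from the training parameters listed above.
BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 8
PEAK_LR = 1e-2
WARMUP_STEPS = 64


def effective_batch_size(batch_size: int, grad_accum_steps: int) -> int:
    """Number of sequences that contribute to each optimizer update."""
    return batch_size * grad_accum_steps


def linear_lr(step: int, total_steps: int,
              peak_lr: float = PEAK_LR,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Assumed schedule: linear warmup to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        # Warmup phase: ramp up from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay phase: ramp down from peak_lr to 0 at total_steps.
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / max(total_steps - warmup_steps, 1)


if __name__ == "__main__":
    total_steps = 1000  # hypothetical; not the actual length of this run
    print("effective batch size:",
          effective_batch_size(BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, 32, 64, 532, 1000):
        print(step, linear_lr(step, total_steps))
```

Under these assumptions, each weight update averages gradients over 8 sequences of up to 512 tokens, and the learning rate only sits at the full 1e-2 for a moment at the end of warmup.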
Training took 9 minutes on an RTX 3090.
|
24 |
+
If you are the creator of Adastra and would like this taken down, please contact me.
|
25 |
+
I do not claim to have produced the training data that went into this finetune.
|