---
license: unknown
---

Training parameters:

Batch Size: 1
Gradient Accumulation Steps: 8
Cutoff Length: 512
Epochs: 1
Learning rate: 1e-2
Optimizer: Adafactor
LR Scheduler: Linear
Warmup Steps: 64
Projections: Q, K, V, O, up, gate, down
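
For reference, here is a rough sketch of how these settings map onto a Hugging Face PEFT + Transformers run. This is an illustration only, not the actual Training_PRO code path used for this adapter; the base model id is a placeholder, and the `*_proj` module names assume a LLaMA-style architecture.

```python
# Illustrative sketch: an (IA)^3 setup mirroring the hyperparameters listed above.
# Assumptions: placeholder base model id, LLaMA-style projection module names.
import torch
from peft import IA3Config, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments

base_model = "meta-llama/Llama-2-13b-hf"  # placeholder, not necessarily the base used here

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)  # unquantized BF16

# Projections: Q, K, V, O, up, gate, down
ia3_config = IA3Config(
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "up_proj", "gate_proj", "down_proj"],
    feedforward_modules=["up_proj", "gate_proj", "down_proj"],
)
model = get_peft_model(model, ia3_config)

training_args = TrainingArguments(
    output_dir="adastra-ia3",
    per_device_train_batch_size=1,   # Batch Size: 1
    gradient_accumulation_steps=8,   # Gradient Accumulation Steps: 8
    num_train_epochs=1,              # Epochs: 1
    learning_rate=1e-2,              # Learning rate: 1e-2
    optim="adafactor",               # Optimizer: Adafactor
    lr_scheduler_type="linear",      # LR Scheduler: Linear
    warmup_steps=64,                 # Warmup Steps: 64
    bf16=True,
)
# Cutoff Length: 512 corresponds to truncating each tokenized training sample to 512 tokens.
```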

Warning: May generate 18+ content.

Trained on the dialogue from Adastra (the furry visual novel) in text-generation-webui (by oobabooga), with a modified Training_PRO extension.

The model was loaded unquantized (BF16) for training.

Currently, loading IA3 adapters seems to work in unmodified text-generation-webui (load one as you would a LoRA), but only if the base model is loaded unquantized, not in 4-bit or 8-bit.
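
Outside of text-generation-webui, the same constraint should apply when attaching the adapter with PEFT directly. A minimal sketch, assuming a placeholder base model id and a placeholder repository id for this adapter:

```python
# Illustrative sketch: loading this IA3 adapter with PEFT instead of text-generation-webui.
# Assumptions: placeholder base model id and adapter repository id.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "meta-llama/Llama-2-13b-hf"   # placeholder base model
adapter_id = "your-username/adastra-ia3"   # placeholder for this repository's id

tokenizer = AutoTokenizer.from_pretrained(base_model)
# Load the base model unquantized (BF16), as noted above; not in 4-bit or 8-bit.
model = AutoModelForCausalLM.from_pretrained(
    base_model, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

prompt = "Hello there."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```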

Other training parameters:

Add overlapping blocks: On
DEMENTOR (long-form learning by FP): On  // This might be the secret sauce that makes this IA3 so effective with just 1 epoch of training.

Extra:

Training took 9 minutes on an RTX 3090.

If you are the creator of Adastra and would like this taken down, please contact me.

I do not claim to have produced the training data that went into this finetune.