Astris committed on
Commit fcf8ece · 1 Parent(s): 9d5af30

Update README.md

Files changed (1)
  1. README.md +18 -12
README.md CHANGED
@@ -1,25 +1,31 @@
  ---
  license: unknown
- ---
- Warning: May generate 18+ content
- Trained on the dialogue from Adastra (the furry visual novel) in text-generation-webui (by oobabooga), with a modified Training_PRO extension.
- Model was loaded unquantized (BF16).
- Currently, loading IA3's works in unmodified textgen-webui (I think... Load it as you would a LoRA) only if you load the model unquantized, and not
- in 4 or 8 bit.
-
- Training parameters:
  Batch Size: 1
  Gradient Accumulation Steps: 8
- Cutoff Length: 512 //Didn't have enough memory for more.
+ Cutoff Length: 512
  Epochs: 1
- Learning Rate: 1e-2 //yes, unusually high. maybe a quirk of IA3's?
+ Learning rate: 1e-2
+ Optimizer: Adafactor
  LR Scheduler: Linear
  Warmup Steps: 64
- Targets: all (Q, K, V, O, up, down, gate)
- Optimizer: Adafactor
+ Projections: Q, K, V, O, up, gate, down
+
+ ---
+ Warning: May generate 18+ content.
+ Trained on the dialogue from Adastra (the furry visual novel) in text-generation-webui (by oobabooga), with a modified Training_PRO extension.
+ Model was loaded unquantized (BF16).
+ Currently, loading IA3's works in unmodified textgen-webui (I think... Load it as you would a LoRA) only if you load the model unquantized, and not in 4 or 8 bit.
+
+ Other Training parameters:
+
  Add overlapping blocks: On
+
  DEMENTOR (long form learning by FP): On //This might be the secret sauce that makes this IA3 so effective with just 1 epoch of training.

+ Extra:
+
  Training took 9 minutes on an RTX 3090
+
  If you are the creator of Adastra and would like this taken down, please contact me.
+
  I do not claim to have produced the training data that went into this finetune.
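
For readers who want to see what the listed hyperparameters imply, here is a minimal sketch in plain Python. It takes Batch Size, Gradient Accumulation Steps, Learning rate, and Warmup Steps from the README and assumes the usual "linear warmup, then linear decay to zero" reading of `LR Scheduler: Linear`. The helper names and `total_steps` are hypothetical, and the README does not say whether the Training_PRO extension computes its schedule exactly this way or counts warmup in optimizer steps.

```python
# Values copied from the README; everything else is an illustrative assumption.
BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 8
PEAK_LR = 1e-2
WARMUP_STEPS = 64


def effective_batch_size(batch_size: int, grad_accum_steps: int) -> int:
    """Number of sequences that contribute to each optimizer update."""
    return batch_size * grad_accum_steps


def linear_lr(step: int, total_steps: int,
              peak_lr: float = PEAK_LR,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay to 0 at total_steps.

    Assumed schedule shape; not confirmed by the README.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    total_steps = 256  # hypothetical; the README does not state the step count
    print("effective batch size:",
          effective_batch_size(BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, 32, 64, 160, 256):
        print(step, linear_lr(step, total_steps))
```

With these settings each optimizer update sees 8 sequences, and under the assumed schedule the learning rate reaches its 1e-2 peak only at step 64.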