Astris committed on
Commit 9d5af30 · 1 Parent(s): dcd4153

Update README.md

Files changed (1)
  1. README.md +22 -0
README.md CHANGED
@@ -1,3 +1,25 @@
  ---
  license: unknown
  ---
+ Warning: May generate 18+ content.
+ Trained on the dialogue from Adastra (the furry visual novel) in text-generation-webui (by oobabooga), with a modified Training_PRO extension.
+ The model was loaded unquantized (BF16).
+ Currently, loading IA3 adapters works in unmodified text-generation-webui (I think; load it as you would a LoRA) only if you load the model unquantized, and not
+ in 4-bit or 8-bit.
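For readers who want to try the adapter outside text-generation-webui, the loading note above might translate to something like the sketch below. It assumes the adapter was saved in PEFT format and that `transformers`, `peft`, and `torch` are installed. The README names neither the base model nor the adapter path, so `base_model_id` and `adapter_path` are hypothetical placeholders. The guard only mirrors the README's statement that quantized loading does not work. It is not a documented limitation of any library.

```python
def reject_quantized(load_in_4bit: bool = False, load_in_8bit: bool = False) -> None:
    """Raise if a quantized load is requested (mirrors the README's note)."""
    if load_in_4bit or load_in_8bit:
        raise ValueError(
            "Load the base model unquantized (BF16) before applying the IA3 adapter."
        )


def load_with_ia3(base_model_id: str, adapter_path: str,
                  load_in_4bit: bool = False, load_in_8bit: bool = False):
    """Load a base model in BF16 and attach an IA3 adapter the same way as a LoRA.

    base_model_id and adapter_path are placeholders, not values from the README.
    """
    reject_quantized(load_in_4bit, load_in_8bit)

    # Heavy imports are deferred so the guard above works without them installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.bfloat16)
    # PEFT reads the adapter type (IA3 or LoRA) from the saved adapter config.
    model = PeftModel.from_pretrained(base, adapter_path)
    return tokenizer, model
```

Calling `load_with_ia3("<base model>", "<adapter dir>")` with real values would return a tokenizer and an adapter-wrapped model. Passing `load_in_4bit=True` or `load_in_8bit=True` fails early with a `ValueError`.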
+
+ Training parameters:
+ Batch Size: 1
+ Gradient Accumulation Steps: 8
+ Cutoff Length: 512 // Didn't have enough memory for more.
+ Epochs: 1
+ Learning Rate: 1e-2 // Yes, unusually high. Maybe a quirk of IA3?
+ LR Scheduler: Linear
+ Warmup Steps: 64
+ Targets: all (Q, K, V, O, up, down, gate)
+ Optimizer: Adafactor
+ Add overlapping blocks: On
+ DEMENTOR (long form learning by FP): On // This might be the secret sauce that makes this IA3 so effective with just 1 epoch of training.
+
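To make the numbers above concrete, here is a minimal sketch of the arithmetic they imply: the effective batch size, and a linear schedule that warms up for 64 steps to the peak rate of 1e-2 and then decays to zero. This is a generic linear-warmup, linear-decay formula, not code from Training_PRO. The total step count is not given in the README, so `total_steps` is a hypothetical input. The sketch also assumes warmup is counted in optimizer steps.

```python
BATCH_SIZE = 1          # from the README
GRAD_ACCUM_STEPS = 8    # from the README
PEAK_LR = 1e-2          # from the README
WARMUP_STEPS = 64       # from the README


def effective_batch_size(batch_size: int = BATCH_SIZE,
                         grad_accum: int = GRAD_ACCUM_STEPS) -> int:
    """Sequences contributing to each optimizer step."""
    return batch_size * grad_accum


def linear_lr(step: int, total_steps: int,
              peak_lr: float = PEAK_LR,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay back to 0.

    total_steps is an assumption supplied by the caller.
    """
    if step < warmup_steps:
        return peak_lr * (step / warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * (remaining / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    # 500 is an illustrative step count only.
    for step in (0, 32, 64, 282, 500):
        print(step, linear_lr(step, total_steps=500))
```

With batch size 1 and 8 accumulation steps, each optimizer update sees 8 sequences of up to 512 tokens, which is how the run stays within the memory limit mentioned in the cutoff-length comment.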
+ Training took 9 minutes on an RTX 3090.
+ If you are the creator of Adastra and would like this taken down, please contact me.
+ I do not claim to have produced the training data that went into this finetune.