---
license: unknown
---

**Warning: may generate 18+ content.**

Trained on the dialogue from Adastra (the furry visual novel) using text-generation-webui (by oobabooga) with a modified Training_PRO extension. The model was loaded unquantized (BF16) for training.

Currently, IA³ adapters load in unmodified text-generation-webui (I think... load one as you would a LoRA) only if the base model is loaded unquantized, not in 4-bit or 8-bit.

Training parameters:

- Batch size: 1
- Gradient accumulation steps: 8
- Cutoff length: 512 (didn't have enough memory for more)
- Epochs: 1
- Learning rate: 1e-2 (yes, unusually high; maybe a quirk of IA³?)
- LR scheduler: Linear
- Warmup steps: 64
- Targets: all (Q, K, V, O, up, down, gate)
- Optimizer: Adafactor
- Add overlapping blocks: on
- DEMENTOR (long-form learning by FP): on (this might be the secret sauce that makes this IA³ so effective with just one epoch of training)

Training took 9 minutes on an RTX 3090.

If you are the creator of Adastra and would like this taken down, please contact me. I do not claim to have produced the training data that went into this finetune.
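For context on why training was so fast and the adapter file is so small: IA³ freezes all base-model weights and learns only one scaling vector per targeted projection (here Q, K, V, O, up, down, and gate), rescaling that projection's output elementwise. The sketch below is a hypothetical, dependency-free illustration of that mechanism; the function and variable names are illustrative and not part of this repo or any library.

```python
# Hypothetical sketch of the IA3 mechanism (illustrative names only):
# the base projection stays frozen; training updates only a small
# per-projection scaling vector applied elementwise to the output.

def apply_ia3(activations, scale):
    """Rescale a frozen projection's output by a learned IA3 vector."""
    return [a * s for a, s in zip(activations, scale)]

# At initialization the vector is all ones, so the adapted model
# behaves exactly like the base model:
h = [1.0, -2.0, 0.5]
print(apply_ia3(h, [1.0, 1.0, 1.0]))  # identical to h

# After training, only these tiny vectors have changed, which is why
# the adapter is small and one epoch can be enough:
print(apply_ia3(h, [0.9, 1.2, 1.0]))
```

Because only the scaling vectors receive gradients, the number of trainable parameters is a tiny fraction of the base model's, which matches the short training time reported above.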