Astris/Mistral-Adastra-IA3

Astris committed on Oct 4, 2023

Commit 9d5af30 (parent: dcd4153)

Update README.md

Files changed (1)

README.md +22 -0

@@ -1,3 +1,25 @@
 ---
 license: unknown
 ---
+Warning: May generate 18+ content.
+Trained on dialogue from Adastra (the furry visual novel) in text-generation-webui (by oobabooga), using a modified Training_PRO extension.
+The model was loaded unquantized (BF16) for training.
+Currently, loading IA3 adapters works in unmodified text-generation-webui (I think; load it as you would a LoRA), but only if the base model is loaded unquantized, not in 4-bit or 8-bit.
+Training parameters:
+Batch Size: 1
+Gradient Accumulation Steps: 8
+Cutoff Length: 512 // Didn't have enough memory for more.
+Epochs: 1
+Learning Rate: 1e-2 // Yes, unusually high. Maybe a quirk of IA3?
+LR Scheduler: Linear
+Warmup Steps: 64
+Targets: all (Q, K, V, O, up, down, gate)
+Optimizer: Adafactor
+Add overlapping blocks: On
+DEMENTOR (long-form learning by FP): On // This might be the secret sauce that makes this IA3 so effective after just 1 epoch of training.
+Training took 9 minutes on an RTX 3090.
+If you are the creator of Adastra and would like this taken down, please contact me.
+I do not claim to have produced the training data that went into this finetune.
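For loading outside the webui, here is a minimal sketch using Hugging Face `transformers` and `peft`. It keeps the base model unquantized in BF16, as the card recommends. Several names are assumptions because the card does not state them. The base checkpoint `mistralai/Mistral-7B-v0.1` is assumed. The sketch also assumes that the adapter repo `Astris/Mistral-Adastra-IA3` holds standard PEFT adapter files. `load_adastra_ia3` is a hypothetical helper name.

```python
# Minimal sketch: apply the IA3 adapter to an unquantized BF16 base model.
# Assumptions (not confirmed by the card): the base checkpoint name, and that
# the adapter repo stores standard PEFT adapter files.
BASE_MODEL = "mistralai/Mistral-7B-v0.1"     # assumed base model
ADAPTER_REPO = "Astris/Mistral-Adastra-IA3"  # assumed adapter repo ID


def load_adastra_ia3(base_model: str = BASE_MODEL, adapter_repo: str = ADAPTER_REPO):
    """Hypothetical helper: return (tokenizer, model) with the IA3 adapter attached."""
    # Heavy imports are deferred so this file can be imported without them.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        torch_dtype=torch.bfloat16,  # unquantized BF16; no 4-bit/8-bit config
        device_map="auto",           # needs the `accelerate` package installed
    )
    # PeftModel reads the adapter config and attaches the IA3 scaling vectors.
    model = PeftModel.from_pretrained(model, adapter_repo)
    model.eval()
    return tokenizer, model
```

Call `tokenizer, model = load_adastra_ia3()` and generate as with any causal LM. The sketch deliberately omits 4-bit and 8-bit loading, given the card's note that IA3 only loaded unquantized in the webui.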
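IA3 does not add low-rank matrices the way LoRA does. Instead, it learns one scaling vector per targeted module. The plain-Python sketch below shows that rule on a single linear layer. It assumes PEFT's convention: modules flagged as feedforward have their input scaled, and all other targets have their output scaled. The card lists its targets (Q, K, V, O, up, down, gate) but does not say which ones the modified Training_PRO treats as feedforward. `FEEDFORWARD_TARGETS = {"down"}` is therefore a hypothetical choice.

```python
# Minimal sketch of IA3 scaling on one linear layer (pure Python, no torch).
# Assumption: PEFT-style rule, where feedforward modules scale the input and
# other targeted modules scale the output.
from typing import List

Matrix = List[List[float]]
Vector = List[float]

# Hypothetical: which of the card's targets are treated as feedforward.
FEEDFORWARD_TARGETS = {"down"}


def matvec(weight: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product: one output per row of `weight`."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def ia3_linear(weight: Matrix, x: Vector, scale: Vector, feedforward: bool) -> Vector:
    """Frozen linear layer with a learned IA3 vector `scale`.

    feedforward=True:  y = W (x * scale)   (scale matches the input size)
    feedforward=False: y = (W x) * scale   (scale matches the output size)
    """
    expected = len(x) if feedforward else len(weight)
    if len(scale) != expected:
        raise ValueError(f"scale has {len(scale)} entries, expected {expected}")
    if feedforward:
        return matvec(weight, [xi * si for xi, si in zip(x, scale)])
    return [yi * si for yi, si in zip(matvec(weight, x), scale)]


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0]]
    x = [1.0, 1.0]
    print(ia3_linear(W, x, [2.0, 0.5], feedforward="down" in FEEDFORWARD_TARGETS))
```

With every entry of `scale` set to 1.0, the layer output is unchanged. IA3 vectors are normally initialized this way, so training starts from the base model's behavior.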
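The sketch below turns the batch and scheduler settings listed in the training parameters into concrete numbers. It makes three assumptions. First, the "Linear" scheduler is assumed to behave like Hugging Face's `get_linear_schedule_with_warmup`: a linear ramp to the peak rate over the warmup steps, then linear decay to zero. Second, warmup is assumed to be counted in optimizer steps, which may not match how the webui counts them. Third, `TOTAL_STEPS` is a made-up value, because the dataset size is not given.

```python
# Sketch: effective batch size and a linear warmup/decay learning-rate schedule.
# Assumptions: HF-style linear schedule, warmup in optimizer steps, and a
# hypothetical TOTAL_STEPS (the real value depends on the dataset size).
BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 8
CUTOFF_LEN = 512      # maximum tokens per sample
PEAK_LR = 1e-2
WARMUP_STEPS = 64
TOTAL_STEPS = 500     # hypothetical


def effective_batch_size(batch_size: int, grad_accum: int) -> int:
    """Number of samples that contribute to each optimizer update."""
    return batch_size * grad_accum


def linear_lr(step: int, peak: float, warmup: int, total: int) -> float:
    """Ramp from 0 to `peak` over `warmup` steps, then decay linearly to 0 at `total`."""
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    eff = effective_batch_size(BATCH_SIZE, GRAD_ACCUM_STEPS)
    print(f"effective batch: {eff} samples, up to {eff * CUTOFF_LEN} tokens per update")
    for s in (0, 32, WARMUP_STEPS, TOTAL_STEPS):
        print(s, linear_lr(s, PEAK_LR, WARMUP_STEPS, TOTAL_STEPS))
```

With batch size 1 and 8 accumulation steps, each optimizer update sees 8 samples. At the 512-token cutoff, that is at most 8 × 512 = 4,096 tokens per update.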