Astris/Mistral-Adastra-IA3


Astris committed on Oct 4, 2023

Commit fcf8ece · 1 Parent(s): 9d5af30

Update README.md

Files changed (1)

README.md +18 -12


@@ -1,25 +1,31 @@
 ---
 license: unknown
----
-Warning: May generate 18+ content
-Trained on the dialogue from Adastra (the furry visual novel) in text-generation-webui (by oobabooga), with a modified Training_PRO extension.
-Model was loaded unquantized (BF16).
-Currently, loading IA3's works in unmodified textgen-webui (I think... Load it as you would a LoRA) only if you load the model unquantized, and not
-in 4 or 8 bit.
-Training parameters:
 Batch Size: 1
 Gradient Accumulation Steps: 8
-Cutoff Length: 512 //Didn't have enough memory for more.
 Epochs: 1
-Learning Rate: 1e-2 //yes, unusually high. maybe a quirk of IA3's?
 LR Scheduler: Linear
 Warmup Steps: 64
-Targets: all (Q, K, V, O, up, down, gate)
-Optimizer: Adafactor
 Add overlapping blocks: On
 DEMENTOR (long form learning by FP): On //This might be the secret sauce that makes this IA3 so effective with just 1 epoch of training.
 Training took 9 minutes on an RTX 3090
 If you are the creator of Adastra and would like this taken down, please contact me.
 I do not claim to have produced the training data that went into this finetune.

 ---
 license: unknown
 Batch Size: 1
 Gradient Accumulation Steps: 8
+Cutoff Length: 512
 Epochs: 1
+Learning rate: 1e-2
+Optimizer: Adafactor
 LR Scheduler: Linear
 Warmup Steps: 64
+Projections: Q, K, V, O, up, gate, down
+---
+Warning: May generate 18+ content.
+Trained on the dialogue from Adastra (the furry visual novel) in text-generation-webui (by oobabooga), with a modified Training_PRO extension.
+Model was loaded unquantized (BF16).
+Currently, loading IA3's works in unmodified textgen-webui (I think... Load it as you would a LoRA) only if you load the model unquantized, and not in 4 or 8 bit.
+Other Training parameters:
 Add overlapping blocks: On
 DEMENTOR (long form learning by FP): On //This might be the secret sauce that makes this IA3 so effective with just 1 epoch of training.
+Extra:
 Training took 9 minutes on an RTX 3090
 If you are the creator of Adastra and would like this taken down, please contact me.
 I do not claim to have produced the training data that went into this finetune.
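
The listed hyperparameters (Batch Size 1, Gradient Accumulation Steps 8, Learning rate 1e-2, LR Scheduler Linear, Warmup Steps 64) can be read as the rough sketch below. It is not the Training_PRO implementation: it assumes the common "linear warmup, then linear decay to zero" shape, assumes warmup is counted in optimizer steps, and uses a hypothetical `TOTAL_STEPS` value because the README does not state the dataset size or step count.

```python
# Sketch of the schedule implied by the README's hyperparameters.
# TOTAL_STEPS is a hypothetical placeholder; the real value is not given.

BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 8
PEAK_LR = 1e-2
WARMUP_STEPS = 64
TOTAL_STEPS = 500  # hypothetical, for illustration only


def effective_batch_size(batch_size: int, grad_accum_steps: int) -> int:
    """Number of sequences contributing to each optimizer update."""
    return batch_size * grad_accum_steps


def linear_warmup_lr(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at a 0-indexed optimizer step.

    Ramps linearly from 0 to peak_lr over warmup_steps, then decays
    linearly to 0 at total_steps.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(BATCH_SIZE, GRAD_ACCUM_STEPS))
    for s in (0, 32, 64, 250, 500):
        print(s, linear_warmup_lr(s, PEAK_LR, WARMUP_STEPS, TOTAL_STEPS))
```

With these settings each optimizer update sees 8 sequences of at most 512 tokens, and the learning rate only reaches the full 1e-2 after the 64 warmup steps.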