---
license: unknown
---

Training parameters:

Batch Size: 1

Gradient Accumulation Steps: 8

Cutoff Length: 512

Epochs: 1

Learning Rate: 1e-2

Optimizer: Adafactor

LR Scheduler: Linear

Warmup Steps: 64

Projections: Q, K, V, O, up, gate, down
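For orientation, here is a rough equivalent of the setup above using the Hugging Face peft and transformers libraries. This is a hedged sketch, not the actual training script (training was done in text-generation-webui, see below): the base model name is a placeholder, and the module names assume a LLaMA-style architecture.

```python
# Rough sketch of the configuration above using peft + transformers.
# NOT the actual training script; base model name and module names are assumptions.
import torch
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import IA3Config, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "base-model-placeholder",    # assumption: the base model this adapter targets
    torch_dtype=torch.bfloat16,  # loaded unquantized (BF16), as noted below
)

# IA3 scaling vectors on the listed projections
# (Q/K/V/O attention and up/gate/down MLP, LLaMA-style names).
ia3_config = IA3Config(
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "up_proj", "gate_proj", "down_proj"],
    feedforward_modules=["up_proj", "gate_proj", "down_proj"],
)
model = get_peft_model(model, ia3_config)

# Hyperparameters from the list above; Cutoff Length 512 would cap
# tokenized examples at 512 tokens during dataset preparation.
args = TrainingArguments(
    output_dir="ia3-adastra",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=1e-2,
    optim="adafactor",
    lr_scheduler_type="linear",
    warmup_steps=64,
)
```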
|
|
|
---
|
More of a proof of concept. Temper your expectations. |
|
|
|
Warning: May generate 18+ content. |
|
Trained on the dialogue from Adastra (the furry visual novel) in text-generation-webui (by oobabooga), using a modified Training_PRO extension.
|
The model was loaded unquantized (BF16).

Currently, loading IA3 adapters works in unmodified textgen-webui (I think; load one as you would a LoRA) only if the model is loaded unquantized, not in 4-bit or 8-bit.
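Outside of the webui, a minimal loading sketch with peft could look like this; the base model name and adapter path are placeholders:

```python
# Minimal sketch: load the base model unquantized, then attach the IA3 adapter.
# Base model name and adapter path are placeholders.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "base-model-placeholder",
    torch_dtype=torch.bfloat16,  # unquantized; 4-bit/8-bit loading did not work here
)
model = PeftModel.from_pretrained(base, "path/to/this-ia3-adapter")
```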
|
|
|
Other Training parameters: |
|
|
|
Add overlapping blocks: On (see the toy sketch after this list)
|
|
|
DEMENTOR (long-form learning by FP): On. This might be the secret sauce that makes this IA3 so effective with just 1 epoch of training.
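For intuition on the overlapping-blocks option: consecutive training chunks share tokens, so context is not cut off at block boundaries. A toy illustration follows; this is not Training_PRO's actual code, and the 50% overlap is an assumption.

```python
# Toy illustration of overlapping block slicing; not Training_PRO's actual code.
# Consecutive blocks share `overlap` tokens so context spans block boundaries.
def overlapping_blocks(token_ids, block_size=512, overlap=256):
    step = block_size - overlap
    return [token_ids[i:i + block_size]
            for i in range(0, max(1, len(token_ids) - overlap), step)]

blocks = overlapping_blocks(list(range(2000)))
# With block_size=512 and overlap=256, each block starts 256 tokens
# after the previous one, so neighbouring blocks share half their tokens.
```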
|
|
|
Extra: |
|
|
|
Training took 9 minutes on an RTX 3090.
|
|
|
If you are the creator of Adastra and would like this taken down, please contact me. |
|
|
|
I do not claim to have produced the training data that went into this finetune. |