---
license: unknown
---

Training parameters:

Batch Size: 1

Gradient Accumulation Steps: 8

Cutoff Length: 512

Epochs: 1

Learning Rate: 1e-2

Optimizer: Adafactor

LR Scheduler: Linear

Warmup Steps: 64

Projections: Q, K, V, O, up, gate, down
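For orientation, here is a rough equivalent of the setup above using the Hugging Face peft and transformers libraries. This is a hedged sketch, not the actual training script (training was done in text-generation-webui, see below): the base model name is a placeholder, and the module names assume a LLaMA-style architecture.

```python
# Rough sketch of the configuration above using peft + transformers.
# NOT the actual training script; base model name and module names are assumptions.
import torch
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import IA3Config, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "base-model-placeholder",    # assumption: the base model this adapter targets
    torch_dtype=torch.bfloat16,  # loaded unquantized (BF16), as noted below
)

# IA3 scaling vectors on the listed projections
# (Q/K/V/O attention and up/gate/down MLP, LLaMA-style names).
ia3_config = IA3Config(
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "up_proj", "gate_proj", "down_proj"],
    feedforward_modules=["up_proj", "gate_proj", "down_proj"],
)
model = get_peft_model(model, ia3_config)

# Hyperparameters from the list above; Cutoff Length 512 would cap
# tokenized examples at 512 tokens during dataset preparation.
args = TrainingArguments(
    output_dir="ia3-adastra",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=1e-2,
    optim="adafactor",
    lr_scheduler_type="linear",
    warmup_steps=64,
)
```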
|
|
|
---
|
More of a proof of concept. Temper your expectations. |
|
|
|
Warning: May generate 18+ content. |
|
Trained on the dialogue from Adastra (the furry visual novel) in text-generation-webui (by oobabooga), using a modified Training_PRO extension.
|
The model was loaded unquantized (BF16).

Currently, loading IA3 adapters works in unmodified textgen-webui (I think; load one as you would a LoRA) only if the model is loaded unquantized, not in 4-bit or 8-bit.
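Outside of the webui, a minimal loading sketch with peft could look like this; the base model name and adapter path are placeholders:

```python
# Minimal sketch: load the base model unquantized, then attach the IA3 adapter.
# Base model name and adapter path are placeholders.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "base-model-placeholder",
    torch_dtype=torch.bfloat16,  # unquantized; 4-bit/8-bit loading did not work here
)
model = PeftModel.from_pretrained(base, "path/to/this-ia3-adapter")
```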
|
|
|
Other Training parameters: |
|
|
|
Add overlapping blocks: On (see the toy sketch after this list)
|
|
|
DEMENTOR (long-form learning by FP): On. This might be the secret sauce that makes this IA3 so effective with just 1 epoch of training.
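For intuition on the overlapping-blocks option: consecutive training chunks share tokens, so context is not cut off at block boundaries. A toy illustration follows; this is not Training_PRO's actual code, and the 50% overlap is an assumption.

```python
# Toy illustration of overlapping block slicing; not Training_PRO's actual code.
# Consecutive blocks share `overlap` tokens so context spans block boundaries.
def overlapping_blocks(token_ids, block_size=512, overlap=256):
    step = block_size - overlap
    return [token_ids[i:i + block_size]
            for i in range(0, max(1, len(token_ids) - overlap), step)]

blocks = overlapping_blocks(list(range(2000)))
# With block_size=512 and overlap=256, each block starts 256 tokens
# after the previous one, so neighbouring blocks share half their tokens.
```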
|
|
|
Extra: |
|
|
|
Training took 9 minutes on an RTX 3090.
|
|
|
If you are the creator of Adastra and would like this taken down, please contact me. |
|
|
|
I do not claim to have produced the training data that went into this finetune. |