omg it almost works!

I stripped out the 5 least-used layers, then ran SFT for 4 epochs with a high learning rate... and it's almost good!

My goal is to make a new Velvet Eclipse with these "less used" parameters stripped out, significantly reducing the size to allow for higher inference speed and more room for context.
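
For reference, this is roughly what dropping decoder layers from a Mistral-style model looks like with plain `transformers`. It's only a sketch of the idea, not the exact script used for the EVISCERATED base: the source model path and the layer indices are placeholders, and how the 5 least-used layers were identified isn't shown here.

```python
# Rough sketch: dropping "least used" decoder layers from a Mistral-style model.
# The model path and layer indices are placeholders, not the ones actually removed.
import torch
from torch import nn
from transformers import AutoModelForCausalLM, AutoTokenizer

source = "path/to/original-12B-model"      # placeholder for the unpruned model
model = AutoModelForCausalLM.from_pretrained(source, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(source)

layers_to_drop = {20, 21, 22, 23, 24}      # placeholder indices

# Keep every decoder layer except the ones marked for removal.
model.model.layers = nn.ModuleList(
    [layer for i, layer in enumerate(model.model.layers) if i not in layers_to_drop]
)
model.config.num_hidden_layers = len(model.model.layers)

# Re-index the remaining layers so KV-cache bookkeeping stays consistent
# (newer transformers versions store a per-layer index on the attention module).
for i, layer in enumerate(model.model.layers):
    if hasattr(layer.self_attn, "layer_idx"):
        layer.self_attn.layer_idx = i

model.save_pretrained("The-Omega-Directive-12B-EVISCERATED")
tokenizer.save_pretrained("The-Omega-Directive-12B-EVISCERATED")
```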

NOTES

        per_device_train_batch_size = 10,
        gradient_accumulation_steps = 4,
        num_train_epochs = 4, # Set this for 1 full training run.
        learning_rate = 5e-4, # Reduce to 2e-5 for long training runs
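
These settings come from a standard Unsloth + TRL SFT run. The sketch below shows roughly where they plug in, following the layout of the usual Unsloth notebook; the dataset, LoRA rank, sequence length, and 4-bit loading are assumptions, not details from the actual run.

```python
# Minimal sketch of the SFT setup; only the four hyperparameters from the
# notes above are taken from the real run, everything else is an assumption.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

max_seq_length = 8192  # assumption

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="SuperbEmphasis/The-Omega-Directive-12B-EVISCERATED",
    max_seq_length=max_seq_length,
    dtype=None,          # auto-detects bf16 on supported GPUs
    load_in_4bit=True,   # assumption: QLoRA-style finetune
)

model = FastLanguageModel.get_peft_model(
    model,
    r=64,                # assumption: LoRA rank
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("json", data_files="train.jsonl", split="train")  # placeholder dataset

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=10,
        gradient_accumulation_steps=4,
        num_train_epochs=4,        # set this for 1 full training run
        learning_rate=5e-4,        # reduce to 2e-5 for long training runs
        bf16=True,
        logging_steps=10,
        optim="adamw_8bit",
        lr_scheduler_type="linear",
        output_dir="outputs",
    ),
)
trainer.train()
```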

Uploaded finetuned model

  • Developed by: SuperbEmphasis
  • License: apache-2.0
  • Finetuned from model: SuperbEmphasis/The-Omega-Directive-12B-EVISCERATED

This Mistral model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Model size: 10.6B params (BF16 safetensors)