A LoRA trained in 4-bit on pygmalion-6b as a proof of concept.
Uses the GPTeacher roleplay dataset.
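For reference, here is a minimal sketch of how a 4-bit LoRA run like this could be set up today with the transformers/peft/bitsandbytes stack. This is an assumption about the general approach, not the exact toolchain that produced the logs below, and the rank, alpha, dropout, and target modules are illustrative guesses rather than the values used for this run.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "PygmalionAI/pygmalion-6b"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the base model quantized to 4-bit.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    ),
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters; hyperparameters here are assumptions.
lora_config = LoraConfig(
    r=8,                                 # assumed rank
    lora_alpha=16,                       # assumed scaling
    lora_dropout=0.05,                   # assumed dropout
    target_modules=["q_proj", "v_proj"], # GPT-J attention projections
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Training log from the run: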
INFO:Getting model ready...
INFO:Prepping for training...
INFO:Creating LoRA model...
INFO:Starting training...
{'loss': 12.5737, 'learning_rate': 0.0002926829268292683, 'epoch': 0.33}
{'loss': 8.5515, 'learning_rate': 0.0002560975609756097, 'epoch': 0.67}
{'loss': 7.5768, 'learning_rate': 0.0002195121951219512, 'epoch': 1.0}
{'loss': 6.9769, 'learning_rate': 0.00018292682926829266, 'epoch': 1.33}
{'loss': 6.6842, 'learning_rate': 0.00014634146341463414, 'epoch': 1.66}
{'loss': 6.3925, 'learning_rate': 0.0001097560975609756, 'epoch': 2.0}
{'loss': 6.041, 'learning_rate': 7.317073170731707e-05, 'epoch': 2.33}
{'loss': 5.6818, 'learning_rate': 3.6585365853658535e-05, 'epoch': 2.66}
{'loss': 5.4639, 'learning_rate': 0.0, 'epoch': 2.99}
{'train_runtime': 960.7748, 'train_samples_per_second': 6.005, 'train_steps_per_second': 0.047, 'train_loss': 7.326934729682074, 'epoch': 2.99}
INFO:LoRA training run is completed and saved.
INFO:Training complete!
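Reading the log, the learning rate appears to decay linearly to zero over the three epochs from a peak of roughly 3e-4 (the first logged value, ~2.93e-4, sits one step down such a schedule). Below is a hedged Trainer sketch consistent with those numbers, continuing from the setup above. Only the peak LR, the linear decay, and the epoch count are inferred from the log; the batch size, gradient accumulation, logging cadence, and output path are guesses, and `train_dataset` (tokenized GPTeacher roleplay examples) is assumed to exist.

```python
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

args = TrainingArguments(
    output_dir="pygmalion-6b-gpteacher-lora",  # hypothetical path
    num_train_epochs=3,
    learning_rate=3e-4,            # inferred: first logged LR fits a linear decay from 3e-4
    lr_scheduler_type="linear",    # inferred: LR steps down evenly and ends at 0.0
    logging_steps=5,               # assumption: nine log entries over the run
    per_device_train_batch_size=4,     # assumption
    gradient_accumulation_steps=32,    # assumption: log shows ~0.047 optimizer steps/s
    fp16=True,
)

trainer = Trainer(
    model=model,                   # the PEFT model from the sketch above
    args=args,
    train_dataset=train_dataset,   # tokenized GPTeacher roleplay examples (not shown)
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained(args.output_dir)  # writes only the LoRA adapter weights
```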
I used the electricity, so I might as well post it.