qlora-mistral-hackatone-yandexq

This model is a fine-tuned version of TheBloke/Mistral-7B-Instruct-v0.2-GPTQ on an unspecified dataset. It achieves the following result on the evaluation set:

Loss: 1.8327
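
A minimal sketch of how the adapter might be loaded for inference. It assumes the repository `ggeorge/qlora-mistral-hackatone-yandexq` holds a PEFT adapter (the card lists PEFT among its framework versions), and that `transformers`, `peft`, `accelerate`, and a GPTQ backend such as `auto-gptq` with `optimum` are installed. The function name `load_model` and the `device_map="auto"` setting are illustrative choices, not taken from the card.

```python
BASE_ID = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"
ADAPTER_ID = "ggeorge/qlora-mistral-hackatone-yandexq"  # assumed to be a PEFT adapter repo


def load_model(base_id: str = BASE_ID, adapter_id: str = ADAPTER_ID):
    """Load the quantized base model and attach the fine-tuned adapter (sketch)."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # device_map="auto" is an assumption; it relies on the accelerate package.
    base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `tokenizer, model = load_model()` downloads the weights, so it needs network access and, for GPTQ checkpoints, typically a CUDA-capable GPU.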

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 2
num_epochs: 60
mixed_precision_training: Native AMP
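
The relationship between some of these values can be sketched in a few lines. The snippet below assumes a single training device (the card does not state the device count) and 60 optimizer steps in total, which matches the one-step-per-epoch pattern in the results table. The helper names are hypothetical, and the schedule follows the usual linear warmup and decay rule; the learning rates actually logged by the trainer may be offset by one step.

```python
def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Examples contributing to one optimizer step."""
    return per_device * accumulation_steps * num_devices


def linear_lr(step: int, base_lr: float = 5e-5, warmup_steps: int = 2, total_steps: int = 60) -> float:
    """Learning rate at a given optimizer step under linear warmup, then linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print(effective_batch_size(8, 4))  # train_batch_size * gradient_accumulation_steps
    for step in (0, 1, 2, 31, 60):
        print(step, linear_lr(step))
```

With only 2 warmup steps out of 60, the peak rate of 5e-05 is reached almost immediately and the rest of the run is spent in the decay phase.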

Training results

Training Loss	Epoch	Step	Validation Loss
2.0167	1.0	1	1.9699
2.0949	2.0	2	1.9681
2.0703	3.0	3	1.9624
2.0674	4.0	4	1.9563
2.0057	5.0	5	1.9500
2.0534	6.0	6	1.9431
1.9912	7.0	7	1.9359
2.0333	8.0	8	1.9285
1.9934	9.0	9	1.9210
2.0358	10.0	10	1.9136
1.9727	11.0	11	1.9064
1.9698	12.0	12	1.8994
1.9983	13.0	13	1.8928
1.981	14.0	14	1.8865
1.9554	15.0	15	1.8807
1.935	16.0	16	1.8755
1.9203	17.0	17	1.8705
1.9371	18.0	18	1.8663
1.9184	19.0	19	1.8625
1.938	20.0	20	1.8592
1.94	21.0	21	1.8565
1.9062	22.0	22	1.8542
1.9293	23.0	23	1.8520
1.9464	24.0	24	1.8503
1.9271	25.0	25	1.8488
1.8998	26.0	26	1.8473
1.9393	27.0	27	1.8461
1.9188	28.0	28	1.8449
1.9117	29.0	29	1.8438
1.8974	30.0	30	1.8428
1.9181	31.0	31	1.8418
1.9047	32.0	32	1.8409
1.8977	33.0	33	1.8400
1.8937	34.0	34	1.8392
1.8801	35.0	35	1.8385
1.9149	36.0	36	1.8377
1.9027	37.0	37	1.8372
1.9076	38.0	38	1.8366
1.8718	39.0	39	1.8362
1.9125	40.0	40	1.8357
1.8903	41.0	41	1.8353
1.8668	42.0	42	1.8350
1.8653	43.0	43	1.8347
1.9068	44.0	44	1.8345
1.869	45.0	45	1.8342
1.8844	46.0	46	1.8340
1.9001	47.0	47	1.8338
1.886	48.0	48	1.8336
1.8847	49.0	49	1.8335
1.8566	50.0	50	1.8333
1.8729	51.0	51	1.8332
1.8736	52.0	52	1.8330
1.9098	53.0	53	1.8330
1.897	54.0	54	1.8329
1.8966	55.0	55	1.8328
1.8942	56.0	56	1.8328
1.871	57.0	57	1.8328
1.8434	58.0	58	1.8327
1.8743	59.0	59	1.8327
1.8472	60.0	60	1.8327
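
To put the table in perspective, the sketch below compares the validation-loss drop over the whole run with the drop over the last 20 epochs, and converts the final loss to perplexity. The four checkpoints are copied from the table; the helper names are hypothetical, and the perplexity conversion assumes the reported loss is a mean per-token cross-entropy in nats, which the card does not state explicitly.

```python
import math

# (epoch, validation loss) pairs copied from the results table
CHECKPOINTS = [(1, 1.9699), (20, 1.8592), (40, 1.8357), (60, 1.8327)]


def loss_drop(points, start_epoch: int, end_epoch: int) -> float:
    """Decrease in validation loss between two listed epochs."""
    by_epoch = dict(points)
    return by_epoch[start_epoch] - by_epoch[end_epoch]


def perplexity(loss: float) -> float:
    """exp(loss); only meaningful if loss is mean per-token cross-entropy in nats."""
    return math.exp(loss)


if __name__ == "__main__":
    print(f"drop, epochs 1-60:  {loss_drop(CHECKPOINTS, 1, 60):.4f}")
    print(f"drop, epochs 40-60: {loss_drop(CHECKPOINTS, 40, 60):.4f}")
    print(f"final perplexity:   {perplexity(1.8327):.2f}")
```

Most of the improvement happens in the first 20 epochs; the last 20 epochs change the validation loss only in the third decimal place.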

Framework versions

PEFT 0.10.0
Transformers 4.38.2
Pytorch 2.1.0+cu121
Datasets 2.18.0
Tokenizers 0.15.2
