# rapper-gpt
This model is a fine-tuned version of TheBloke/Mistral-7B-Instruct-v0.2-GPTQ on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1394
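
## How to use

The framework versions below list PEFT, so the fine-tuned weights are most likely a LoRA-style adapter layered on the GPTQ base. A minimal loading sketch, assuming the adapter is hosted at Maab9X9/rapper-gpt and that auto-gptq/optimum are installed for the quantized base model:

```python
# Minimal sketch, not the author's published script. Assumes the repo hosts a
# PEFT adapter and that the GPTQ base loads via auto-gptq/optimum.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, "Maab9X9/rapper-gpt")  # attach adapter
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Mistral-Instruct expects the [INST] ... [/INST] chat format.
prompt = "[INST] Write a four-bar rap verse about gradient descent. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```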
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- num_epochs: 25
- mixed_precision_training: Native AMP
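
These values map directly onto `transformers.TrainingArguments`. The sketch below is a hypothetical reconstruction (the training script itself is not part of this card), with the output path as an assumption; note that the total train batch size of 16 is simply `train_batch_size * gradient_accumulation_steps = 4 * 4`.

```python
# Hypothetical reconstruction of the configuration above; the actual training
# script is not published, and output_dir is an assumed placeholder.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="rapper-gpt",          # assumption: not stated on the card
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,    # effective batch size: 4 * 4 = 16
    lr_scheduler_type="linear",
    warmup_steps=2,
    num_train_epochs=25,
    seed=42,
    fp16=True,                        # "Native AMP" mixed precision
)
```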
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.0259 | 0.8889 | 4 | 0.9759 |
| 0.7299 | 2.0 | 9 | 0.9117 |
| 0.8055 | 2.8889 | 13 | 0.8819 |
| 0.5669 | 4.0 | 18 | 0.8468 |
| 0.622 | 4.8889 | 22 | 0.8584 |
| 0.4568 | 6.0 | 27 | 0.9003 |
| 0.5379 | 6.8889 | 31 | 0.9569 |
| 0.4119 | 8.0 | 36 | 0.9821 |
| 0.4993 | 8.8889 | 40 | 1.0176 |
| 0.3941 | 10.0 | 45 | 1.0345 |
| 0.4832 | 10.8889 | 49 | 1.0687 |
| 0.3836 | 12.0 | 54 | 1.0911 |
| 0.4758 | 12.8889 | 58 | 1.0688 |
| 0.3788 | 14.0 | 63 | 1.0902 |
| 0.4711 | 14.8889 | 67 | 1.0868 |
| 0.3749 | 16.0 | 72 | 1.0949 |
| 0.4663 | 16.8889 | 76 | 1.1072 |
| 0.3724 | 18.0 | 81 | 1.1164 |
| 0.464 | 18.8889 | 85 | 1.1282 |
| 0.3702 | 20.0 | 90 | 1.1350 |
| 0.4619 | 20.8889 | 94 | 1.1387 |
| 0.3684 | 22.0 | 99 | 1.1391 |
| 0.4108 | 22.2222 | 100 | 1.1394 |
### Framework versions
- PEFT 0.10.0
- Transformers 4.40.2
- Pytorch 2.1.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
### Model tree for Maab9X9/rapper-gpt

- Base model: mistralai/Mistral-7B-Instruct-v0.2
- Quantized as: TheBloke/Mistral-7B-Instruct-v0.2-GPTQ (the fine-tuning base for this adapter)