Phi-3-mini-128k-instruct-advisegpt-v0.2
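This model is a fine-tuned version of microsoft/Phi-3-mini-128k-instruct on the generator dataset. It achieves the following results on the evaluation set; they match the step-71 evaluation (epoch 0.9930) in the training results table, which has the lowest validation loss of the run: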
- Loss: 1.8937
- BLEU: {'bleu': 0.26205068002927057, 'precisions': [0.6385562102386747, 0.3220126728603845, 0.19412484437384622, 0.13232381936372636], 'brevity_penalty': 0.9720474824019883, 'length_ratio': 0.9724309736350426, 'translation_length': 187368, 'reference_length': 192680}
- ROUGE: {'rouge1': 0.6264248496834525, 'rouge2': 0.3031545327309577, 'rougeL': 0.5022734325866114, 'rougeLsum': 0.5017276717558696}
- Exact Match: {'exact_match': 0.0}
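The fields of the BLEU dictionary are related to each other, and a short check can make that relationship concrete. The sketch below assumes the standard corpus-level BLEU definition (brevity penalty times the geometric mean of the 1- to 4-gram precisions). The card does not say which implementation produced the numbers, so treat this as a consistency check and not as the author's evaluation code.

```python
import math

# Values copied from the BLEU dictionary reported above.
precisions = [0.6385562102386747, 0.3220126728603845,
              0.19412484437384622, 0.13232381936372636]
translation_length = 187368
reference_length = 192680

# Brevity penalty (assumed standard definition): exp(1 - ref/hyp) when the
# generated text is shorter than the reference, otherwise 1.
if translation_length < reference_length:
    brevity_penalty = math.exp(1 - reference_length / translation_length)
else:
    brevity_penalty = 1.0

# Geometric mean of the n-gram precisions, computed in log space.
geo_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))

bleu = brevity_penalty * geo_mean
length_ratio = translation_length / reference_length

print(f"brevity_penalty={brevity_penalty:.6f}")
print(f"length_ratio={length_ratio:.6f}")
print(f"bleu={bleu:.6f}")
```

Comparing the printed values with the reported `brevity_penalty`, `length_ratio`, and `bleu` fields is a quick way to confirm that the dictionary is internally consistent.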
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 5
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 12
- total_train_batch_size: 60
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 10
- mixed_precision_training: Native AMP
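The `total_train_batch_size` entry follows from the per-device batch size and the gradient accumulation steps. A minimal sketch of that arithmetic is shown here; `num_devices = 1` is an assumption, since the card does not state how many devices were used.

```python
# Hyperparameters copied from the list above.
train_batch_size = 5             # per-device batch size
gradient_accumulation_steps = 12

# Assumption: a single device. The card does not state the device count.
num_devices = 1

# Number of examples that contribute to each optimizer step.
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)

print(total_train_batch_size)
```

Under the single-device assumption the product equals the reported value of 60.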
Training results
Training Loss | Epoch | Step | Validation Loss | BLEU | ROUGE | Exact Match |
---|---|---|---|---|---|---|
1.0389 | 0.9930 | 71 | 1.8937 | {'bleu': 0.26205068002927057, 'precisions': [0.6385562102386747, 0.3220126728603845, 0.19412484437384622, 0.13232381936372636], 'brevity_penalty': 0.9720474824019883, 'length_ratio': 0.9724309736350426, 'translation_length': 187368, 'reference_length': 192680} | {'rouge1': 0.6264248496834525, 'rouge2': 0.3031545327309577, 'rougeL': 0.5022734325866114, 'rougeLsum': 0.5017276717558696} | {'exact_match': 0.0} |
0.7026 | 2.0 | 143 | 2.0257 | {'bleu': 0.22948697087184314, 'precisions': [0.6175561684920868, 0.2864991434080009, 0.16448293132138875, 0.10829521706024982], 'brevity_penalty': 0.9685578318352563, 'length_ratio': 0.9690419348141998, 'translation_length': 186715, 'reference_length': 192680} | {'rouge1': 0.6021744263812635, 'rouge2': 0.2645080008922339, 'rougeL': 0.47549724399365867, 'rougeLsum': 0.47563577913274346} | {'exact_match': 0.0} |
0.5794 | 2.9930 | 214 | 2.0827 | {'bleu': 0.22453345451779733, 'precisions': [0.6142047063979434, 0.2794608644390257, 0.15886996662779512, 0.10500024249478636], 'brevity_penalty': 0.9706541083647924, 'length_ratio': 0.9710763960971559, 'translation_length': 187107, 'reference_length': 192680} | {'rouge1': 0.5986129640494808, 'rouge2': 0.2565288834240412, 'rougeL': 0.47029440892215696, 'rougeLsum': 0.4703605206181696} | {'exact_match': 0.0} |
0.5107 | 4.0 | 286 | 2.0999 | {'bleu': 0.22808449006897172, 'precisions': [0.6164639351259069, 0.28426452965847815, 0.16231439361428204, 0.10640438075565883], 'brevity_penalty': 0.9724315296841323, 'length_ratio': 0.9728046501972182, 'translation_length': 187440, 'reference_length': 192680} | {'rouge1': 0.6010609898102299, 'rouge2': 0.2621809898542294, 'rougeL': 0.4728255342917802, 'rougeLsum': 0.4728531320642606} | {'exact_match': 0.0} |
0.4923 | 4.9930 | 357 | 2.0932 | {'bleu': 0.23027336632996132, 'precisions': [0.6166044676937471, 0.2878130430610787, 0.1642595225622989, 0.10630862410891355], 'brevity_penalty': 0.975977176959311, 'length_ratio': 0.9762611583973427, 'translation_length': 188106, 'reference_length': 192680} | {'rouge1': 0.6020695302602435, 'rouge2': 0.2657671472450324, 'rougeL': 0.47423678533654967, 'rougeLsum': 0.47426066890913565} | {'exact_match': 0.0} |
0.4431 | 6.0 | 429 | 2.0962 | {'bleu': 0.22873099259924137, 'precisions': [0.6169168021752459, 0.28490855532923826, 0.16326705657201365, 0.10637588763042322], 'brevity_penalty': 0.9730979379483501, 'length_ratio': 0.9734533942287731, 'translation_length': 187565, 'reference_length': 192680} | {'rouge1': 0.6015904749444395, 'rouge2': 0.26263389133741416, 'rougeL': 0.4729371282759689, 'rougeLsum': 0.4730073305944661} | {'exact_match': 0.0} |
0.4291 | 6.9930 | 500 | 2.0895 | {'bleu': 0.23078161525345967, 'precisions': [0.6175051285594328, 0.2861604050093259, 0.16454167512744605, 0.10739661140462743], 'brevity_penalty': 0.9762747516268988, 'length_ratio': 0.9765517957234794, 'translation_length': 188162, 'reference_length': 192680} | {'rouge1': 0.6034137320239901, 'rouge2': 0.26422178262738116, 'rougeL': 0.47430934107431466, 'rougeLsum': 0.47430902463237395} | {'exact_match': 0.0} |
0.4297 | 8.0 | 572 | 2.0865 | {'bleu': 0.22849194288081487, 'precisions': [0.6172627948932184, 0.28407374796552737, 0.1623422141125731, 0.10599288515917175], 'brevity_penalty': 0.9749190245343078, 'length_ratio': 0.9752283578991073, 'translation_length': 187907, 'reference_length': 192680} | {'rouge1': 0.6027503352616924, 'rouge2': 0.2615077454867606, 'rougeL': 0.47349895225288113, 'rougeLsum': 0.47352034156560674} | {'exact_match': 0.0} |
0.4361 | 8.9930 | 643 | 2.0832 | {'bleu': 0.2305080658084417, 'precisions': [0.6175195604418985, 0.2856609509586922, 0.16423418171705448, 0.10763603992041658], 'brevity_penalty': 0.9754508959408048, 'length_ratio': 0.9757473531243512, 'translation_length': 188007, 'reference_length': 192680} | {'rouge1': 0.6029422201953518, 'rouge2': 0.26346694480161104, 'rougeL': 0.4742809273284626, 'rougeLsum': 0.4743122502561476} | {'exact_match': 0.0} |
0.4423 | 9.9301 | 710 | 2.0840 | {'bleu': 0.230038020190203, 'precisions': [0.6176251608717387, 0.2855817326664036, 0.16376314072743217, 0.10700689536841428], 'brevity_penalty': 0.9756157200699793, 'length_ratio': 0.9759082416441769, 'translation_length': 188038, 'reference_length': 192680} | {'rouge1': 0.603139585918947, 'rouge2': 0.26328950362942705, 'rougeL': 0.4742788009942601, 'rougeLsum': 0.47433418479279266} | {'exact_match': 0.0} |
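Two things in the table above can be checked with a few lines of code: which evaluation has the lowest validation loss, and how the fractional Epoch values relate to the Step column. The sketch below uses the step, epoch, and validation loss values from the table. The constant `MICRO_BATCHES_PER_EPOCH = 858` is an assumption inferred from those values (it is not stated in the card), chosen because it reproduces every reported epoch with 12 gradient accumulation steps.

```python
# (step, reported_epoch, validation_loss) copied from the table above.
rows = [
    (71, 0.9930, 1.8937),
    (143, 2.0, 2.0257),
    (214, 2.9930, 2.0827),
    (286, 4.0, 2.0999),
    (357, 4.9930, 2.0932),
    (429, 6.0, 2.0962),
    (500, 6.9930, 2.0895),
    (572, 8.0, 2.0865),
    (643, 8.9930, 2.0832),
    (710, 9.9301, 2.0840),
]

GRAD_ACCUM = 12                # from the hyperparameter list
MICRO_BATCHES_PER_EPOCH = 858  # assumption: inferred, not stated in the card


def epoch_at(step: int) -> float:
    """Fractional epoch reached after `step` optimizer steps."""
    return step * GRAD_ACCUM / MICRO_BATCHES_PER_EPOCH


# Evaluation with the lowest validation loss.
best_step, best_epoch, best_loss = min(rows, key=lambda r: r[2])
print(f"lowest validation loss {best_loss} at step {best_step}")

# Compare the derived epoch with the reported one for every row.
for step, reported_epoch, _ in rows:
    print(step, reported_epoch, round(epoch_at(step), 4))
```

The lowest validation loss falls on the first evaluation (step 71), while the training loss is lower at every later evaluation than at step 71. That is the evaluation whose metrics are reported at the top of the card.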
Framework versions
- PEFT 0.10.0
- Transformers 4.40.1
- PyTorch 2.3.0+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1
Model tree for ninyx/Phi-3-mini-128k-instruct-advisegpt-v0.2
Base model
microsoft/Phi-3-mini-128k-instruct