shawgpt-ft
This model is a fine-tuned version of TheBloke/Mistral-7B-Instruct-v0.2-GPTQ on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 4.2320
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1
- training_steps: 1000
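The relationship between these values can be sketched in a few lines of plain Python. This is an illustration only, not the training code: it assumes a single device (the card does not state the device count) and assumes that `linear` means linear warmup followed by linear decay to zero. The names `NUM_DEVICES`, `total_train_batch_size`, and `linear_lr` are hypothetical.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-4
TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 4
WARMUP_STEPS = 1
TRAINING_STEPS = 1000

# Assumption: one device, so there is no extra data-parallel factor.
NUM_DEVICES = 1


def total_train_batch_size() -> int:
    """Number of examples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TRAINING_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size())
    for step in (0, 1, 500, 1000):
        print(step, linear_lr(step))
```

Under these assumptions, 4 examples per batch times 4 accumulation steps gives the listed total_train_batch_size of 16, and with a single warmup step the learning rate reaches its peak at step 1 and decays from there.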
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
4.6433 | 0.92 | 3 | 4.2320 |
4.6544 | 1.85 | 6 | 4.2320 |
4.6459 | 2.77 | 9 | 4.2320 |
3.4822 | 4.0 | 13 | 4.2320 |
4.6298 | 4.92 | 16 | 4.2320 |
4.6605 | 5.85 | 19 | 4.2320 |
4.6392 | 6.77 | 22 | 4.2320 |
3.4844 | 8.0 | 26 | 4.2320 |
4.6305 | 8.92 | 29 | 4.2320 |
4.6337 | 9.85 | 32 | 4.2320 |
4.6501 | 10.77 | 35 | 4.2320 |
3.4793 | 12.0 | 39 | 4.2320 |
4.6568 | 12.92 | 42 | 4.2320 |
4.6402 | 13.85 | 45 | 4.2320 |
4.6381 | 14.77 | 48 | 4.2320 |
3.4787 | 16.0 | 52 | 4.2320 |
4.671 | 16.92 | 55 | 4.2320 |
4.6186 | 17.85 | 58 | 4.2320 |
4.6403 | 18.77 | 61 | 4.2320 |
3.5009 | 20.0 | 65 | 4.2320 |
4.6514 | 20.92 | 68 | 4.2320 |
4.6426 | 21.85 | 71 | 4.2320 |
4.6674 | 22.77 | 74 | 4.2320 |
3.4915 | 24.0 | 78 | 4.2320 |
4.6606 | 24.92 | 81 | 4.2320 |
4.6364 | 25.85 | 84 | 4.2320 |
4.6222 | 26.77 | 87 | 4.2320 |
3.4782 | 28.0 | 91 | 4.2320 |
4.6229 | 28.92 | 94 | 4.2320 |
4.6576 | 29.85 | 97 | 4.2320 |
4.6288 | 30.77 | 100 | 4.2320 |
3.4664 | 32.0 | 104 | 4.2320 |
4.6434 | 32.92 | 107 | 4.2320 |
4.6519 | 33.85 | 110 | 4.2320 |
4.6528 | 34.77 | 113 | 4.2320 |
3.471 | 36.0 | 117 | 4.2320 |
4.6453 | 36.92 | 120 | 4.2320 |
4.616 | 37.85 | 123 | 4.2320 |
4.6109 | 38.77 | 126 | 4.2320 |
3.4799 | 40.0 | 130 | 4.2320 |
4.6388 | 40.92 | 133 | 4.2320 |
4.6711 | 41.85 | 136 | 4.2320 |
4.6483 | 42.77 | 139 | 4.2320 |
3.4695 | 44.0 | 143 | 4.2320 |
4.6496 | 44.92 | 146 | 4.2320 |
4.644 | 45.85 | 149 | 4.2320 |
4.6444 | 46.77 | 152 | 4.2320 |
3.4741 | 48.0 | 156 | 4.2320 |
4.6189 | 48.92 | 159 | 4.2320 |
4.6683 | 49.85 | 162 | 4.2320 |
4.6345 | 50.77 | 165 | 4.2320 |
3.4703 | 52.0 | 169 | 4.2320 |
4.6144 | 52.92 | 172 | 4.2320 |
4.6648 | 53.85 | 175 | 4.2320 |
4.6522 | 54.77 | 178 | 4.2320 |
3.4838 | 56.0 | 182 | 4.2320 |
4.6506 | 56.92 | 185 | 4.2320 |
4.6339 | 57.85 | 188 | 4.2320 |
4.638 | 58.77 | 191 | 4.2320 |
3.4733 | 60.0 | 195 | 4.2320 |
4.6604 | 60.92 | 198 | 4.2320 |
4.6326 | 61.85 | 201 | 4.2320 |
4.6612 | 62.77 | 204 | 4.2320 |
3.4722 | 64.0 | 208 | 4.2320 |
4.6292 | 64.92 | 211 | 4.2320 |
4.6336 | 65.85 | 214 | 4.2320 |
4.642 | 66.77 | 217 | 4.2320 |
3.4915 | 68.0 | 221 | 4.2320 |
4.6453 | 68.92 | 224 | 4.2320 |
4.6459 | 69.85 | 227 | 4.2320 |
4.6202 | 70.77 | 230 | 4.2320 |
3.4753 | 72.0 | 234 | 4.2320 |
4.6552 | 72.92 | 237 | 4.2320 |
4.6443 | 73.85 | 240 | 4.2320 |
4.6495 | 74.77 | 243 | 4.2320 |
3.4798 | 76.0 | 247 | 4.2320 |
4.6358 | 76.92 | 250 | 4.2320 |
4.6434 | 77.85 | 253 | 4.2320 |
4.6325 | 78.77 | 256 | 4.2320 |
3.4951 | 80.0 | 260 | 4.2320 |
4.6302 | 80.92 | 263 | 4.2320 |
4.6458 | 81.85 | 266 | 4.2320 |
4.6407 | 82.77 | 269 | 4.2320 |
3.4828 | 84.0 | 273 | 4.2320 |
4.6436 | 84.92 | 276 | 4.2320 |
4.6143 | 85.85 | 279 | 4.2320 |
4.644 | 86.77 | 282 | 4.2320 |
3.4934 | 88.0 | 286 | 4.2320 |
4.6308 | 88.92 | 289 | 4.2320 |
4.6715 | 89.85 | 292 | 4.2320 |
4.6229 | 90.77 | 295 | 4.2320 |
3.4895 | 92.0 | 299 | 4.2320 |
4.6447 | 92.92 | 302 | 4.2320 |
4.6333 | 93.85 | 305 | 4.2320 |
4.643 | 94.77 | 308 | 4.2320 |
3.482 | 96.0 | 312 | 4.2320 |
4.6647 | 96.92 | 315 | 4.2320 |
4.65 | 97.85 | 318 | 4.2320 |
4.6545 | 98.77 | 321 | 4.2320 |
3.4881 | 100.0 | 325 | 4.2320 |
4.6828 | 100.92 | 328 | 4.2320 |
4.6328 | 101.85 | 331 | 4.2320 |
4.6419 | 102.77 | 334 | 4.2320 |
3.4954 | 104.0 | 338 | 4.2320 |
4.6203 | 104.92 | 341 | 4.2320 |
4.6236 | 105.85 | 344 | 4.2320 |
4.6539 | 106.77 | 347 | 4.2320 |
3.4737 | 108.0 | 351 | 4.2320 |
4.6319 | 108.92 | 354 | 4.2320 |
4.6696 | 109.85 | 357 | 4.2320 |
4.6678 | 110.77 | 360 | 4.2320 |
3.4698 | 112.0 | 364 | 4.2320 |
4.6459 | 112.92 | 367 | 4.2320 |
4.6524 | 113.85 | 370 | 4.2320 |
4.6399 | 114.77 | 373 | 4.2320 |
3.471 | 116.0 | 377 | 4.2320 |
4.6668 | 116.92 | 380 | 4.2320 |
4.634 | 117.85 | 383 | 4.2320 |
4.6345 | 118.77 | 386 | 4.2320 |
3.4938 | 120.0 | 390 | 4.2320 |
4.6386 | 120.92 | 393 | 4.2320 |
4.6661 | 121.85 | 396 | 4.2320 |
4.6465 | 122.77 | 399 | 4.2320 |
3.4903 | 124.0 | 403 | 4.2320 |
4.6255 | 124.92 | 406 | 4.2320 |
4.6306 | 125.85 | 409 | 4.2320 |
4.6348 | 126.77 | 412 | 4.2320 |
3.4811 | 128.0 | 416 | 4.2320 |
4.6335 | 128.92 | 419 | 4.2320 |
4.6678 | 129.85 | 422 | 4.2320 |
4.6336 | 130.77 | 425 | 4.2320 |
3.4722 | 132.0 | 429 | 4.2320 |
4.6371 | 132.92 | 432 | 4.2320 |
4.6488 | 133.85 | 435 | 4.2320 |
4.6456 | 134.77 | 438 | 4.2320 |
3.4866 | 136.0 | 442 | 4.2320 |
4.6349 | 136.92 | 445 | 4.2320 |
4.6418 | 137.85 | 448 | 4.2320 |
4.6546 | 138.77 | 451 | 4.2320 |
3.4811 | 140.0 | 455 | 4.2320 |
4.6322 | 140.92 | 458 | 4.2320 |
4.6154 | 141.85 | 461 | 4.2320 |
4.6362 | 142.77 | 464 | 4.2320 |
3.4809 | 144.0 | 468 | 4.2320 |
4.6317 | 144.92 | 471 | 4.2320 |
4.6329 | 145.85 | 474 | 4.2320 |
4.636 | 146.77 | 477 | 4.2320 |
3.4737 | 148.0 | 481 | 4.2320 |
4.629 | 148.92 | 484 | 4.2320 |
4.6212 | 149.85 | 487 | 4.2320 |
4.6548 | 150.77 | 490 | 4.2320 |
3.481 | 152.0 | 494 | 4.2320 |
4.6379 | 152.92 | 497 | 4.2320 |
4.6306 | 153.85 | 500 | 4.2320 |
4.6443 | 154.77 | 503 | 4.2320 |
3.4951 | 156.0 | 507 | 4.2320 |
4.6514 | 156.92 | 510 | 4.2320 |
4.6539 | 157.85 | 513 | 4.2320 |
4.6295 | 158.77 | 516 | 4.2320 |
3.485 | 160.0 | 520 | 4.2320 |
4.6665 | 160.92 | 523 | 4.2320 |
4.6508 | 161.85 | 526 | 4.2320 |
4.6754 | 162.77 | 529 | 4.2320 |
3.4689 | 164.0 | 533 | 4.2320 |
4.6286 | 164.92 | 536 | 4.2320 |
4.6164 | 165.85 | 539 | 4.2320 |
4.634 | 166.77 | 542 | 4.2320 |
3.4878 | 168.0 | 546 | 4.2320 |
4.6616 | 168.92 | 549 | 4.2320 |
4.6228 | 169.85 | 552 | 4.2320 |
4.6427 | 170.77 | 555 | 4.2320 |
3.4739 | 172.0 | 559 | 4.2320 |
4.656 | 172.92 | 562 | 4.2320 |
4.6488 | 173.85 | 565 | 4.2320 |
4.6199 | 174.77 | 568 | 4.2320 |
3.4842 | 176.0 | 572 | 4.2320 |
4.6632 | 176.92 | 575 | 4.2320 |
4.646 | 177.85 | 578 | 4.2320 |
4.6226 | 178.77 | 581 | 4.2320 |
3.4619 | 180.0 | 585 | 4.2320 |
4.6329 | 180.92 | 588 | 4.2320 |
4.6245 | 181.85 | 591 | 4.2320 |
4.6435 | 182.77 | 594 | 4.2320 |
3.478 | 184.0 | 598 | 4.2320 |
4.6256 | 184.92 | 601 | 4.2320 |
4.6516 | 185.85 | 604 | 4.2320 |
4.6438 | 186.77 | 607 | 4.2320 |
3.5015 | 188.0 | 611 | 4.2320 |
4.6254 | 188.92 | 614 | 4.2320 |
4.6265 | 189.85 | 617 | 4.2320 |
4.6447 | 190.77 | 620 | 4.2320 |
3.508 | 192.0 | 624 | 4.2320 |
4.6353 | 192.92 | 627 | 4.2320 |
4.6333 | 193.85 | 630 | 4.2320 |
4.6573 | 194.77 | 633 | 4.2320 |
3.4644 | 196.0 | 637 | 4.2320 |
4.6413 | 196.92 | 640 | 4.2320 |
4.6641 | 197.85 | 643 | 4.2320 |
4.638 | 198.77 | 646 | 4.2320 |
3.4885 | 200.0 | 650 | 4.2320 |
4.6502 | 200.92 | 653 | 4.2320 |
4.6476 | 201.85 | 656 | 4.2320 |
4.645 | 202.77 | 659 | 4.2320 |
3.4861 | 204.0 | 663 | 4.2320 |
4.6418 | 204.92 | 666 | 4.2320 |
4.6419 | 205.85 | 669 | 4.2320 |
4.6395 | 206.77 | 672 | 4.2320 |
3.4739 | 208.0 | 676 | 4.2320 |
4.6306 | 208.92 | 679 | 4.2320 |
4.6245 | 209.85 | 682 | 4.2320 |
4.6614 | 210.77 | 685 | 4.2320 |
3.4965 | 212.0 | 689 | 4.2320 |
4.642 | 212.92 | 692 | 4.2320 |
4.6371 | 213.85 | 695 | 4.2320 |
4.6265 | 214.77 | 698 | 4.2320 |
3.4965 | 216.0 | 702 | 4.2320 |
4.6648 | 216.92 | 705 | 4.2320 |
4.6248 | 217.85 | 708 | 4.2320 |
4.6507 | 218.77 | 711 | 4.2320 |
3.4741 | 220.0 | 715 | 4.2320 |
4.644 | 220.92 | 718 | 4.2320 |
4.6315 | 221.85 | 721 | 4.2320 |
4.659 | 222.77 | 724 | 4.2320 |
3.4942 | 224.0 | 728 | 4.2320 |
4.6463 | 224.92 | 731 | 4.2320 |
4.6477 | 225.85 | 734 | 4.2320 |
4.6323 | 226.77 | 737 | 4.2320 |
3.4907 | 228.0 | 741 | 4.2320 |
4.6323 | 228.92 | 744 | 4.2320 |
4.6442 | 229.85 | 747 | 4.2320 |
4.6351 | 230.77 | 750 | 4.2320 |
3.4799 | 232.0 | 754 | 4.2320 |
4.6463 | 232.92 | 757 | 4.2320 |
4.6389 | 233.85 | 760 | 4.2320 |
4.6399 | 234.77 | 763 | 4.2320 |
3.4819 | 236.0 | 767 | 4.2320 |
4.678 | 236.92 | 770 | 4.2320 |
4.6446 | 237.85 | 773 | 4.2320 |
4.642 | 238.77 | 776 | 4.2320 |
3.4879 | 240.0 | 780 | 4.2320 |
4.6561 | 240.92 | 783 | 4.2320 |
4.6226 | 241.85 | 786 | 4.2320 |
4.6607 | 242.77 | 789 | 4.2320 |
3.4901 | 244.0 | 793 | 4.2320 |
4.6317 | 244.92 | 796 | 4.2320 |
4.6387 | 245.85 | 799 | 4.2320 |
4.6493 | 246.77 | 802 | 4.2320 |
3.4863 | 248.0 | 806 | 4.2320 |
4.6187 | 248.92 | 809 | 4.2320 |
4.6449 | 249.85 | 812 | 4.2320 |
4.6542 | 250.77 | 815 | 4.2320 |
3.4905 | 252.0 | 819 | 4.2320 |
4.6514 | 252.92 | 822 | 4.2320 |
4.6496 | 253.85 | 825 | 4.2320 |
4.6542 | 254.77 | 828 | 4.2320 |
3.4661 | 256.0 | 832 | 4.2320 |
4.631 | 256.92 | 835 | 4.2320 |
4.644 | 257.85 | 838 | 4.2320 |
4.6348 | 258.77 | 841 | 4.2320 |
3.5069 | 260.0 | 845 | 4.2320 |
4.6257 | 260.92 | 848 | 4.2320 |
4.6584 | 261.85 | 851 | 4.2320 |
4.6344 | 262.77 | 854 | 4.2320 |
3.4721 | 264.0 | 858 | 4.2320 |
4.6429 | 264.92 | 861 | 4.2320 |
4.6433 | 265.85 | 864 | 4.2320 |
4.6391 | 266.77 | 867 | 4.2320 |
3.4916 | 268.0 | 871 | 4.2320 |
4.6564 | 268.92 | 874 | 4.2320 |
4.658 | 269.85 | 877 | 4.2320 |
4.6329 | 270.77 | 880 | 4.2320 |
3.4783 | 272.0 | 884 | 4.2320 |
4.6384 | 272.92 | 887 | 4.2320 |
4.6482 | 273.85 | 890 | 4.2320 |
4.6688 | 274.77 | 893 | 4.2320 |
3.4659 | 276.0 | 897 | 4.2320 |
4.6299 | 276.92 | 900 | 4.2320 |
4.6392 | 277.85 | 903 | 4.2320 |
4.6521 | 278.77 | 906 | 4.2320 |
3.4949 | 280.0 | 910 | 4.2320 |
4.6643 | 280.92 | 913 | 4.2320 |
4.6361 | 281.85 | 916 | 4.2320 |
4.6505 | 282.77 | 919 | 4.2320 |
3.4847 | 284.0 | 923 | 4.2320 |
4.639 | 284.92 | 926 | 4.2320 |
4.6276 | 285.85 | 929 | 4.2320 |
4.6438 | 286.77 | 932 | 4.2320 |
3.4883 | 288.0 | 936 | 4.2320 |
4.6483 | 288.92 | 939 | 4.2320 |
4.6564 | 289.85 | 942 | 4.2320 |
4.6437 | 290.77 | 945 | 4.2320 |
3.4712 | 292.0 | 949 | 4.2320 |
4.6627 | 292.92 | 952 | 4.2320 |
4.6371 | 293.85 | 955 | 4.2320 |
4.6196 | 294.77 | 958 | 4.2320 |
3.4859 | 296.0 | 962 | 4.2320 |
4.6457 | 296.92 | 965 | 4.2320 |
4.6249 | 297.85 | 968 | 4.2320 |
4.6382 | 298.77 | 971 | 4.2320 |
3.4824 | 300.0 | 975 | 4.2320 |
4.6541 | 300.92 | 978 | 4.2320 |
4.659 | 301.85 | 981 | 4.2320 |
4.618 | 302.77 | 984 | 4.2320 |
3.4751 | 304.0 | 988 | 4.2320 |
4.623 | 304.92 | 991 | 4.2320 |
4.6371 | 305.85 | 994 | 4.2320 |
4.6546 | 306.77 | 997 | 4.2320 |
3.1908 | 307.69 | 1000 | 4.2320 |
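The fractional Epoch values in the table follow from the step count and gradient accumulation. The sketch below reproduces them under one inference that the card does not state: since step 13 lands on epoch 4.0, 13 × 4 = 52 micro-batches span 4 epochs, so each epoch appears to contain 13 micro-batches. The names `MICRO_BATCHES_PER_EPOCH` and `epoch_at_step` are hypothetical.

```python
GRADIENT_ACCUMULATION_STEPS = 4  # from the hyperparameter list

# Inferred from the table, not stated in the card:
# step 13 corresponds to epoch 4.0, so one epoch is 13 micro-batches.
MICRO_BATCHES_PER_EPOCH = 13


def epoch_at_step(step: int) -> float:
    """Epoch reached after a given number of optimizer steps, rounded as in the table."""
    micro_batches_seen = step * GRADIENT_ACCUMULATION_STEPS
    return round(micro_batches_seen / MICRO_BATCHES_PER_EPOCH, 2)


if __name__ == "__main__":
    for step in (3, 6, 9, 13, 1000):
        print(step, epoch_at_step(step))
```

If that inference holds, and assuming one device with the last partial batch kept, 13 micro-batches of 4 examples would correspond to a training set of roughly 49 to 52 examples.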
Framework versions
- PEFT 0.10.0
- Transformers 4.36.2
- Pytorch 2.1.0+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
Model tree for jeroenherczeg/shawgpt-ft
- Base model: mistralai/Mistral-7B-Instruct-v0.2
- Quantized: TheBloke/Mistral-7B-Instruct-v0.2-GPTQ
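A minimal loading sketch follows. It assumes that the repository holds a PEFT adapter (suggested by the PEFT entry under Framework versions, but not confirmed by the card) that attaches to the quantized base model. It also assumes that `transformers`, `peft`, `accelerate`, and a GPTQ backend are installed. The function name `load_shawgpt` is hypothetical.

```python
BASE_MODEL_ID = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"
ADAPTER_ID = "jeroenherczeg/shawgpt-ft"


def load_shawgpt():
    """Load the quantized base model and attach the fine-tuned adapter.

    Imports are deferred so that defining this function does not require
    the libraries or a network connection to be available.
    """
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    # device_map="auto" relies on accelerate to place the weights.
    base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID, device_map="auto")
    # Assumption: the repository contains adapter weights compatible with this base.
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    model.eval()
    return tokenizer, model
```

Calling `tokenizer, model = load_shawgpt()` downloads both repositories on first use. Since the card leaves intended uses unspecified, treat any generation settings as your own choice rather than the author's.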