jd0g committed
Commit 8f3e2fd · verified · 1 Parent(s): c932e25

jd0g/Mistral-7B-NLI-v0.1

README.md CHANGED
@@ -16,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [TheBloke/Mistral-7B-v0.1-GPTQ](https://huggingface.co/TheBloke/Mistral-7B-v0.1-GPTQ) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: nan
+- Loss: 0.4930
 
 ## Model description
 
@@ -35,12 +35,12 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 0.004
-- train_batch_size: 32
-- eval_batch_size: 64
+- learning_rate: 0.0003
+- train_batch_size: 8
+- eval_batch_size: 16
 - seed: 42
 - gradient_accumulation_steps: 4
-- total_train_batch_size: 128
+- total_train_batch_size: 32
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 2
@@ -51,17 +51,17 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch   | Step | Validation Loss |
 |:-------------:|:-------:|:----:|:---------------:|
-| 0.6298        | 0.9950  | 149  | 0.4956          |
-| 0.4848        | 1.9967  | 299  | 0.4855          |
-| 1.4397        | 2.9983  | 449  | 2.3408          |
-| 1.4527        | 4.0     | 599  | 1.1570          |
-| 1.0505        | 4.9950  | 748  | 1.0305          |
-| 0.8713        | 5.9967  | 898  | 0.7930          |
-| 0.7679        | 6.9983  | 1048 | 0.7487          |
-| 0.7289        | 8.0     | 1198 | 0.7110          |
-| 69.2312       | 8.9950  | 1347 | nan             |
-| 300.5902      | 9.9967  | 1497 | nan             |
-| 635.9469      | 10.9449 | 1639 | nan             |
+| 0.4947        | 0.9996  | 598  | 0.4534          |
+| 0.4418        | 1.9992  | 1196 | 0.4475          |
+| 0.4262        | 2.9987  | 1794 | 0.4476          |
+| 0.4125        | 4.0     | 2393 | 0.4499          |
+| 0.4015        | 4.9996  | 2991 | 0.4552          |
+| 0.3908        | 5.9992  | 3589 | 0.4591          |
+| 0.3809        | 6.9987  | 4187 | 0.4653          |
+| 0.3712        | 8.0     | 4786 | 0.4721          |
+| 0.3635        | 8.9996  | 5384 | 0.4783          |
+| 0.3562        | 9.9992  | 5982 | 0.4868          |
+| 0.3496        | 10.9954 | 6578 | 0.4930          |
 
 
 ### Framework versions
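
The hyperparameter changes are easiest to read side by side in code. Below is a minimal sketch of the `TrainingArguments` that would match the updated card, assuming a standard single-GPU 🤗 Transformers `Trainer` run; `output_dir` and `num_train_epochs` are illustrative guesses, everything else is copied from the README diff above.

```python
# Sketch of TrainingArguments matching the updated model card.
# output_dir and num_train_epochs are placeholders; the rest is from the README.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="mistral-7b-nli-v0.1",  # placeholder path
    learning_rate=3e-4,                # was 4e-3 in the parent commit's run
    per_device_train_batch_size=8,     # was 32
    per_device_eval_batch_size=16,     # was 64
    gradient_accumulation_steps=4,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=2,
    num_train_epochs=11,               # guess: the table stops near epoch 10.995
)

# The effective batch size is what the card reports as total_train_batch_size:
# per_device_train_batch_size * gradient_accumulation_steps = 8 * 4 = 32
assert args.per_device_train_batch_size * args.gradient_accumulation_steps == 32
```

The two loss tables suggest the learning rate is the decisive change: at 0.004 the parent run's training loss blew up and the validation loss became `nan` from epoch 9 onward, while at 0.0003 this run converges smoothly to a validation loss of 0.4930.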
adapter_config.json CHANGED
@@ -1,7 +1,7 @@
 {
   "alpha_pattern": {},
   "auto_mapping": null,
-  "base_model_name_or_path": "TheBloke/Mistral-7B-v0.1-GPTQ",
+  "base_model_name_or_path": null,
   "bias": "none",
   "fan_in_fan_out": false,
   "inference_mode": true,
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2431f5db544e89f150cbfc5fb5f3f7107b5e98c7cc471cb5ebd53671ed35e0be
-size 8397056
+oid sha256:1a74d048a07df7223efdc5042731308c2707d0a1a6a21aef7b7dc348e1ec7eec
+size 8402496
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ed3469d0a4130fecdef2e2b3973ff93260321f256cf55cc61ebe2f0d3f68cb4c
+oid sha256:85b86abb05ef9079982cd583fd138f45e13fe8ab0cce47062420dd7aae689ddd
 size 4539
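
Both binary files are stored as Git LFS pointers: the repo itself tracks only the `oid sha256:` and `size` fields shown above, while the actual bytes live in LFS storage. A quick way to check that a downloaded file matches its pointer is to recompute both fields; a sketch, with the local path and expected values as placeholders to fill in from the diff:

```python
# Sketch: verify a downloaded LFS object against its pointer fields.
# FILE, EXPECTED_OID, and EXPECTED_SIZE are placeholders from the diff above.
import hashlib
import os

FILE = "adapter_model.safetensors"  # placeholder local path
EXPECTED_OID = "1a74d048a07df7223efdc5042731308c2707d0a1a6a21aef7b7dc348e1ec7eec"
EXPECTED_SIZE = 8402496

h = hashlib.sha256()
with open(FILE, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        h.update(chunk)

assert os.path.getsize(FILE) == EXPECTED_SIZE, "size mismatch"
assert h.hexdigest() == EXPECTED_OID, "sha256 mismatch"
print("pointer fields match")
```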