---
license: apache-2.0
base_model: allenai/longformer-base-4096
tags:
- generated_from_trainer
datasets:
- essays_su_g
metrics:
- accuracy
model-index:
- name: longformer-spans
  results:
  - task:
      name: Token Classification
      type: token-classification
    dataset:
      name: essays_su_g
      type: essays_su_g
      config: spans
      split: train[80%:100%]
      args: spans
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9362395452405953
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# longformer-spans

This model is a fine-tuned version of [allenai/longformer-base-4096](https://huggingface.co/allenai/longformer-base-4096) on the essays_su_g dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1974
- Accuracy: 0.9362

| Label        | Precision          | Recall             | F1-score           | Support |
|:-------------|:------------------:|:------------------:|:------------------:|:-------:|
| B            | 0.8404351767905711 | 0.8887823585810163 | 0.863932898415657  | 1043    |
| I            | 0.9420745397395599 | 0.9673775216138328 | 0.954558380253654  | 17350   |
| O            | 0.9364367816091954 | 0.8830479080858443 | 0.9089590538882071 | 9226    |
| Macro avg    | 0.9063154993797754 | 0.9130692627602311 | 0.9091501108525061 | 27619   |
| Weighted avg | 0.9363529780585962 | 0.9362395452405953 | 0.9359037670307043 | 27619   |

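The `B`, `I`, and `O` entries above are per-token label classes. Assuming they follow the usual BIO convention (`B` opens a span, `I` continues it, `O` lies outside any span), which this card does not state explicitly, predicted tags could be grouped into spans roughly as in the sketch below. The function name `bio_to_spans` is hypothetical and is not part of this model or of Transformers.

```python
def bio_to_spans(tags):
    """Group a sequence of "B"/"I"/"O" tags into (start, end) token spans.

    `end` is exclusive. Assumes the usual BIO convention; this is an
    illustration, not the post-processing used to produce this card.
    """
    spans = []
    start = None  # index where the currently open span began, if any
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new "B" closes any open span and opens a new one.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I":
            # Lenient choice: an "I" with no open span starts one.
            if start is None:
                start = i
        else:  # "O"
            if start is not None:
                spans.append((start, i))
                start = None
    # Close a span that runs to the end of the sequence.
    if start is not None:
        spans.append((start, len(tags)))
    return spans


example_tags = ["O", "B", "I", "I", "O", "B", "I"]
print(bio_to_spans(example_tags))  # [(1, 4), (5, 7)]
```

Treating a stray `I` as the start of a span is one possible design choice; a stricter decoder could discard it instead.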
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 8

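To make the `linear` scheduler setting concrete, the sketch below computes the learning rate at each epoch boundary. It takes the 41 steps per epoch from the Step column of the training results, and it assumes no warmup (none is listed above) and a decay to zero at the final step. The names `linear_lr` and `WARMUP_STEPS` are illustrative, not taken from the training script.

```python
LEARNING_RATE = 2e-05
NUM_EPOCHS = 8
STEPS_PER_EPOCH = 41  # from the Step column of the training results
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH  # 328
WARMUP_STEPS = 0  # assumption: the card lists no warmup setting


def linear_lr(step):
    """Learning rate at `step` for linear warmup followed by linear decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


for epoch in range(NUM_EPOCHS + 1):
    step = epoch * STEPS_PER_EPOCH
    print(f"epoch {epoch}: step {step:3d}, lr {linear_lr(step):.3e}")
```

Under these assumptions the rate has halved to about 1e-05 by step 164, the end of epoch 4.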
### Training results

| Training Loss | Epoch | Step | Validation Loss | B                                                                                                                   | I                                                                                                                   | O                                                                                                                  | Accuracy | Macro avg                                                                                                           | Weighted avg                                                                                                        |
|:-------------:|:-----:|:----:|:---------------:|:-------------------------------------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------:|:--------:|:-------------------------------------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------------------:|
| No log        | 1.0   | 41   | 0.2955          | {'precision': 0.7986798679867987, 'recall': 0.46404602109300097, 'f1-score': 0.5870224378411159, 'support': 1043.0} | {'precision': 0.8854450261780105, 'recall': 0.9747550432276657, 'f1-score': 0.9279561042524005, 'support': 17350.0} | {'precision': 0.9346644761784405, 'recall': 0.8016475178842402, 'f1-score': 0.8630608553591224, 'support': 9226.0} | 0.8976   | {'precision': 0.8729297901144165, 'recall': 0.7468161940683024, 'f1-score': 0.7926797991508797, 'support': 27619.0} | {'precision': 0.8986099700829504, 'recall': 0.8976429269705637, 'f1-score': 0.893403174010308, 'support': 27619.0}  |
| No log        | 2.0   | 82   | 0.2031          | {'precision': 0.784197111299915, 'recall': 0.8849472674976031, 'f1-score': 0.8315315315315316, 'support': 1043.0}   | {'precision': 0.9307149161518093, 'recall': 0.9724495677233429, 'f1-score': 0.9511246406223575, 'support': 17350.0} | {'precision': 0.9504450324753428, 'recall': 0.8564925211359202, 'f1-score': 0.9010262257696694, 'support': 9226.0} | 0.9304   | {'precision': 0.8884523533090224, 'recall': 0.9046297854522888, 'f1-score': 0.8945607993078529, 'support': 27619.0} | {'precision': 0.9317725932125427, 'recall': 0.9304102248452153, 'f1-score': 0.9298731982018269, 'support': 27619.0} |
| No log        | 3.0   | 123  | 0.1754          | {'precision': 0.8527204502814258, 'recall': 0.8715244487056567, 'f1-score': 0.8620199146514935, 'support': 1043.0}  | {'precision': 0.9616262064931267, 'recall': 0.947492795389049, 'f1-score': 0.9545071853679779, 'support': 17350.0}  | {'precision': 0.9036794248255445, 'recall': 0.9264036418816388, 'f1-score': 0.9149004495825305, 'support': 9226.0} | 0.9376   | {'precision': 0.906008693866699, 'recall': 0.9151402953254482, 'f1-score': 0.9104758498673339, 'support': 27619.0}  | {'precision': 0.9381566488916958, 'recall': 0.9375792027227633, 'f1-score': 0.9377840611522629, 'support': 27619.0} |
| No log        | 4.0   | 164  | 0.2248          | {'precision': 0.8219800181653043, 'recall': 0.8676893576222435, 'f1-score': 0.8442164179104478, 'support': 1043.0}  | {'precision': 0.9191395059726502, 'recall': 0.9801152737752161, 'f1-score': 0.9486485732615547, 'support': 17350.0} | {'precision': 0.9589622053137083, 'recall': 0.8332972035551701, 'f1-score': 0.8917241779272748, 'support': 9226.0} | 0.9268   | {'precision': 0.9000272431505542, 'recall': 0.8937006116508766, 'f1-score': 0.8948630563664257, 'support': 27619.0} | {'precision': 0.9287729785218931, 'recall': 0.9268257359064412, 'f1-score': 0.9256894795439954, 'support': 27619.0} |
| No log        | 5.0   | 205  | 0.1931          | {'precision': 0.848987108655617, 'recall': 0.8839884947267498, 'f1-score': 0.8661343353687178, 'support': 1043.0}   | {'precision': 0.9373124374791597, 'recall': 0.9721037463976945, 'f1-score': 0.9543911272068809, 'support': 17350.0} | {'precision': 0.9444899871179295, 'recall': 0.8741599826577064, 'f1-score': 0.9079650999155643, 'support': 9226.0} | 0.9361   | {'precision': 0.910263177750902, 'recall': 0.9100840745940503, 'f1-score': 0.909496854163721, 'support': 27619.0}   | {'precision': 0.9363745597502171, 'recall': 0.9360585104457076, 'f1-score': 0.935549809212859, 'support': 27619.0}  |
| No log        | 6.0   | 246  | 0.1742          | {'precision': 0.8382222222222222, 'recall': 0.9041227229146692, 'f1-score': 0.8699261992619925, 'support': 1043.0}  | {'precision': 0.9481431159420289, 'recall': 0.9653025936599423, 'f1-score': 0.956645913063346, 'support': 17350.0}  | {'precision': 0.9353340883352208, 'recall': 0.8951875135486668, 'f1-score': 0.9148205582631811, 'support': 9226.0} | 0.9396   | {'precision': 0.9072331421664908, 'recall': 0.9215376100410927, 'f1-score': 0.9137975568628399, 'support': 27619.0} | {'precision': 0.9397132821011885, 'recall': 0.9395705854665267, 'f1-score': 0.9393994745651696, 'support': 27619.0} |
| No log        | 7.0   | 287  | 0.1985          | {'precision': 0.8421052631578947, 'recall': 0.8897411313518696, 'f1-score': 0.8652680652680652, 'support': 1043.0}  | {'precision': 0.9399821009061416, 'recall': 0.9685878962536023, 'f1-score': 0.9540706256386964, 'support': 17350.0} | {'precision': 0.9385345526102559, 'recall': 0.8788207240407544, 'f1-score': 0.9076966134900645, 'support': 9226.0} | 0.9356   | {'precision': 0.9068739722247642, 'recall': 0.9123832505487423, 'f1-score': 0.9090117681322755, 'support': 27619.0} | {'precision': 0.935802347028403, 'recall': 0.9356240269379775, 'f1-score': 0.9352260727385245, 'support': 27619.0}  |
| No log        | 8.0   | 328  | 0.1974          | {'precision': 0.8404351767905711, 'recall': 0.8887823585810163, 'f1-score': 0.863932898415657, 'support': 1043.0}   | {'precision': 0.9420745397395599, 'recall': 0.9673775216138328, 'f1-score': 0.954558380253654, 'support': 17350.0}  | {'precision': 0.9364367816091954, 'recall': 0.8830479080858443, 'f1-score': 0.9089590538882071, 'support': 9226.0} | 0.9362   | {'precision': 0.9063154993797754, 'recall': 0.9130692627602311, 'f1-score': 0.9091501108525061, 'support': 27619.0} | {'precision': 0.9363529780585962, 'recall': 0.9362395452405953, 'f1-score': 0.9359037670307043, 'support': 27619.0} |

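The `Macro avg` and `Weighted avg` columns can be recomputed from the per-label cells. The sketch below does this for the final epoch (step 328) using the standard definitions: an unweighted mean across labels, and a mean weighted by each label's support. That the evaluation script used exactly these definitions is an assumption, although the numbers agree. The helper names `macro_avg` and `weighted_avg` are illustrative.

```python
# Per-label metrics copied from the epoch 8 row of the table above.
final_epoch = {
    "B": {"precision": 0.8404351767905711, "recall": 0.8887823585810163,
          "f1-score": 0.863932898415657, "support": 1043},
    "I": {"precision": 0.9420745397395599, "recall": 0.9673775216138328,
          "f1-score": 0.954558380253654, "support": 17350},
    "O": {"precision": 0.9364367816091954, "recall": 0.8830479080858443,
          "f1-score": 0.9089590538882071, "support": 9226},
}


def macro_avg(report, metric):
    """Unweighted mean of `metric` across labels."""
    return sum(row[metric] for row in report.values()) / len(report)


def weighted_avg(report, metric):
    """Mean of `metric` across labels, weighted by each label's support."""
    total = sum(row["support"] for row in report.values())
    return sum(row[metric] * row["support"] for row in report.values()) / total


macro_f1 = macro_avg(final_epoch, "f1-score")
weighted_f1 = weighted_avg(final_epoch, "f1-score")
weighted_recall = weighted_avg(final_epoch, "recall")

print(f"macro F1:        {macro_f1:.4f}")
print(f"weighted F1:     {weighted_f1:.4f}")
print(f"weighted recall: {weighted_recall:.4f}")
```

The support-weighted recall equals the reported accuracy (0.9362), as expected when every token receives exactly one label. The macro F1 is lower than the weighted F1 because the comparatively rare `B` label, which scores lowest, counts as much as `I` and `O` in the unweighted mean.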

### Framework versions

- Transformers 4.37.2
- Pytorch 2.2.0+cu121
- Datasets 2.17.0
- Tokenizers 0.15.2