longformer-one-step / README.md

Theoreticallyhugo

trainer: training complete at 2023-11-27 14:10:49.177196.

	---
	license: apache-2.0
	base_model: allenai/longformer-base-4096
	tags:
	- generated_from_trainer
	metrics:
	- accuracy
	model-index:
	- name: longformer-one-step
	  results: []
	---


	# longformer-one-step

	This model is a fine-tuned version of [allenai/longformer-base-4096](https://huggingface.co/allenai/longformer-base-4096) on an unspecified dataset (the Trainer did not record a dataset name).
	It achieves the following results on the evaluation set:
	- Loss: 0.4163
	- Accuracy: 0.8526

	Per-label metrics, rounded to four decimal places:

	| Label | Precision | Recall | F1-score | Support |
	|:-------------|----------:|-------:|---------:|--------:|
	| B-claim | 0.5556 | 0.5519 | 0.5537 | 154 |
	| B-majorclaim | 0.6286 | 0.6875 | 0.6567 | 64 |
	| B-premise | 0.7459 | 0.8415 | 0.7908 | 429 |
	| I-claim | 0.6675 | 0.6068 | 0.6357 | 2243 |
	| I-majorclaim | 0.7214 | 0.7511 | 0.7360 | 872 |
	| I-premise | 0.8962 | 0.9113 | 0.9037 | 7511 |
	| O | 0.9140 | 0.9101 | 0.9120 | 4517 |
	| Macro avg | 0.7327 | 0.7515 | 0.7412 | 15790 |
	| Weighted avg | 0.8506 | 0.8526 | 0.8513 | 15790 |
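The macro and weighted averages reported above can be reproduced from the per-label rows. The following is a minimal sketch in plain Python, with the numbers copied from the evaluation results; the helper names `macro_avg` and `weighted_avg` are illustrative and not part of any library.

```python
# Per-label (precision, recall, f1-score, support), copied from the results above.
per_label = {
    "B-claim": (0.5555555555555556, 0.551948051948052, 0.5537459283387622, 154),
    "B-majorclaim": (0.6285714285714286, 0.6875, 0.6567164179104478, 64),
    "B-premise": (0.7458677685950413, 0.8414918414918415, 0.7907995618838992, 429),
    "I-claim": (0.6674840608141246, 0.6067766384306732, 0.6356842596917328, 2243),
    "I-majorclaim": (0.7213656387665198, 0.7511467889908257, 0.7359550561797752, 872),
    "I-premise": (0.8961770096884001, 0.9113300492610837, 0.903690012542082, 7511),
    "O": (0.9139617607825701, 0.910117334514058, 0.9120354963948973, 4517),
}


def macro_avg(values):
    """Unweighted mean: every label counts equally, however rare it is."""
    return sum(values) / len(values)


def weighted_avg(values, weights):
    """Mean weighted by support: frequent labels dominate the result."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


recalls = [row[1] for row in per_label.values()]
f1_scores = [row[2] for row in per_label.values()]
supports = [row[3] for row in per_label.values()]

total_support = sum(supports)
macro_f1 = macro_avg(f1_scores)
weighted_f1 = weighted_avg(f1_scores, supports)
# Support-weighted recall is the fraction of correctly labelled tokens,
# which is why it coincides with the reported accuracy.
weighted_recall = weighted_avg(recalls, supports)

print(f"total support:   {total_support}")
print(f"macro F1:        {macro_f1:.4f}")
print(f"weighted F1:     {weighted_f1:.4f}")
print(f"weighted recall: {weighted_recall:.4f}")
```

The gap between the macro F1 and the weighted F1 reflects the label imbalance: the rare `B-claim` and `B-majorclaim` labels score lowest but carry little weight in the support-weighted figure.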

	## Model description

	More information needed
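The label set reported in the results (`B-`/`I-` prefixes for `claim`, `majorclaim`, and `premise`, plus `O`) suggests a token-level BIO tagging scheme, although the card does not state this explicitly. Under that assumption, predicted tags could be grouped into spans roughly as follows. The function `bio_to_spans` is a hypothetical helper written for illustration; it is not part of this model or of any library.

```python
def bio_to_spans(tags):
    """Group token-level BIO tags into (label, start, end) spans, end exclusive."""
    spans = []
    current_label, start = None, None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        # A span continues only on an I- tag carrying the label of the open span.
        continues = prefix == "I" and label == current_label
        if not continues:
            if current_label is not None:
                spans.append((current_label, start, i))
                current_label, start = None, None
            # Lenient choice: a stray I- tag with no open span also starts one.
            if prefix in ("B", "I"):
                current_label, start = label, i
    if current_label is not None:
        spans.append((current_label, start, len(tags)))
    return spans


example_tags = [
    "O", "B-claim", "I-claim", "I-claim",
    "O", "B-premise", "I-premise", "B-premise",
]
example_spans = bio_to_spans(example_tags)
print(example_spans)
```

Treating a stray `I-` tag as the start of a span is one of several possible conventions; a stricter decoder would discard such tags instead.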

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-05
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 3
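The hyperparameters listed above map onto keyword arguments of `transformers.TrainingArguments`. The sketch below collects them in a plain dictionary so that it runs without any dependencies. It assumes a single device, so that the reported batch size of 8 is the per-device batch size, and it leaves every setting not listed in the card at its default. The original training script is not part of this card, so this is a reconstruction and not the author's code.

```python
# Keyword arguments mirroring the hyperparameter list above.
# Assumption: training ran on a single device, so the reported
# train/eval batch sizes equal the per-device batch sizes.
training_kwargs = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3,
}

# With transformers installed, these could be passed along as follows
# ("out" is a placeholder output directory, not taken from the card):
#
#   from transformers import TrainingArguments
#   args = TrainingArguments(output_dir="out", **training_kwargs)

for name, value in training_kwargs.items():
    print(f"{name} = {value}")
```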

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Accuracy |
	|:-------------:|:-----:|:----:|:---------------:|:--------:|
	| No log | 1.0 | 196 | 0.4925 | 0.8126 |
	| No log | 2.0 | 392 | 0.4278 | 0.8413 |
	| 0.598 | 3.0 | 588 | 0.4163 | 0.8526 |

	Per-label validation metrics after each epoch, rounded to four decimal places:

	| Epoch | Label | Precision | Recall | F1-score | Support |
	|:-----:|:-------------|----------:|-------:|---------:|--------:|
	| 1.0 | B-claim | 0.4490 | 0.1429 | 0.2167 | 154 |
	| 1.0 | B-majorclaim | 0.0000 | 0.0000 | 0.0000 | 64 |
	| 1.0 | B-premise | 0.6522 | 0.8741 | 0.7470 | 429 |
	| 1.0 | I-claim | 0.6344 | 0.3767 | 0.4727 | 2243 |
	| 1.0 | I-majorclaim | 0.6210 | 0.6445 | 0.6325 | 872 |
	| 1.0 | I-premise | 0.8455 | 0.9221 | 0.8821 | 7511 |
	| 1.0 | O | 0.8657 | 0.9079 | 0.8863 | 4517 |
	| 1.0 | Macro avg | 0.5811 | 0.5526 | 0.5482 | 15790 |
	| 1.0 | Weighted avg | 0.7963 | 0.8126 | 0.7976 | 15790 |
	| 2.0 | B-claim | 0.5455 | 0.5065 | 0.5253 | 154 |
	| 2.0 | B-majorclaim | 0.7091 | 0.6094 | 0.6555 | 64 |
	| 2.0 | B-premise | 0.6920 | 0.8695 | 0.7707 | 429 |
	| 2.0 | I-claim | 0.6465 | 0.5439 | 0.5908 | 2243 |
	| 2.0 | I-majorclaim | 0.7233 | 0.7133 | 0.7182 | 872 |
	| 2.0 | I-premise | 0.8696 | 0.9232 | 0.8956 | 7511 |
	| 2.0 | O | 0.9275 | 0.8895 | 0.9081 | 4517 |
	| 2.0 | Macro avg | 0.7305 | 0.7222 | 0.7234 | 15790 |
	| 2.0 | Weighted avg | 0.8378 | 0.8413 | 0.8381 | 15790 |
	| 3.0 | B-claim | 0.5556 | 0.5519 | 0.5537 | 154 |
	| 3.0 | B-majorclaim | 0.6286 | 0.6875 | 0.6567 | 64 |
	| 3.0 | B-premise | 0.7459 | 0.8415 | 0.7908 | 429 |
	| 3.0 | I-claim | 0.6675 | 0.6068 | 0.6357 | 2243 |
	| 3.0 | I-majorclaim | 0.7214 | 0.7511 | 0.7360 | 872 |
	| 3.0 | I-premise | 0.8962 | 0.9113 | 0.9037 | 7511 |
	| 3.0 | O | 0.9140 | 0.9101 | 0.9120 | 4517 |
	| 3.0 | Macro avg | 0.7327 | 0.7515 | 0.7412 | 15790 |
	| 3.0 | Weighted avg | 0.8506 | 0.8526 | 0.8513 | 15790 |
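The step counts in the table (196, 392, 588) together with the `linear` scheduler and the learning rate of 2e-05 from the hyperparameter list determine the learning rate at each evaluation point. The sketch below is written under the assumption that no warmup was used, which is the Trainer default but is not confirmed by this card. The function `linear_lr` is an illustrative reimplementation of a linear decay rule, not a library call.

```python
def linear_lr(step, base_lr=2e-5, total_steps=588, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


total_steps = 588
num_epochs = 3
steps_per_epoch = total_steps // num_epochs  # 196, matching the table

# Learning rate at the start of training and at each evaluation step.
schedule = {step: linear_lr(step) for step in (0, 196, 392, 588)}
for step, lr in schedule.items():
    print(f"step {step:>3}: lr = {lr:.3e}")
```

With 196 optimizer steps per epoch and a batch size of 8, the training set would contain at most 196 × 8 = 1568 examples, assuming a single device and no gradient accumulation.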


	### Framework versions

	- Transformers 4.33.0
	- Pytorch 2.0.1+cu118
	- Datasets 2.14.4
	- Tokenizers 0.13.3