Model Card for p60330av-evidence-detection
This is a binary classification model that was trained to detect whether a piece of evidence is relevant to a given claim or not.
Model Details
Model Description
This model is a fine-tuned version of roberta-base for the task of evidence detection. Given a claim and a piece of evidence, it predicts whether the evidence supports the claim (1) or does not support the claim (0).
- Developed by: Ambar Vishnoi, Matthew O'Farrelly
- Model type: Transformer-based sequence classifier (deep learning)
- Model Architecture: Transformer (RoBERTa)
- Language(s) (NLP): English
- Finetuned from model: roberta-base
Model Resources
Repository: https://huggingface.co/FacebookAI/roberta-base
How to Get Started with the Model
from transformers import RobertaTokenizer, RobertaForSequenceClassification

MODEL_NAME = "ambarvish/ED_BERT_model"
tokenizer = RobertaTokenizer.from_pretrained(MODEL_NAME)
model = RobertaForSequenceClassification.from_pretrained(MODEL_NAME)
For further usage, see the demo notebook: ev-detect-BERT-demo.ipynb
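As a minimal inference sketch, reusing the tokenizer and model loaded above (the claim and evidence strings are made-up placeholders, and the exact preprocessing in the demo notebook may differ):

import torch

claim = "Example claim text."             # hypothetical input
evidence = "Candidate evidence passage."  # hypothetical input

# Encode the claim/evidence pair as a single sequence-pair input and classify it
inputs = tokenizer(claim, evidence, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
prediction = logits.argmax(dim=-1).item()  # 1 = evidence supports the claim, 0 = does not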
Training Details
Training Data
Approximately 20k claim-evidence pairs.
Validation/development set: approximately 6k pairs.
Training Procedure
A grid search was performed over learning_rates = [2e-5, 3e-5, 5e-5], batch_sizes = [8, 16, 32], and epochs = [3, 4, 5] on the development/training set, using 20% of the data for validation. The best hyperparameters were then fixed and the model was re-trained on the larger training dataset.
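A hedged sketch of how this grid search could be run with the Hugging Face Trainer; train_dataset, val_dataset, and compute_metrics are assumed to exist (tokenized claim-evidence pairs from the 80/20 development split and a weighted-F1 metric function), and the actual notebook may be organised differently:

import itertools
from transformers import (RobertaForSequenceClassification, Trainer,
                          TrainingArguments)

learning_rates = [2e-5, 3e-5, 5e-5]
batch_sizes = [8, 16, 32]
epoch_counts = [3, 4, 5]

best_f1, best_config = -1.0, None
for lr, bs, n_epochs in itertools.product(learning_rates, batch_sizes, epoch_counts):
    args = TrainingArguments(
        output_dir=f"grid_lr{lr}_bs{bs}_ep{n_epochs}",
        learning_rate=lr,
        per_device_train_batch_size=bs,
        num_train_epochs=n_epochs,
        evaluation_strategy="epoch",
        save_strategy="no",
    )
    # Fresh model per configuration; the focal-loss customisation used for the
    # final model is omitted here for brevity (Trainer defaults to cross-entropy).
    model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
    trainer = Trainer(model=model, args=args, train_dataset=train_dataset,
                      eval_dataset=val_dataset, compute_metrics=compute_metrics)
    trainer.train()
    f1 = trainer.evaluate()["eval_f1"]  # assumes compute_metrics returns {"f1": ...}
    if f1 > best_f1:
        best_f1, best_config = f1, (lr, bs, n_epochs)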
Training Hyperparameters
Optimizer: AdamW
Loss Function: Focal loss (cross-entropy with focusing and class-balancing parameters; see [1] and the sketch after this list)
Learning Rate: Grid-searched, optimal value found to be 2e-5
Batch Size: Grid-searched, optimal value found to be 32
Epochs: Grid-searched, optimal value found to be 4
Hardware Used: Trained on GPU (Google Colab environment)
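A minimal PyTorch sketch of the focal loss described above; the alpha and gamma values shown are the defaults from [1] and are illustrative only, since the values used for this model are not stated here:

import torch
import torch.nn.functional as F

class FocalLoss(torch.nn.Module):
    # Focal loss [1]: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t)
    # gamma down-weights easy examples; alpha balances the two classes.
    def __init__(self, alpha=0.25, gamma=2.0):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma

    def forward(self, logits, targets):
        ce = F.cross_entropy(logits, targets, reduction="none")  # -log(p_t)
        p_t = torch.exp(-ce)                                     # probability of the true class
        alpha_t = self.alpha * targets + (1 - self.alpha) * (1 - targets)
        return (alpha_t * (1 - p_t) ** self.gamma * ce).mean()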
Speeds, Sizes, Times
- Overall grid search time: ~1 hour
- Overall training time: ~1 hour
- Duration per training epoch: ~15 mins
- Model size: 499MB
Evaluation
Testing Data, Factors & Metrics
Testing Data
The validation/development set of approximately 6k claim-evidence pairs.
Metrics and results
The following metrics were used to provide a quantitative evaluation of the model (a sketch for reproducing them follows the list):
Accuracy: 0.8815
Weighted Precision: 0.8859
Weighted Recall: 0.8815
Weighted F1-Score: 0.8831
Confusion Matrix: included in the evaluation notebook ev-detect-BERT.ipynb
MCC: 0.7136
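The figures above come from the evaluation notebook; equivalent numbers can be computed with scikit-learn roughly as follows (y_true and y_pred are assumed arrays of gold and predicted labels for the validation pairs):

from sklearn.metrics import (accuracy_score, confusion_matrix,
                             matthews_corrcoef, precision_recall_fscore_support)

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="weighted")
mcc = matthews_corrcoef(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)  # rows = true labels, columns = predicted labels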
Technical Specifications
Bias, Risks, and Limitations
- The model may not generalize well to claims outside of its training distribution.
- Inputs are limited to 512 tokens; any longer inputs will be truncated.
Citations
[1]: Lin, T.Y., Goyal, P., Girshick, R., He, K. and Dollár, P., 2017. Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision (pp. 2980-2988).