Model Card for p60330av-evidence-detection
This is a binary classification model that was trained to detect whether a piece of evidence is relevant to a given claim or not.
Model Details
Model Description
This model is a fine-tuned version of roberta-base for the task of evidence detection. Given a claim and a piece of evidence, it predicts whether the evidence supports the claim (1) or does not support the claim (0).
- Developed by: Ambar Vishnoi, Matthew O'Farrelly
- Model type: Transformer-based sequence classifier (deep learning)
- Model Architecture: Transformer (RoBERTa)
- Language(s) (NLP): English
- Finetuned from model: roberta-base
Model Resources
Repository: https://huggingface.co/FacebookAI/roberta-base
How to Get Started with the Model
from transformers import RobertaTokenizer, RobertaForSequenceClassification

MODEL_NAME = "ambarvish/ED_BERT_model"
tokenizer = RobertaTokenizer.from_pretrained(MODEL_NAME)
model = RobertaForSequenceClassification.from_pretrained(MODEL_NAME)
For further usage, see the demo notebook: ev-detect-BERT-demo.ipynb
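As a minimal inference sketch, reusing the tokenizer and model loaded above (the claim and evidence strings are made-up placeholders, and the exact preprocessing in the demo notebook may differ):

import torch

claim = "Example claim text."             # hypothetical input
evidence = "Candidate evidence passage."  # hypothetical input

# Encode the claim/evidence pair as a single sequence-pair input and classify it
inputs = tokenizer(claim, evidence, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
prediction = logits.argmax(dim=-1).item()  # 1 = evidence supports the claim, 0 = does not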
Training Details
Training Data
Approximately 20k claim-evidence pairs.
Validation/development set: approximately 6k pairs.
Training Procedure
A grid search was performed over learning_rates = [2e-5, 3e-5, 5e-5], batch_sizes = [8, 16, 32], and epochs = [3, 4, 5] on the development/training set, using 20% of the data for validation. The best hyperparameters were then fixed and the model was re-trained on the larger training dataset.
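A hedged sketch of how this grid search could be run with the Hugging Face Trainer; train_dataset, val_dataset, and compute_metrics are assumed to exist (tokenized claim-evidence pairs from the 80/20 development split and a weighted-F1 metric function), and the actual notebook may be organised differently:

import itertools
from transformers import (RobertaForSequenceClassification, Trainer,
                          TrainingArguments)

learning_rates = [2e-5, 3e-5, 5e-5]
batch_sizes = [8, 16, 32]
epoch_counts = [3, 4, 5]

best_f1, best_config = -1.0, None
for lr, bs, n_epochs in itertools.product(learning_rates, batch_sizes, epoch_counts):
    args = TrainingArguments(
        output_dir=f"grid_lr{lr}_bs{bs}_ep{n_epochs}",
        learning_rate=lr,
        per_device_train_batch_size=bs,
        num_train_epochs=n_epochs,
        evaluation_strategy="epoch",
        save_strategy="no",
    )
    # Fresh model per configuration; the focal-loss customisation used for the
    # final model is omitted here for brevity (Trainer defaults to cross-entropy).
    model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
    trainer = Trainer(model=model, args=args, train_dataset=train_dataset,
                      eval_dataset=val_dataset, compute_metrics=compute_metrics)
    trainer.train()
    f1 = trainer.evaluate()["eval_f1"]  # assumes compute_metrics returns {"f1": ...}
    if f1 > best_f1:
        best_f1, best_config = f1, (lr, bs, n_epochs)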
Training Hyperparameters
Optimizer: AdamW
Loss Function: Focal loss (cross-entropy with focusing and class-balancing parameters; see [1] and the sketch after this list)
Learning Rate: Grid-searched, optimal value found to be 2e-5
Batch Size: Grid-searched, optimal value found to be 32
Epochs: Grid-searched, optimal value found to be 4
Hardware Used: Trained on GPU (Google Colab environment)
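A minimal PyTorch sketch of the focal loss described above; the alpha and gamma values shown are the defaults from [1] and are illustrative only, since the values used for this model are not stated here:

import torch
import torch.nn.functional as F

class FocalLoss(torch.nn.Module):
    # Focal loss [1]: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t)
    # gamma down-weights easy examples; alpha balances the two classes.
    def __init__(self, alpha=0.25, gamma=2.0):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma

    def forward(self, logits, targets):
        ce = F.cross_entropy(logits, targets, reduction="none")  # -log(p_t)
        p_t = torch.exp(-ce)                                     # probability of the true class
        alpha_t = self.alpha * targets + (1 - self.alpha) * (1 - targets)
        return (alpha_t * (1 - p_t) ** self.gamma * ce).mean()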
Speeds, Sizes, Times
- Overall grid search time: ~1 hour
- Overall training time: ~1 hour
- Duration per training epoch: ~15 mins
- Model size: 499MB
Evaluation
Testing Data, Factors & Metrics
Testing Data
The validation/development set of approximately 6k claim-evidence pairs.
Metrics and results
The following metrics were used to provide a quantitative evaluation of the model (a sketch for reproducing them follows the list):
Accuracy: 0.8815
Weighted Precision: 0.8859
Weighted Recall: 0.8815
Weighted F1-Score: 0.8831
Confusion Matrix: included in the evaluation notebook ev-detect-BERT.ipynb
MCC: 0.7136
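The figures above come from the evaluation notebook; equivalent numbers can be computed with scikit-learn roughly as follows (y_true and y_pred are assumed arrays of gold and predicted labels for the validation pairs):

from sklearn.metrics import (accuracy_score, confusion_matrix,
                             matthews_corrcoef, precision_recall_fscore_support)

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="weighted")
mcc = matthews_corrcoef(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)  # rows = true labels, columns = predicted labels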
Technical Specifications
Bias, Risks, and Limitations
- The model may not generalize well to claims outside of its training distribution.
- Inputs are limited to 512 tokens; any longer inputs will be truncated.
Citations
[1]: Lin, T.Y., Goyal, P., Girshick, R., He, K. and Dollár, P., 2017. Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision (pp. 2980-2988).