metadata
license: cc-by-4.0
language:
  - en
pipeline_tag: text-classification
tags:
  - RoBERTa-large
  - topic
  - news

Fine-tuned RoBERTa-large for detecting news on civil rights

Model Description

This model is a fine-tuned RoBERTa-large that classifies whether a news article is about civil rights.

How to Use

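from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
classifier = pipeline("text-classification", model="dell-research-harvard/topic-civil_rights")
# Returns a list of dicts, each with a "label" and a "score"
print(classifier("Bill of rights passes Congress"))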
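For more than one article, the same pipeline can be called on a list of strings. The sketch below is one possible way to do that, not part of the released code: the helper name `classify_articles` is hypothetical, and `truncation=True` is an assumption added to guard against articles longer than the model's input limit. The label names the model returns are not documented here, so inspect `classifier.model.config.id2label` before interpreting them.

```python
from typing import Callable, Dict, List


def classify_articles(classifier: Callable, texts: List[str]) -> List[Dict]:
    """Run a text-classification pipeline on a batch of articles.

    Hypothetical helper: pairs each input text with its predicted
    label and score so the results are easy to filter or save.
    """
    # truncation=True is an assumption: it clips inputs that exceed
    # the model's maximum sequence length instead of raising an error.
    predictions = classifier(texts, truncation=True)
    return [
        {"text": text, "label": pred["label"], "score": pred["score"]}
        for text, pred in zip(texts, predictions)
    ]
```

With the `classifier` created above, `classify_articles(classifier, ["Bill of rights passes Congress", "Wheat prices fall"])` returns one dict per article.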

Training data

The model was trained on a hand-labelled sample of data from the NEWSWIRE dataset.

| Split | Size |
|-------|------|
| Train | 943  |
| Dev   | 202  |
| Test  | 202  |

Test set results

| Metric    | Result |
|-----------|--------|
| F1        | 0.8696 |
| Accuracy  | 0.9406 |
| Precision | 0.8511 |
| Recall    | 0.8889 |
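As a quick sanity check on these numbers, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. This is a minimal sketch using only the figures above; the variable names are illustrative.

```python
# Reported test-set precision and recall
precision = 0.8511
recall = 0.8889

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 4))  # 0.8696, matching the reported F1
```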

Citation Information

You can cite the NEWSWIRE dataset, from which this model's training data was drawn, using:

@misc{silcock2024newswirelargescalestructureddatabase,
      title={Newswire: A Large-Scale Structured Database of a Century of Historical News}, 
      author={Emily Silcock and Abhishek Arora and Luca D'Amico-Wong and Melissa Dell},
      year={2024},
      eprint={2406.09490},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.09490}, 
}

Applications

We applied this model to a century of historical news articles. You can see all the classifications in the NEWSWIRE dataset.