BERT NMB+ (Disinformation Sequence Classification):

Classifies 512-token chunks of a news article as "Likely" or "Unlikely" to contain bias or disinformation.

Fine-tuned from BERT (bert-base-uncased) on the headline, article_text, and text_label fields of the News Media Bias Plus (NMB+) dataset.

This model was trained without weighted sampling on a dataset that contains 81.9% 'Likely' and 18.1% 'Unlikely' examples. The same model trained with weighted sampling performed worse on the training evaluation metrics but better when judged by gpt-4o-mini; it is available here.
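The card does not say how weighted sampling was implemented for the alternative model. One common approach is inverse-frequency weighting, sketched below with only the class shares reported above; the toy label list and the variable names are hypothetical and are not taken from the training code.

```python
import random
from collections import Counter

# Class shares reported in this card for the NMB+ text labels.
class_share = {"Likely": 0.819, "Unlikely": 0.181}

# Inverse-frequency weights: each example is weighted by 1 / (share of its class),
# so the minority class is drawn about as often as the majority class.
class_weight = {label: 1.0 / share for label, share in class_share.items()}

# Hypothetical toy dataset with the same 81.9% / 18.1% split (1,000 labels).
labels = ["Likely"] * 819 + ["Unlikely"] * 181
example_weights = [class_weight[label] for label in labels]

# Draw a weighted sample with replacement and inspect the resulting class mix.
rng = random.Random(0)
sampled = rng.choices(labels, weights=example_weights, k=10_000)
counts = Counter(sampled)
print({label: counts[label] / len(sampled) for label in class_share})
```

With these weights both classes should appear in roughly equal proportion in the sampled batch, which is the usual goal of weighted sampling on imbalanced data.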

Metrics

Evaluated on a random 10% sample of the NMB+ dataset that was held out from training:

  • Accuracy: 0.7884
  • Precision: 0.8573
  • Recall: 0.8599
  • F1 Score: 0.8586
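As a quick consistency check, the F1 score can be recomputed from the reported precision and recall as their harmonic mean. The card does not state which class is treated as positive, so this sketch only checks the arithmetic.

```python
# Precision and recall as reported in this card.
precision = 0.8573
recall = 0.8599

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.8586
```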

How to Use:

Keep in mind that this model was trained on full 512-token chunks and tends to over-predict "Unlikely" for standalone sentences. If you plan to process standalone sentences, you may get better results with this NMB+ model, which was trained on biased headlines.

from transformers import pipeline

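# Load the fine-tuned classifier from the Hugging Face Hub
classifier = pipeline("text-classification", model="maximuspowers/nmbp-bert-full-articles")

# top_k=2 returns the scores for both labels ("Likely" and "Unlikely")
result = classifier("He was a terrible politician.", top_k=2)
print(result)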

Example Response:

[
  {
    'label': 'Likely',
    'score': 0.9967995882034302
  },
  {
    'label': 'Unlikely',
    'score': 0.003200419945642352
  }
]
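
Since the model works on 512-token chunks, a full article has to be split before classification. The card does not describe how chunks were produced during training, so the following is only a minimal sketch under stated assumptions: the helper name chunk_token_ids is hypothetical, and it assumes two positions per chunk are reserved for BERT's [CLS] and [SEP] tokens.

```python
from typing import List


def chunk_token_ids(
    token_ids: List[int],
    max_length: int = 512,
    num_special_tokens: int = 2,
) -> List[List[int]]:
    """Split token ids into consecutive, non-overlapping chunks.

    Each chunk leaves room for `num_special_tokens` (assumed here to be
    [CLS] and [SEP]) so that it fits in `max_length` once they are added.
    """
    body_length = max_length - num_special_tokens
    if body_length <= 0:
        raise ValueError("max_length must exceed num_special_tokens")
    return [
        token_ids[start:start + body_length]
        for start in range(0, len(token_ids), body_length)
    ]


# Stand-in for the token ids of a 1,200-token article.
article_ids = list(range(1200))
chunks = chunk_token_ids(article_ids)
print([len(chunk) for chunk in chunks])  # [510, 510, 180]
```

In practice the ids would come from the model's tokenizer with special tokens disabled, and each chunk would be decoded back to text before being passed to the classifier. How per-chunk predictions should be combined into an article-level label is not specified in this card.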

Model size: 109M params (Safetensors, tensor type F32)