---
library_name: transformers
license: apache-2.0
language:
- de
pipeline_tag: text-classification
tags:
- populism
- political-speech
- classification
- german
- Bundestag
- NLP
base_model:
- EuroBERT/EuroBERT-210m
---
# PopEuroBERT-210m
## Binary Populism Classifier for German Bundestag Speeches
## Table of Contents
1. [Overview](#overview)
2. [Usage](#usage)
3. [Training Data](#training-data)
4. [Training Procedure](#training-procedure)
5. [Evaluation](#evaluation)
6. [Limitations](#limitations)
7. [Ethical Considerations](#ethical-considerations)
8. [License](#license)
9. [Citation](#citation)
## Overview
This model is a fine-tuned version of [EuroBERT-210m](https://huggingface.co/EuroBERT/EuroBERT-210m) on the [PopBERT](https://huggingface.co/luerhard/PopBERT) dataset (sentence-level annotated German Bundestag speeches) for **populist rhetoric classification**. It predicts whether a given speech excerpt contains populist language.
**Key Features:**
- Trained on **German Bundestag speeches** sentence-level annotated for populism.
- Fine-tuned using **5-fold cross-validation**.
- Optimized with **decision threshold tuning**.
## Usage
To use the model in Python:
```python
import torch
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
model_id = "przvl/PopEuroBERT-binary-210m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, trust_remote_code=True
)
# define text to be predicted
text = (
    "Aber Ihnen fehlt eben der Mut, Ihnen fehlen die Visionen, um sich "
    "gegen die Konzerne und gegen die Lobbygruppen zur Wehr zu setzen."
)
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
# get classification probability
logits = outputs.logits
probs = torch.softmax(logits, dim=-1) # shape [1, 2]
populist_prob = probs[0, 1].item() # probability of class=1 (populist)
# use decision threshold 0.56
threshold = 0.56
label = "Populist" if populist_prob > threshold else "Neutral"
print(f"Predicted class: {label} (Confidence: {populist_prob:.2f})")
```
```text
Predicted class: Populist (Confidence: 0.90)
```
**Use decision threshold `0.56` for balanced [performance](#evaluation).**
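With two classes, the softmax probability of the populist class depends only on the gap between the two logits, so the threshold can equally be applied in logit space. The following is a minimal sketch in plain Python; the helper names `populist_probability` and `classify` are illustrative and not part of the model's API.

```python
import math


def populist_probability(logit_neutral: float, logit_populist: float) -> float:
    """Two-class softmax probability of class 1, written as a sigmoid of the logit gap."""
    return 1.0 / (1.0 + math.exp(-(logit_populist - logit_neutral)))


def classify(logit_neutral: float, logit_populist: float, threshold: float = 0.56):
    """Return the label and the populist probability for one pair of logits."""
    prob = populist_probability(logit_neutral, logit_populist)
    label = "Populist" if prob > threshold else "Neutral"
    return label, prob


# The 0.56 probability threshold expressed as a minimum logit gap
# (logit_populist - logit_neutral), roughly 0.24.
logit_gap = math.log(0.56 / (1 - 0.56))
```

A sentence whose two logits are equal has a populist probability of 0.5 and is therefore labeled `Neutral` at this threshold.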
## Training Data
- **Dataset:** [PopBERT](https://github.com/luerhard/PopBERT)
- Sentence-level annotated German Bundestag speeches
- `train/test: 7017/1758`
- **Preprocessing:**
- Converted labels to binary format (`populist = 1`, `neutral = 0`).
- Tokenized using **EuroBERT tokenizer** with a max length of `256` tokens.
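
The label conversion can be sketched as follows. This is an illustration under stated assumptions, not the exact preprocessing used: the field names `anti_elitism` and `people_centrism` are hypothetical, and the rule "populist if any populism dimension is annotated" is an assumed reading of the binarization step.

```python
# Hypothetical annotation fields; the actual PopBERT column names may differ.
POPULISM_DIMENSIONS = ("anti_elitism", "people_centrism")


def to_binary_label(annotation: dict) -> int:
    """Return 1 (populist) if any assumed populism dimension is set, else 0 (neutral)."""
    return int(any(annotation.get(dim, 0) for dim in POPULISM_DIMENSIONS))


# Toy records standing in for annotated sentences.
examples = [
    {"text": "sentence A", "anti_elitism": 1, "people_centrism": 0},
    {"text": "sentence B", "anti_elitism": 0, "people_centrism": 0},
    {"text": "sentence C", "anti_elitism": 1, "people_centrism": 1},
]
labels = [to_binary_label(example) for example in examples]

# Tokenization with the 256-token limit would then look like:
# tokenizer(texts, truncation=True, max_length=256)
```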
## Training Procedure
- **Base Model:** [EuroBERT-210M](https://huggingface.co/EuroBERT/EuroBERT-210m)
- **Fine-tuning Approach:**
- Used **Hugging Face Trainer** for training.
- Applied **5-fold cross-validation**.
- **Decision threshold tuning** on aggregated predictions.
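
The cross-validation and threshold-tuning steps can be sketched in plain Python. This is a hedged illustration rather than the actual training code: the card does not state which metric the threshold was tuned for, so F1 is used here purely as an example, and the helper names are hypothetical.

```python
import random


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """Shuffle sample indices and split them into k nearly equal validation folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def f1_at(probs, labels, threshold: float) -> float:
    """F1 score of the positive class when predicting 1 for prob > threshold."""
    preds = [int(p > threshold) for p in probs]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(probs, labels, candidates=None) -> float:
    """Pick the candidate threshold with the highest F1 (assumed criterion)."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    return max(candidates, key=lambda t: f1_at(probs, labels, t))


# Each fold serves once as the validation set; the held-out probabilities
# from all folds are pooled before the threshold is tuned.
folds = k_fold_indices(7017, k=5)

# Toy stand-in for the pooled out-of-fold probabilities and gold labels.
toy_probs = [0.1, 0.4, 0.6, 0.9]
toy_labels = [0, 0, 1, 1]
best_threshold = tune_threshold(toy_probs, toy_labels)
```

Tuning on pooled out-of-fold predictions means every training sentence contributes a prediction from a model that never saw it, which avoids fitting the threshold to memorized examples.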
### Hyperparameters
| Parameter | Value |
| --------------------- | ------- |
| Learning Rate | `3e-05` |
| Weight Decay | `0.0` |
| Gradient Accumulation | `2` |
| Warmup Ratio | `0.1` |
| Epochs | `2` |
| Batch Size | `16` |
| Max Length | `256` |
- **Mixed Precision (fp16):** Used for efficiency on GPU.
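
As a rough sketch, the settings above map onto keyword arguments of `transformers.TrainingArguments` as shown below. Treating the batch size of `16` as a per-device value is an assumption; the card does not say how many devices were used.

```python
# Keyword arguments that could be passed to transformers.TrainingArguments,
# together with an output_dir of your choice.
training_kwargs = {
    "learning_rate": 3e-5,
    "weight_decay": 0.0,
    "gradient_accumulation_steps": 2,
    "warmup_ratio": 0.1,
    "num_train_epochs": 2,
    "per_device_train_batch_size": 16,  # assumption: "Batch Size" is per device
    "fp16": True,  # mixed precision; requires a GPU
}

# Max Length (256) is a tokenizer setting, not a TrainingArguments field.
max_length = 256

# With gradient accumulation, each optimizer step sees this many samples
# per device.
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)
```

Under the single-device assumption, each optimizer update is therefore based on 32 sentences.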
## Evaluation
For transparency, we compare this model with its larger variant ([PopEuroBERT-610m](https://huggingface.co/przvl/PopEuroBERT-binary-610m)), both trained and evaluated on the same dataset and splits.
### Test Set Performance (Threshold = 0.5)
| Model | Accuracy | Precision | Recall | F1 Score | Loss |
|--------------------|----------|-----------|--------|----------|--------|
| **210M (this)** | 75.99% | 73.78% | 80.66% | 77.07% | 0.4959 |
| **610M** | 80.26% | 78.42% | 83.50% | 80.89% | 0.4631 |
### Test Set Performance (Optimized Threshold)
| Model | Threshold | Accuracy | Precision | Recall | F1 Score |
|--------------------|-----------|----------|-----------|--------|----------|
| **210M (this)** | 0.56 | 76.00% | 76.00% | 76.00% | 76.00% |
| **610M** | 0.43 | 79.81% | 76.63% | 85.78% | 80.94% |
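While PopEuroBERT-210m performs reasonably well on the populism classification task, its larger variant is stronger on every reported metric. At the optimized thresholds, the 610M model reaches an F1 score of 80.94% versus 76.00% and a recall of 85.78% versus 76.00%.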
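
As a sanity check on the tables, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported precision and recall values; small deviations in the last digit are expected because the inputs are already rounded.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) taken from the two tables above.
reported = {
    "210M @ 0.50": (73.78, 80.66, 77.07),
    "610M @ 0.50": (78.42, 83.50, 80.89),
    "210M @ 0.56": (76.00, 76.00, 76.00),
    "610M @ 0.43": (76.63, 85.78, 80.94),
}

for name, (precision, recall, f1_reported) in reported.items():
    f1 = f1_score(precision, recall)
    print(f"{name}: recomputed F1 = {f1:.2f}%, reported = {f1_reported:.2f}%")
```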
## Limitations
- **Domain Specificity:**
This model was trained on Bundestag speeches and may not generalize to all political discourse.
- **Threshold Sensitivity:**
The decision threshold (`0.56`) was optimized for this dataset but may need adjustment for other corpora.
- **Potential Bias:**
The labels reflect annotators' judgments of political speech, so the model may inherit biases present in the dataset labeling.
## Ethical Considerations
- **Not suitable for high-stakes decision-making.**
This model is meant for **research purposes** in political discourse analysis.
- **Bias & Context Dependence:**
Populism is a complex concept. Automated detection should **not replace** human interpretation.
- **Transparent Use:**
Users should document and validate model outputs in their research.
## License
Released under the **Apache 2.0 License**.
## Citation
If you use this model or its methodology, please cite:
- **The original EuroBERT paper:**
```bibtex
@misc{boizard2025eurobertscalingmultilingualencoders,
title={EuroBERT: Scaling Multilingual Encoders for European Languages},
author={Nicolas Boizard and Hippolyte Gisserot-Boukhlef and Duarte M. Alves and André Martins and Ayoub Hammal and Caio Corro and Céline Hudelot and Emmanuel Malherbe and Etienne Malaboeuf and Fanny Jourdan and Gabriel Hautreux and João Alves and Kevin El-Haddad and Manuel Faysse and Maxime Peyrard and Nuno M. Guerreiro and Patrick Fernandes and Ricardo Rei and Pierre Colombo},
year={2025},
eprint={2503.05500},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.05500}
}
```
- **The PopBERT dataset source:**
```bibtex
@article{Erhard_Hanke_Remer_Falenska_Heiberger_2025,
title={PopBERT. Detecting Populism and Its Host Ideologies in the German Bundestag},
volume={33},
DOI={10.1017/pan.2024.12},
number={1},
journal={Political Analysis},
author={Erhard, Lukas and Hanke, Sara and Remer, Uwe and Falenska, Agnieszka and Heiberger, Raphael Heiko},
year={2025},
pages={1--17}
}
```